Brainfuck Part 1

Brainfuck is probably well-known enough that I don't need to give you a lengthy explanation. Just check the wikipedia entry to read up on this 'minimalist, esoteric programming language'. Today, I decided to build a brainfuck interpreter and write about it. Why? Because interpreters are cool, and brainfuck is so simple that we can build one rather quickly and then play around with some optimizations or even expand to a compiler. I'm working in Swift because I've recently found it to be a surprisingly pleasant language to work with.

So let's get started on a naive implementation. We'll want to define a data pointer, an instruction pointer, and allocate a block of memory and an empty array to hold our output.

var instructionPointer = 0
var dataPointer = 0
let MAX_MEM = 30_000

var memory: [Int] = Array(repeating: 0, count: MAX_MEM)

var output: [Int] = []

We also need a way to read in a program. This gives me an excuse to figure out how to take command line arguments in Swift. And because Swift is a lovely language, it turns out to be pretty simple: CommandLine.arguments.

// Default to hello world test program
var program_string = "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."

if CommandLine.arguments.count == 2 {
    let filename = CommandLine.arguments[1]
    program_string = try! String(contentsOfFile: filename)
} else {
    print("Please provide a single command line argument (path to bf file)")
}
let program = Array(program_string)

If we provide a path to a file as a command line argument, we read that as our program. As a fallback and useful test case, we default to a hello world program. We check for the second element of CommandLine.arguments because the first element is just the path to our executable. Now to actually interpret our program:

while instructionPointer < program.count {
    switch program[instructionPointer] {
    case "+": memory[dataPointer] += 1
    case "-": memory[dataPointer] -= 1
    case ">": dataPointer += 1
    case "<": dataPointer -= 1
    case "[":
        if memory[dataPointer] == 0 {
            var open = 1
            while open > 0 {
                instructionPointer += 1
                if program[instructionPointer] == "[" { open += 1}
                if program[instructionPointer] == "]" { open -= 1}
            }
        }
    case "]":
        if memory[dataPointer] != 0 {
            var open = 1
            while open > 0 {
                instructionPointer -= 1
                if program[instructionPointer] == "[" { open -= 1}
                if program[instructionPointer] == "]" { open += 1}
            }
        }
    case ".":
        output.append(memory[dataPointer])
    case ",":
        memory[dataPointer] = Int(getchar())
    default:
        break
    }
    if dataPointer < 0 { dataPointer = MAX_MEM - 1}
    if dataPointer == MAX_MEM { dataPointer = 0 }
    instructionPointer += 1
}

This is just a naive implementation of the spec. For +/-, we increment the current memory cell. For >/<, we move the data pointer. '[' is a 'jump if zero' instruction and '[' is a jump if not zero. In either case, we just iterate forward or backward through the program until we find the matching bracket. For ., we add to our output buffer, and for ,, we wait for user input. I've also included logic for the data pointer to wrap around if it runs out of bounds. I'm not sure if that's intended, or if going out of bounds should throw an error.

And that's it, really. We are now interpreting some brainfuck. Let's add some pretty output. If everything can be converted to ASCII, we output a string. Otherwise, just show raw numbers:

do {
    let scalars = try output.map {UnicodeScalar($0)!}
    print(scalars.map{String($0)}.joined())
} catch {
    print(output)
}

UnicodeScalar takes an integer and returns an Optional(Character), so we unwrap it, cast to a string, then join an array of strings. Swift does have some nice functional aspects that let you do fun things with map/filter/reduce.

So now we have an interpreter. It's slow and inefficient, though. There are two obvious ways to improve things. Instead of iterating through a program like '++++++', we can condense this into something like '+6'. So basically do some run length encoding to compress the program. Second, instead of iterating over the whole program to match '[' and ']', we should do that once when we read the program and then save a mapping of opening and closing brackets.

So next time, we'll add these optimizations and build a proper lexer/tokenizer.

Code for this iteration can be found on my github.