Brian Robert Callahan
academic, developer, with an eye towards a brighter techno-social life
All source code for this blog post can be found here.
Now that we have finished our bootstrap PL/0 compiler, I want to test out a shiny new toy I found that we may select as an important piece of our self-hosting compiler. That shiny new toy is QBE: a small and fast compiler back end that reads in an intermediate language, performs optimizations on it, and outputs assembly code for either amd64 or aarch64. I have heard that riscv64 support is in the works as well. But even just amd64 and aarch64 support will allow us to have a self-hosting compiler that outputs assembly for multiple CPUs. And these days, most of us reading this will be using an amd64 or aarch64 machine most of the time, so just those two platforms should be good enough for us. Seems like it's small enough that you could implement a new assembly outputter for the CPU of your choice as well. In any event, it is claimed to work on Mac OS, FreeBSD, and Linux, and it works on our OpenBSD development platform as well without any modifications, so I think QBE will be a good thing for us to try out and see if we like it.
I began thinking about this based on an email I had received about our compiler, asking if I would consider having the compiler target LLVM intermediate representation. I hadn't considered that. LLVM IR has some additional challenges that we must overcome, most notably its strict SSA form. QBE IR, while it also uses SSA form, is not as strict of a representation as LLVM IR, as we will soon see. QBE is also many orders of magnitude smaller than LLVM and written in C so we could even include a complete copy of the QBE sources with our self-hosting compiler. Even so, QBE claims 70% of the performance of compilers like LLVM in 10% of the code. Are those number true? Well, the 10% of the code certainly seems true. Whether or not we really get 70% of the performance, I am not sure. I am not sure it even matters. Even if we only got half of that in terms of performance, that's still a significant improvement over writing unoptimized assembly code. It also is "more realistic" for some values of "more" and "realistic," as the compilers you are probably using, namely GCC and LLVM, have their front ends compile to an intermediate language. With GCC, it's GENERIC and GIMPLE (and RTL). With LLVM, it's LLVM IR. For us, it might just be QBE IR. Let's try it out, write a small program directly in QBE IR to test things out and see if we like if. If we do, I propose we write two back ends for our self-hosting compiler: we'll start with a C back end and then add a QBE back end.
Remember that this is my first go at writing anything with QBE IR. My understanding is based on reading the QBE documentation and actualizing that understanding in writing this small program. It's very possible and expected that my understanding is imperfect. If you happen to be a QBE expert, let us know how we can improve our understanding. Even with our imperfect knowledge, we will have real usable knowledge of QBE and we will be able to make a decision if we would like to target it as a back end for our self-hosting PL/0 compiler.
I find Brainfuck compilers to be a good introductory task when learning a new language: it by no means tests all the different aspects of the language but it does enough work for me to get a feel for if I like a language or not, because you can make a very simple or very complex compiler very compactly. Let's write a simple Brainfuck compiler in QBE IR and see how it goes.
I guess these things also end up as pseudo-tutorials, though I'm by no means an expect. This is the only program I've ever written directly in QBE IR. But I hope following along with me gets you to understand how I evaluate languages to solve a particular problem, and hopefully it helps demystify the QBE IR language itself. I had to start writing code to figure out what was going on but now that I have, I have to say I much like QBE IR, with perhaps minor reservations.
Comments begin with #
and extend to the end of the line.
Functions are written somewhat similar to C. Let's take a look at a direct comparison.
This function in C:
int main(void) { return 0; }
Would be written in QBE IR as:
export function w $main() { @start ret 0 }
Names of global variables must begin with $
, local labels (called blocks in QBE documentation) must begin with @
, and local temporary variables must begin with %
. Immediates are written as-is. Types must be prefixed before every variable. There are first-class types for 32-bit integers (w
), 64-bit integers (l
), single-precision floats (s
), and double-precision floats (d
). There are additional non-first-class types for bytes (b
) and half-words (h
), but in practice I only used bytes for string constants; if I wanted to use an individual byte from a string, it needed to be promoted to a w
. We need to think about QBE IR as if it were assembly, so for us on our 64-bit systems, memory locations take l
. When you directly reference a global variable in your QBE statements, you are working with the address of the global variable. If you want to use the value of the global variable, you will have to load the value into a temporary, manipulate the temporary, then store the new value of the temporary back into the global. We'll see that in action a little later.
We're going to have this Brainfuck compiler output the same style of C code as our hand-written assembly compiler. As I understand the QBE documentation, string literals need to be marked with the data
reserved word, followed by a global name (begins with $
), followed by the non-NUL
-terminated but otherwise C-style string encased in curly brackets. Because everything needs to be prefixed with a type, that C-style string must be prefixed with b
for bytes. So a string literal looks like this:
data $string = { b "This is a string.\n", b 0 }
As you can probably imagine, that , b 0
at the end is appending a NUL
-byte to the end of the string. That means you could have written this string literal as:
data $string = { b "This is a string.\n\0" }
These two different ways of writing a string literal will produce slightly different, though equivalent, assembly. On my OpenBSD/amd64 machine, the first syntax produces:
.data .balign 8 string: .ascii "This is a string.\n" .byte 0 /* end data */
While the second syntax produces:
.data .balign 8 string: .ascii "This is a string.\n\0" /* end data */
I have no arguments with either of these. Note that this is GNU assembler assembly that's being output. That's fine by me. Both GCC and Clang understand GNU assembly syntax so I'd say we're covered. And of course you can use GNU as directly as well if you'd like.
The first syntax is what I found in the QBE documentation. So that's what I used in the Brainfuck compiler. But don't feel like you have to use it. If you prefer the second syntax, that's fine. Let's write out all the strings we need. I am not going to worry about hand-optimization to the extreme. I want to do everything straightforward for today:
# Strings data $bmis = { b "bfc: bracket mismatch\n", b 0 } data $prologue = { b "char a[65536], *p=a;\nint\nmain(void)\n{\n", b 0 } data $left = { b "p-=1;\n", b 0 } data $right = { b "p+=1;\n", b 0 } data $dec = { b "*p-=1;\n", b 0 } data $inc = { b "*p+=1;\n", b 0 } data $gchar = { b "*p=getchar();\n", b 0 } data $pchar = { b "putchar(*p);\n", b 0 } data $lopen = { b "while(*p){\n", b 0 } data $lclose = { b "}\n", b 0 } data $epilogue = { b "return 0;\n}\n", b 0 }
Since we're already here, let's also add whatever global variables we might need. We definitely need one global at least: we'll need a buffer to store the character we are going to write using the write(2)
function. In our hand-written assembly, we cheated a bit and just assumed that the environment would set up a stack for us in a sane location. Let's be more routine here and make a global variable whose address we can then take. I am also going to add a depth variable as well. Since we're trying things out, I am going to use functions liberally and I might end us using the depth variable in multiple places, so a global is just the easiest to maintain. Let's add these:
# Global variables data $ch = { b 0 } data $depth = { w 0 } # Strings data $bmis = { b "bfc: bracket mismatch\n", b 0 } data $prologue = { b "char a[65536], *p=a;\nint\nmain(void)\n{\n", b 0 } data $left = { b "p-=1;\n", b 0 } data $right = { b "p+=1;\n", b 0 } data $dec = { b "*p-=1;\n", b 0 } data $inc = { b "*p+=1;\n", b 0 } data $gchar = { b "*p=getchar();\n", b 0 } data $pchar = { b "putchar(*p);\n", b 0 } data $lopen = { b "while(*p){\n", b 0 } data $lclose = { b "}\n", b 0 } data $epilogue = { b "return 0;\n}\n", b 0 }
If we process just this with QBE, we get this output:
.data .balign 8 ch: .byte 0 /* end data */ .data .balign 8 depth: .int 0 /* end data */ .data .balign 8 bmis: .ascii "bfc: bracket mismatch\n" .byte 0 /* end data */ .data .balign 8 prologue: .ascii "char a[65536], *p=a;\nint\nmain(void)\n{\n" .byte 0 /* end data */ .data .balign 8 left: .ascii "p-=1;\n" .byte 0 /* end data */ .data .balign 8 right: .ascii "p+=1;\n" .byte 0 /* end data */ .data .balign 8 dec: .ascii "*p-=1;\n" .byte 0 /* end data */ .data .balign 8 inc: .ascii "*p+=1;\n" .byte 0 /* end data */ .data .balign 8 gchar: .ascii "*p=getchar();\n" .byte 0 /* end data */ .data .balign 8 pchar: .ascii "putchar(*p);\n" .byte 0 /* end data */ .data .balign 8 lopen: .ascii "while(*p){\n" .byte 0 /* end data */ .data .balign 8 lclose: .ascii "}\n" .byte 0 /* end data */ .data .balign 8 epilogue: .ascii "return 0;\n}\n" .byte 0 /* end data */ .section .note.GNU-stack,"",@progbits
Yeah, I'm good with this. Very sensible. Let's write some logic.
Yes we can of course use printf
or puts
but we can learn a little more by doing it by hand. We only have to worry about writing to stdout
and stderr
anyhow.
Some things we need to know. First, we need to know that in assignment statements, only temporaries may be assigned to. There are load and store functions to be able to manipulate globals. The second thing we need to know is that temporaries may be non-SSA temporaries. This means we do not have to follow strict SSA form when dealing with temporaries. And it is this second piece that makes writing QBE IR so much easier than LLVM IR and I think might be the killer feature that will allow us to use QBE IR in our self-hosting PL/0 compiler.
Let's take a look at this function and talk it out:
function $dputs(l %s, w %fd) { # Static function with parameters @start @loop %ch =w loadub %s # Get byte at current string location %cmp =w ceqw %ch, 0 # Is byte NUL? Return 1 if true, 0 if false jnz %cmp, @done, @pbyte # cmp returned true ? done : pbyte @pbyte storew %ch, $ch # Put byte into write buffer call $write(w %fd, l $ch, w 1) %s =l add %s, 1 # Next character in string jmp @loop @done ret # All functions end with ret, jmp, or jnz }
Our dputs
function is a void
function that will take two parameters, a string %s
and a file descriptor %fd
. I don't remember seeing it in the QBE documentation but some trial-and-error uncovered that if you omit a type declaration between function
and the function name, you get a void
function. Even though the string is marked as a temporary, when we call dputs
we'll use a global for that parameter and everything works.
QBE IR has a little quirk in that you cannot jump to the first block in a function. You can have two blocks right on top of each other and jump to the second block. So that's what I'm doing here. We have to dereference a character from the current string location, loadub
will load an unsigned byte. Next, we check to see if that byte is 0, since if it is, we're finished with our string.
The comparison functions are regularly named: all begin with c
, then the comparator you want (eq
for equals, in our case), and all end with the type of the two sides of the comparison, w
in our case. The comparisons return 1 if true, 0 if false. That then leads directly to a conditional jump, jnz
, or jump if not 0. This is the only conditional jump available in QBE IR. There is an unconditional jump, jmp
as well.
The conditional jump takes three operands, each separated by a comma. The first operand is a variable. The second is the block to jump to if the variable is not 0. The third is the block to jump to if the variable is 0. Therefore, the comparator and conditional jump together check to see if the current byte in the string is 0. If it is (i.e. the comparison returned 1), we jump to @done
. If not, we jump to @pbyte
.
We can simplify this further:
function $dputs(l %s, w %fd) { # Static function with parameters @start @loop %ch =w loadub %s # Get byte at current string location jnz %ch, @pbyte, @done # %ch != 0 ? @pbyte : @done @pbyte storew %ch, $ch # Put byte into write buffer call $write(w %fd, l $ch, w 1) %s =l add %s, 1 # Next character in string jmp @loop @done ret # All functions end with ret, jmp, or jnz }
Instead of relying on the return value of the comparison, we can eliminate the comparison entirely and just check the byte directly.
Now we have to print the byte. Recall that the C prototype for write(2)
is ssize_t write(int, const void *, size_t);
. What I chose to do is put the value of temporary variable %ch
into the global variable $ch
and then use the global as the pointer. As I understand the QBE documentation, globals never directly pass their value, they always pass their address. You must indirectly use load and store to get values out of globals. Therefore, if we were to translate that write call back to C, it would be write(fd, &ch, 1);
. And that's what we want.
%s
as a non-SSA temporary, which we can see immediately since we use it twice in the %s =l add %s, 1
statement. Having non-SSA temporaries is incredibly easy and convenient. LLVM IR does not permit this expression because it does not allow non-SSA temporaries. And then we loop again. When we're done, we return.
I wonder what the amd64 assembly for this function looks like:
.text dputs: pushq %rbp movq %rsp, %rbp pushq %rbx pushq %r12 .Lbb1: movzbl (%rdi), %eax cmpl $0, %eax jz .Lbb3 movl %eax, ch(%rip) movl $1, %edx movl %esi, %r12d leaq ch(%rip), %rsi movq %rdi, %rbx movl %r12d, %edi callq write movl %r12d, %esi movq %rbx, %rdi addq $1, %rdi jmp .Lbb1 .Lbb3: popq %r12 popq %rbx leave ret /* end function dputs */
Yeah this is about what I would expect the assembly to look like. It might have been nice to have recognized that addq $1, %rdi
could be rewritten as incq %rdi
in this context to save a byte. But on the other hand, addq $1
and incq
are not quite equivalent, so avoiding potential pitfalls is likely wise with such a small program like QBE.
Unfortunately, QBE does not recognize that movl $0, %eax
can be improved to xorl %eax, %eax
. That would be nice to have. On my machine, if I write a small program to do nothing but return 0:
export function w $main() { @start ret 0 }
I get the following amd64 assembly:
.text .globl main main: pushq %rbp movq %rsp, %rbp movl $0, %eax leave ret /* end function main */ .section .note.GNU-stack,"",@progbits
I do wish that movl $0, %eax
was xorl %eax, %eax
. There is an interesting write-up about the topic for those interested.
Now that we have our general writing function, let's handle errors next. We only worry about errors if we have a bracket mismatch, so I will call our error function mismatch
. It doesn't need to take any parameters, since the error message will always be the same and it will always be printed to stderr
:
function $mismatch() { # Static function without parameters @start call $dputs(l $bmis, w 2) call $exit(w 1) # Can call C functions at any time ret }
The ability to directly interoperate with C is another excellent feature. Sure, we can do that with assembly as well, but this is much easier since we don't need to remember things like what values need to go in what registers. We just call the C function as if it were C, remembering to prefix each parameter with its type.
We still need that return at the end of the function. If you forget the ret
at the end, you'll be greeted with this error message: last block misses jump
. It was not immediately clear to me what QBE was trying to say. In short, QBE considers jmp
, jnz
, and ret
to be jumps and every block needs to end with one. This is explained in the QBE documentation.
The mismatch
function becomes this amd64 assembly:
.text mismatch: pushq %rbp movq %rsp, %rbp movl $2, %esi leaq bmis(%rip), %rdi callq dputs movl $1, %edi callq exit leave ret /* end function mismatch */
All looks good here.
Now it's time to write the main logic for our Brainfuck compiler. We'll just do the simple switch statement on the current character of the input file and immediately write out the C translation.
One thing to note is that the main()
function in C is not marked static
. In QBE IR, if you mark a function export
, that makes it non-static. Otherwise, it is static. That means the functions we've written so far are static functions. Also, we need to make sure that main
returns an integer, since in C the main()
function is typed as an int
. That means our boilerplate needs to look like this:
export function w $main() { @start ret 0 }
First thing's first: write out the prologue:
export function w $main() { @start call $dputs(l $prologue, w 1) ret 0 }
Now, we'd like to read in a character from stdin
into our character buffer $c
. As read(2)
returns the number of characters read, we should assign the return value and check that it is 1. If it's not 1, then we are finished:
export function w $main() { @start call $dputs(l $prologue, w 1) @loop %r =w call $read(w 0, l $ch, w 1) %cmp =w ceqw %r, 1 jnz %cmp, @maybeleft, @eof # %cmp == 1 ? @maybeleft : @eof @maybeleft @eof %d =w loadsw $depth jnz %d, @mismatch, @done @mismatch call $mismatch() ret 0 @done call $dputs(l $epilogue, w 1) ret 0 }
We must make the comparison here because read(2)
can return -1
on error and we need to go to @eof
in that case as well.
I also did the end of file bracket checking and writing out the epilogue if everything checks out. Remember that all blocks must end in jmp
, jnz
, or ret
, hence the unreachable ret 0
at the end of @mismatch
.
Alright, now let's write out a QBE translation for a switch
statement:
export function w $main() { @start call $dputs(l $prologue, w 1) @loop %r =w call $read(w 0, l $ch, w 1) %cmp =w ceqw %r, 1 jnz %cmp, @maybeleft, @eof # %cmp == 1 ? @maybeleft : @eof @maybeleft %b =w loadub $ch # Bytes must be loaded as words %cmp =w ceqw %b, 60 # < jnz %cmp, @isleft, @mayberight @isleft call $dputs(l $left, w 1) jmp @loop @mayberight %cmp =w ceqw %b, 62 # > jnz %cmp, @isright, @maybedec @isright call $dputs(l $right, w 1) jmp @loop @maybedec %cmp =w ceqw %b, 45 # - jnz %cmp, @isdec, @maybeinc @isdec call $dputs(l $dec, w 1) jmp @loop @maybeinc %cmp =w ceqw %b, 43 # + jnz %cmp, @isinc, @maybegetchar @isinc call $dputs(l $inc, w 1) jmp @loop @maybegetchar %cmp =w ceqw %b, 44 # , jnz %cmp, @isgetchar, @maybeputchar @isgetchar call $dputs(l $gchar, w 1) jmp @loop @maybeputchar %cmp =w ceqw %b, 46 # . jnz %cmp, @isputchar, @maybelopen @isputchar call $dputs(l $pchar, w 1) jmp @loop @maybelopen %cmp =w ceqw %b, 91 # [ jnz %cmp, @islopen, @maybelclose @islopen call $islopen() jmp @loop @maybelclose %cmp =w ceqw %b, 93 # ] jnz %cmp, @islclose, @loop @islclose call $islclose() jmp @loop @eof %d =w loadsw $depth jnz %d, @mismatch, @done @mismatch call $mismatch() ret 0 @done call $dputs(l $epilogue, w 1) ret 0 }
I decided to break out bracket handling to their own functions. But before we do that, let's take a look at the generated assembly for main
:
.text .globl main main: pushq %rbp movq %rsp, %rbp movl $1, %esi leaq prologue(%rip), %rdi callq dputs .Lbb1: movl $1, %edx leaq ch(%rip), %rsi movl $0, %edi callq read cmpl $1, %eax jnz .Lbb18 movzbl ch(%rip), %eax cmpl $60, %eax jz .Lbb17 cmpl $62, %eax jz .Lbb16 cmpl $45, %eax jz .Lbb15 cmpl $43, %eax jz .Lbb14 cmpl $44, %eax jz .Lbb13 cmpl $46, %eax jz .Lbb12 cmpl $91, %eax jz .Lbb11 cmpl $93, %eax jnz .Lbb1 callq islclose jmp .Lbb1 .Lbb11: callq islopen jmp .Lbb1 .Lbb12: movl $1, %esi leaq pchar(%rip), %rdi callq dputs jmp .Lbb1 .Lbb13: movl $1, %esi leaq gchar(%rip), %rdi callq dputs jmp .Lbb1 .Lbb14: movl $1, %esi leaq inc(%rip), %rdi callq dputs jmp .Lbb1 .Lbb15: movl $1, %esi leaq dec(%rip), %rdi callq dputs jmp .Lbb1 .Lbb16: movl $1, %esi leaq right(%rip), %rdi callq dputs jmp .Lbb1 .Lbb17: movl $1, %esi leaq left(%rip), %rdi callq dputs jmp .Lbb1 .Lbb18: movl depth(%rip), %eax cmpl $0, %eax jnz .Lbb20 movl $1, %esi leaq epilogue(%rip), %rdi callq dputs movl $0, %eax jmp .Lbb21 .Lbb20: callq mismatch movl $0, %eax .Lbb21: leave ret /* end function main */
It looks like QBE wrote out the switch statement assembly backwards from the way we wrote it in QBE IR, but that's fine.
QBE wrote out a movl $0, %edi
line. Again, it would have been better to have written out xorl %edi, %edi
. You save three bytes by doing so.
movl $0, %edi
encodes to:
bf 00 00 00 00
Whereas xorl %edi, %edi
encodes to:
31 ff
If we were hand writing the assembly, we might also recognize that movl $1, %esi
being placed before each dputs
call is redundant; we could set %esi
to 1 after the read
call and be done with it. But I'm sure this is because I used an immediate for the file descriptor in each dputs
call so this is a case where I could have done better and was likely being unreasonable about what QBE could do with the code I wrote.
Lastly, let's write functions to handle [
and ]
:
function $islopen() { @start %d =w loadsw $depth # Load value of global variable %d =w add %d, 1 # Because you can only assign to temporaries storew %d, $depth # Store value to global variable call $dputs(l $lopen, w 1) ret } function $islclose() { @start %d =w loadsw $depth %d =w sub %d, 1 storew %d, $depth %cmp =w ceqw %d, -1 # Is the depth level -1 ? jnz %cmp, @bad, @good # %cmp == 1 ? @bad : @good @bad call $mismatch() ret @good call $dputs(l $lclose, w 1) ret }
The comments attempt to spell out why I did what I did. In QBE, you can only assign to temporaries, so we need to load the word out of $depth
into a temporary. Then we can modify the value of the temporary and then store the new value back into $depth
. We need to do this for both [
and ]
. For [
, we're done at this point. For ]
, we need to make sure the value of $depth
has not fallen below 0, as that would signal a bracket mismatch.
And here is the assembly these functions become:
.text islopen: pushq %rbp movq %rsp, %rbp movl depth(%rip), %eax addl $1, %eax movl %eax, depth(%rip) movl $1, %esi leaq lopen(%rip), %rdi callq dputs leave ret /* end function islopen */ .text islclose: pushq %rbp movq %rsp, %rbp movl depth(%rip), %eax subl $1, %eax movl %eax, depth(%rip) cmpl $-1, %eax jz .Lbb10 movl $1, %esi leaq lclose(%rip), %rdi callq dputs jmp .Lbb11 .Lbb10: callq mismatch .Lbb11: leave ret /* end function islclose */
Perhaps it would have been nice to recognize that incl %eax
and decl %eax
could be used in these contexts, but as mentioned before it's not such a huge deal. Other than that, this all looks good.
I think even with the minor tweaks I'd like to see with QBE, it has a lot going for it that makes it an ideal option for a back end for our self-hosting PL/0 compiler. We can refine our back end progressively: we can start with a C back end to get things going and prove the self-hosting nature of the compiler, then add a QBE back end to learn about how to convert high-level languages into assembly and assembly-like languages, and finally we can add a direct to assembly back end to learn about challenges such as register allocation (which QBE solves for us).
QBE is small enough that we can place an entire copy of the source code in our repository. It is written in C so we know we can build it. And it outputs either amd64 or aarch64 assembly, so it will just work for most of us most of the time. There's a lot more that QBE is capable of that we didn't encounter, but what I've seen is enough for me to think that it has good potential for us.
It gets the nod of approval for me to use. We'll be seeing QBE again.