I've just posted the second part of this series. See part 2
I've been sitting on this for a long time - the oldest entries in the series I'm about to start posting is stuff I started writing back in early 2005 before I even started version 1 of this blog.
I've always enjoyed compiler technology, and I've written several toy compilers. A while back I started writing a simple compiler in Ruby, and I wrote copious notes along the way exactly with the intention of posting it somewhere. Problem was I was that by the time I was finishing up the steps I have so far, my blog was languishing, and so it's been just gathering dust.
I have about 30 parts mostly written, all the way to a simple but working compiler. It'll take me a while to post them, as at least some of them need some cleanup, and I don't have mjuch time to write, but I might post a couple of parts a week, depending on my schedule.
Without further ado, here's the first entry:
Usually, when writing an experimental compiler I've started top down. In other words, I've taken the traditional approach of designing a parser first, whether manually or using a parser generator. I've then started writing various tree transformation code to turn the AST (Abstract Syntax Tree) into symbol tables, doing error checking, annotating the tree with various information, and then finally code generation. Often after I switched to Linux I've "chickened out" and compiled to C, which is ok if your language match well with C semantics, which it often will given how low level C is.
My first compiler though, was written in M68000 assembler and output M68000 assembler.
I bootstrapped it by allowing inline assembler all the way through, and gradually adding structure:
First I'd add functions and function calls, and start making use of them to structure my compiler. Then I'd add basic arithmetics that worked on assembler registers. Then I'd add local variables etc., but parse the assembly so that the register allocator would stay clear of registers used in the assembler that was interspersed.
Gradually the compiler morphed from M68000 to my hybrid language, and less and less assembler.
I want to bootstrap a compiler again. Though I have no intention of writing it in assembler this time. But I will start at the bottom: With the code generator. And I'll have the following goals:
Sounds like a lot of shortcut and still a lot of pain, doesn't it? But it'll be fun!
So lets start with the tiniest bit of useless code:
#!/bin/env ruby
class Compiler
def compile(exp)
# Taken from gcc -S output
puts <<PROLOG
.file "bootstrap.rb"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
PROLOG
puts <<EPILOG
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
EPILOG
puts <<GLOBAL_EPILOG
.size main, .-main
GLOBAL_EPILOG
end
end
prog = [:puts,"Hello World"]
Compiler.new.compile(prog)
It does nothing. Really. It's just the result of compiling this with gcc -S:
int main()
{
}
... and breaking it into pieces.
It's a fully working compiler... Sort of.... Unfortunately it compiles any input into a noop, and so it's completely useless. But lets start with baby steps.
Consider this series of articles a "flow of consciousness". I do this for fun. I'm not about to sit down and make a nice clean design upfront. Nor am I going to be shy about changing my mind midstream and rip out code, or put stuff I previously ripped out back in again.
What you see here is my second pass: Each SVN commit I make is accompanied by notes for my posts, but the posts contain a lot of extra explanation. Sometimes that explanation makes me change my mind about things - but I won't make major changes unless I think it makes the explanations clearer or a point I made in the first pass is blatantly wrong. Even then I might leave it in, possibly with a note that I'll correct myself later. Sometimes I've contracted steps that took me a while to do but that are trivial to explain "after the fact".
Comments are very welcome, but keep in mind that while I may take your comments to heart, I have already written a lot of this series - I won't start rewriting large parts of it to take into account suggestions, but I will take notes and keep it in mind for whenever I catch up to what I've written already
Next time I'll make it actually do something with it's input, I promise.