Often when I learn a new programming language that’s different enough to what I already know, the first thing I implement is an interpreter for some subset of the Scheme programming language. I’ve done Scheme implementations in OCaml, Mercury/Prolog, Forth and Unlambda, among others. Recently I had to learn JavaScript for a small, fun project about which I’ll blog some other time, but I figured that writing an interpreter would be a bit boring, so I decided to do a Scheme compiler that produces JavaScript code. And it wouldn’t be written in JavaScript, but in Scheme. And it would compile itself. And for some reason I had to call it JScreme (to be pronounced “jay scream”). I’m weird.
You can try it out with Firefox or Chrome here. Other browsers might work, too, but I haven’t tested any.
Data representation
JavaScript offers many data types very similar to the ones in Scheme, so I decided to use them to represent Scheme data directly. A Scheme number is represented by a JavaScript number, strings are represented as JavaScript strings, booleans as booleans, nil
is null
(but we also treat undefined
as nil
), vectors are arrays. JavaScript doesn’t have a data type for characters, but for my purposes it’s good enough to treat single-character strings as Scheme characters as well as Scheme strings. The most important data type, the function, can fortunately also be represented directly.
The only two Scheme data types that need special consideration are conses (or pairs, as Scheme calls them) and symbols. Conses are represented as JavaScript objects with the properties car
and cdr
. Symbols are also represented as objects, with the property symbol
set to the string that is the symbol’s name. Since Scheme symbols with the same name must not only be equal (in the sense of equal?
), but identical (in the sense of eq?
), we must make sure that only one such object exists for each symbol. This is called “interning”. We use a global dictionary, indexed by the symbol’s name, and whenever a symbol is to be created, we look it up there, or insert it if it’s new.
The compiler
Let’s look at how JScreme compiles a few simple examples:
Scheme | JavaScript |
---|---|
123 |
123 |
#\a |
"a" |
"abc" |
"abc" |
#t / #f |
true / false |
Functions are also easy:
Scheme | JavaScript |
---|---|
(lambda (x) x) |
function (x) { return x; } |
(f x) |
(f (x)) |
(apply f x y z) |
(f.apply (null, [x, y].concat (list_to_vector (z)))) |
The only rule here that merits explanation is the one for apply
. In case you’re not familiar with Scheme’s (or most other Lisp’s) apply
, here’s an equivalence:
apply |
no apply |
---|---|
(apply f 1 2 '(3 4)) |
(f 1 2 3 4) |
So, apply
takes a function, some arguments, and a list of more arguments, and calls the function with those arguments. The reason apply
is needed is because it lets you call a function with a number of arguments that is not statically known.
The function list_to_vector
above converts a Scheme list into a vector, i.e. a JavaScript array. It’s implemented in Scheme as well. We’ll get to the details later.
Obviously there must also be a way for a function to receive a variable number of arguments:
Scheme | JavaScript |
---|---|
(lambda (x . y) y) |
function (x) { var y = listify_vector (arguments, 1); return y; } |
Here, the function listify_vector
converts the arguments
array, starting at index 1
, into a Scheme list.
At this point I should mention that I’ve been lying a bit about names. Due to JavaScript’s scoping rules being slightly different than Scheme’s we must make sure that different variables have different names in JavaScript, even though they’re named the same in Scheme. The easiest way to achieve that is to use a name generator to create a unique name for each variable. In Lisps, it’s usually called gensym
. So, this is what code actually looks like:
Scheme | JavaScript |
---|---|
(lambda (x) x) |
function(__space_g8757_){return __space_g8757_;} |
Of course there’s also if
:
Scheme | JavaScript |
---|---|
(if a b c) |
(a ? b : c) |
For simplicity’s sake JScreme also implements letrec
in the compiler. It creates a new function to make scoping easier and to be able to treat the whole thing as an expression:
Scheme | JavaScript |
---|---|
(letrec ((x 1) (y x)) y) |
(function () { var x = 1; var y = x; return y; } ()) |
The toplevel
At this point we’ve covered almost all non-trivial features of the compiler. The rest of JScreme is implemented in JScreme itself as macros and functions. Macros and toplevel definitions are not handled by the compiler directly but by the toplevel handler. It looks at each toplevel item and filters out definitions and macros, which are handed to the compiler and then put into symbol tables. It also does macro expansion. Since it’s boring I won’t speak any more of it.
The library
Most of what’s in the library is also boring, so I’ll stick to the more interesting bits. Here are some fundamental parts of the Scheme language that are not implemented in the compiler but via macros:
before macroexpand |
after macroexpand |
---|---|
(begin x y z) |
((lambda () x y z)) |
(let ((x 1)) x) |
((lambda (x) x) 1) |
(let* ((x 1) (y x)) y) |
((lambda (x) ((lambda (y) ((lambda () y))) x)) 1) |
(cond (a 1) (b 2) (c 3)) |
(if a 1 (if b 2 (if c 3 #f))) |
(and a b c) |
(if a (if b c #f) #f) |
(or a b c) |
(if a #t (if b #t c)) |
Data types are next. How do we know something’s a number? Here’s the definition:
(define (number? x) (and (not (null? x)) (js-op (.. x constructor) "==" (js-quote "Number"))))
There are three tiny compiler features in this piece of code that I haven’t mentioned before. Here’s what they do:
JScreme | JavaScript |
---|---|
(.. x y) |
x.y |
(js-op x "==" y) |
x == y |
(js-quote "Number") |
Number |
With that out of the way, we can see how number?
works: If an object’s constructor is Number
, that object is a number. Since null
and undefined
aren’t objects, though, they need to be filtered out first with null?
, which, together with eq?
, is defined like so:
(define (eq? a b) (or (js-op a "===" b) (and (js-op a "==" '()) (js-op b "==" '())))) (define (null? x) (eq? x '()))
The definition of eq?
is a bit convoluted, since we want to have null
and undefined
be identical, which they are not by JavaScript’s operator ===
, so we must handle them separately.
There’s yet another new compiler feature being used in cons
:
(define (cons a b) (js-object (car a) (cdr b)))
js-object
constructs an object via JSON notation:
JScreme | JavaScript |
---|---|
(js-object (car 1) (cdr 2)) |
({car:1, cdr:2}) |
Some functions are surprisingly easy to implement:
(define (list . args) args)
I mentioned the function listify_vector
above, which is also implemented in JScreme:
(define (listify-vector vec start) (letrec ((recur (lambda (i l) (if (< i start) l (recur (js-op i "-" 1) (cons (vector-ref vec i) l)))))) (recur (js-op (vector-length vec) "-" 1) '())))
It’s basically a loop that conses up the vector’s elements, starting with the last and counting down. The reason why I had to use js-op
for the subtraction instead of Scheme’s -
is that, since -
is a variable argument function it is compiled to JavaScript code that uses listify-vector
, so we’d end up with an endless recursion.
Much of the rest of the library is not very interesting and easy to understand, so I won’t bore you with the details.
The reader
Interpreting or compiling Scheme is nice and easy if you do it in Scheme, where code is represented as data, but if you bootstrap Scheme, you must also have a parser that reads Scheme code/data from a string. In Lisps, that parser is called the reader, and JScreme has one, too, also implemented in Scheme. It’s actually completely functional, i.e., it doesn’t use side-effects, because I wrote it before I had implemented set!
in the compiler.
Limitations
To the seasoned Schemer it will not have gone unnoticed that I haven’t mentioned tail calls or continuations, yet. It will also not have gone unnoticed that the triple negations in the previous sentence and in this one don’t exactly improve the text’s clarity, but that is beside the point (which is also beside the point).
The reason for my silence on those two issues is that I’m cheating by simply implementing neither proper tail calls nor continuations (or, more specifically, call/cc
).
Since recursion is the only way to do iteration in Scheme, and I haven’t implemented any other iteration construct, unless your JavaScript implementation has proper tail calls, JScreme is fundamentally flawed. One way to remedy this is to instead of doing a tail call, return a specially marked function that does the tail call, and then wrap each function call into a loop that continues invoking those functions until something else (the proper result) is returned.
Properly implementing call/cc
would require compiling to continuation passing style. It’s not awfully complicated, but it would make JScreme even slower than it already is.
Playing around with JScreme
The sources for JScreme are hosted on GitHub. To build it you’ll need Guile. As far as I know the only Guile-specific feature JScreme is using is defmacro
is probably also available in most other Scheme implementations.
The main pieces of JScreme are the compiler in compiler.scm, the reader in reader.scm, the library in primitives.scm and the toplevel handler in toplevel.scm and toplevel-compile.scm. There’s also a test suite in tests.scm. JScreme compiles itself in jscreme.scm.
Nice work! This is more thoughtfully done than many of the *->JavaScript translators out there. You might be interested in a “direct” translator from Scheme to Java that I wrote to show students in my compilers class:
http://matt.might.net/articles/compiling-to-java/
It has exactly the same limitations: a lack of proper tail-call optimization and first-class continuations. This could be remedied with the same techniques you mentioned–trampolines and/or continuation-passing style.
OMFG!! You stole this from me.
Totally!