Abstract Heresies: REBOL

Showing posts with label REBOL. Show all posts

Friday, January 3, 2025

REBOL 1.0 Was Slow

Rebol 1.0 was slow. I paid little attention to speed in the implementation — I was concerned with correctness. The intepreter was intended to be a reference implementation, with well-defined behavior on every edge case. My intent was to add a compiler at a later date.

Once source of slowness was the liberal use of first-class continuations in the interpreter. Rebol 1.0 used a “Cheney on the MTA” interpretation strategy, where no function ever returned a value and the stack simply got deeper and deeper. When the stack overflowed, a stack garbage collection was triggered. Since most of the stack was garbage, this was a fast operation (I used a garbage collector that used time proportional to live storage). With such an implementation, first-class continuations were trivial to implement — all continuations were first-class, it was just a question of whether you surfaced them to the user. I didn’t have an ideological belief either way, but there they were, so why not? Many control flow constructs that would otherwise require an ad hoc implementation can be easily implemented with first-class continuations.

Rebol had return statements that would return control to the caller from within the function. 99% of the time, the caller is sitting on the stack just above the current frame. But 1% of the time, the user would do something weird like create a lexical closure over the return statement and pass it downward. Like as not he didn’t deliberately do this, but rather used some library that was implemented in continuation-passing style. If this happened, the return statement might have to unwind an arbitrary amount of stack. To implement this, I captured the current continuation at the entry of each function and bound it to the implicit “return” variable. Invoking return invoked the continuation and returned control to the caller. The advantage of doing it this way was that return statements had the correct semantics under all circumstances. There were no special rules governing use of return and no code had to have special cases for unexpected returns.

A similar thing happened in the implementation of break and continue in loops. These were implemented by capturing the continuation at the entry of the loop and binding it to the implicit break variable, and capturing the continuation on each iteration and binding it to the implicit continue variable. Because these were first-class continuations, they could be used to restart the loop after it exited. That wasn’t a requirement. I was perfectly happy to stipulate that break and continue only work while a loop is in progress, but in Rebol 1.0, they’d continue to work after the loop finished.

Worrying about continuations captured in lexical closures may seem weird, but it’s a real issue. It is common to introduce implicit lexical contours in a program: even a let expression does it. You would like to be able to use break and continue in the body of a let expression in a loop. Some Rebol constructs were implemented by implicitly macroexpanding the code into a call to a helper function. break and continue would work across function call boundaries, so there were no limitations on introducing helper functions within a loop.

A more traditional language has a handful of ad hoc iteration constructs that are implemented with special purpose code. The special purpose code knows it is a loop and can be optimized for this. break and continue statements have a special dependency on the enclosing loop.

Rebol 1.0 was properly tail recursive, so there was no special implementation of loops. They were ordinary functions that happened to call themselves. Non-standard iteration constructs could be created by the user by simply writing code that called itself. break and continue just surfaced the interpreter’s continuation to the user. As a consequence, loops in Rebol 1.0 were implemented completely in Rebol code but had signifcant interpreter overhead.

Rebol 2.0 and later are not properly tail recusive. As a consequence, special looping constructs are required to be written in C to support iteration. Common iteration constucts such as for and while are provided and do not have interpreter overhead, but if you want a non-standard iteration construct, there is no way to achieve it. You have to re-write your code to use one of the built-in iteration constructs or go without and risk blowing the stack.

My intent was to eventually write a compiler for Rebol. I wrote a prototype called Sherman that compiled to MIT-Scheme and was supported by the MIT-Scheme runtime library. Loops compiled with Sherman ran quickly as expected.

Tuesday, June 27, 2023

Tail recursion in REBOL

Many years ago I worked on a language called REBOL. REBOL was notable in that it used a variation of Polish notation. Function names came first, followed by the arguments in left to right order. Parentheses were generally not needed as the subexpression boundaries could be deduced from the arguments. It’s a bit complicated to explain, but pretty easy to code up.

An interpreter environment will be a lists of frames, and each frame is an association list of variable bindings.

(defun lookup (environment symbol)
  (cond ((consp environment)
         (let ((probe (assoc symbol (car environment))))
           (if probe
               (cdr probe)
               (lookup (cdr environment) symbol))))
        ((null environment) (error "Unbound variable."))
        (t (error "Bogus environment."))))

(defun extend-environment (environment formals values)
  (cons (map ’list #’cons formals values) environment))

define mutates the topmost frame of the environment.

(defun environment-define! (environment symbol value)
  (cond ((consp environment)
         (let ((probe (assoc symbol (car environment))))
           (if probe
               (setf (cdr probe) value)
               (setf (car environment) (acons symbol value (car environment))))))
        ((null environment) (error "No environment."))
        (t (error "Bogus environment."))))

We’ll use Lisp procedures to represent REBOL primitives. The initial environment will have a few built-in primitives:

(defun initial-environment ()
  (extend-environment
   nil
   ’(add
     lessp
     mult
     print
     sub
     sub1
     zerop)
   (list #’+
         #’<
         #’*
         #’print
         #’-
         #’1-
         #’zerop)))

A closure is a three-tuple

(defclass closure ()
  ((arguments :initarg :arguments :reader closure-arguments)
   (body :initarg :body :reader closure-body)
   (environment :initarg :environment :reader closure-environment)))

An applicable object is either a function or a closure.

(deftype applicable () ’(or closure function))

We need to know how many arguments a function takes. We keep a table of the argument count for the primitives

(defparameter +primitive-arity-table+ (make-hash-table :test #’eq))

(eval-when (:load-toplevel :execute)
  (setf (gethash #’* +primitive-arity-table+) 2)
  (setf (gethash #’< +primitive-arity-table+) 2)
  (setf (gethash #’+ +primitive-arity-table+) 2)
  (setf (gethash #’- +primitive-arity-table+) 2)
  (setf (gethash #’1- +primitive-arity-table+) 1)
  (setf (gethash #’print +primitive-arity-table+) 1)
  (setf (gethash #’zerop +primitive-arity-table+) 1)
  )

(defun arity (applicable)
  (etypecase applicable
    (closure (length (closure-arguments applicable)))
    (function (or (gethash applicable +primitive-arity-table+)
                  (error "Unrecognized function.")))))

REBOL-EVAL-ONE takes a list of REBOL expressions and returns two values: the value of the leftmost expression in the list, and the list of remaining expressions.

(defun rebol-eval-one (expr-list environment)
  (if (null expr-list)
      (values nil nil)
      (let ((head (car expr-list)))
        (etypecase head

          ((or number string) (values head (cdr expr-list)))

          (symbol
           (case head

             (define
              (let ((name (cadr expr-list)))
                (multiple-value-bind (value tail) (rebol-eval-one (cddr expr-list) environment)
                  (environment-define! environment name value)
                  (values name tail))))

             (if
              (multiple-value-bind (pred tail) (rebol-eval-one (cdr expr-list) environment)
                (values (rebol-eval-sequence (if (null pred)
                                                 (cadr tail)
                                                 (car tail))
                                             environment)
                        (cddr tail))))

             (lambda
                 (values (make-instance ’closure
                          :arguments (cadr expr-list)
                          :body (caddr expr-list)
                          :environment environment)
                  (cdddr expr-list)))

             (otherwise
              (let ((value (lookup environment head)))
                (if (typep value ’applicable)
                    (rebol-eval-application value (cdr expr-list) environment)
                    (values value (cdr expr-list)))))))))))

If the leftmost symbol evaluates to something applicable, we find out how many arguments are needed, gobble them up, and apply the applicable:

(defun rebol-eval-n (n expr-list environment)
  (if (zerop n)
      (values nil expr-list)
      (multiple-value-bind (value expr-list*) (rebol-eval-one expr-list environment)
        (multiple-value-bind (values* expr-list**) (rebol-eval-n (1- n) expr-list* environment)
          (values (cons value values*) expr-list**)))))

(defun rebol-eval-application (function expr-list environment)
  (multiple-value-bind (arglist expr-list*) (rebol-eval-n (arity function) expr-list environment)
    (values (rebol-apply function arglist) expr-list*)))

(defun rebol-apply (applicable arglist)
  (etypecase applicable
    (closure  (rebol-eval-sequence (closure-body applicable)
                                   (extend-environment (closure-environment applicable)
                                                       (closure-arguments applicable)
                                                       arglist)))
    (function (apply applicable arglist))))

Evaluating a sequence is simply calling rebol-eval-one over and over until you run out of expressions:

(defun rebol-eval-sequence (expr-list environment)
  (multiple-value-bind (value expr-list*) (rebol-eval-one expr-list environment)
    (if (null expr-list*)
        value
        (rebol-eval-sequence expr-list* environment))))

Let’s try it:

(defun testit ()
  (rebol-eval-sequence
   ’(
     define fib
       lambda (x)                         
        (if lessp x 2
            (x)
            (add fib sub1 x
                 fib sub x 2))

     define fact
       lambda (x)
       (if zerop x
           (1)
           (mult x fact sub1 x))

     define fact-iter
       lambda (x answer)
       (if zerop x
           (answer)
           (fact-iter sub1 x mult answer x))

     print fib 7
     print fact 6
     print fact-iter 7 1
    )
   (initial-environment)))

CL-USER> (testit)

13 
720 
5040

This little interpreter illustrates how basic REBOL evaluation works. But this interpreter doesn’t support iteration. There are no iteration special forms and tail calls are not “safe for space”. Any iteration will run out of stack for a large enough number of repetition.

We have a few options:

choose a handful of iteration specail forms like do, repeat, loop, for, while, until etc.
invent some sort of iterators
make the interpreter tail recursive (safe-for-space).

It seems a no brainer. Making the interpreter tail recursive doesn’t preclude the other two,. In fact, it makes them easier to implement.

To effectively support continuation passing style, you need tail recursion. This alone is a pretty compelling reason to support it.

But it turns out that this is easier said than done. Are you a cruel TA? Give your students this interpreter and ask them to make it tail recursive. The problem is that key recursive calls in the interpreter are not in tail position. These are easy to identify, but you’ll find that fixing them is like flattening a lump in a carpet. You’ll fix tail recursion in one place only to find your solution breaks tail recursion elsewhere.

If our interpreter is written in continuation passing style, it will be syntactically tail recursive, but it won’t be “safe for space” unless the appropriate continuations are η-reduced. If we look at the continuation passing style version of rebol-eval-sequence we’ll see a problem:

(defun rebol-eval-sequence-cps (expr-list environment cont)
  (rebol-eval-one-cps expr-list environment
    (lambda (value expr-list*)
      (if (null expr-list*)
          (funcall cont value)
          (rebol-eval-sequence-cps expr-list* environment cont)))))

We cannot η-reduce the continuation. We cannot make this “safe for space”.

But the continuation contains a conditional, and one arm of the conditional simply invokes the containing continuation, so we can η-convert this if we unwrap the conditional. We’ll do this by passing two continuations to rebol-eval-one-cps as follows

(defun rebol-eval-sequence-cps (expr-list environment cont)
  (rebol-eval-one-cps expr-list environment
     ;; first continuation
     (lambda (value expr-list*)
       (rebol-eval-sequence-cps expr-list* environment cont))
     ;; second continuation, eta converted
     cont))

rebol-eval-one-cps will call the first continuation if there are any remaining expressions, and it will call the second continuation if it is evaluating the final expression.

This intepreter, with the dual continuations to rebol-eval-one-cps, is safe for space, and it will interpret tail recursive functions without consuming unbounded stack or heap.

But we still have a bit of an implementation problem. We’re allocating an extra continuation per function call. This doesn’t break tail recursion because we discard one of the continuations almost immediately. Our continuations are not allocated and deallocated in strict stack order anymore. We cannot easily convert ths back into a stack machine implementation.

To solve this problem, I rewrote the interpreter using Henry Baker’s Cheney on the M.T.A technique where the interpreter functions were a set of C functions that tail called each other and never returned. The stack would grow until it overflowed and then we’d garbage collect it and reset it. The return addresses pushed by the C function calls were ignored. Instead, continuation structs were stack allocated. These contained function pointers to the continuation. Essentially, we would pass two retun addresses on the stack, each in its own struct. Once the interpreter figured out which continuation to invoke, it would invoke the function pointer in the struct and pass a pointer to the struct as an argument. Thus the continuation struct would act as a closure.

This technique is pretty portable and not too bad to implement, but writing continuation passing style code in portable C is tedious. Even with macros to help, there is a lot of pointer juggling.

One seredipitous advatage of an implementation like this is that first-class continuations are essentially free. Now I’m not wedded to the idea of first-class continuations, but they make it much easier to implement error handling and advanced flow control, so if you get them for free, in they go.

With it’s Polish notation, tail recursion, and first-class continuations, REBOL was described as an unholy cross between TCL and Scheme. “The result of Outsterhout and Sussman meeting in a dark alley.”

Current versions of REBOL use a simplified interpreter that does not support tail recursion or first-class continuations.