Tuesday, December 31, 2024

Usual Integrations, Function Calls and Primitive Calls

Near the beginning of most MIT-Scheme files you'll see (declare (usual-integrations)). This instructs the SF program, which translates Scheme to S-code, to replace the “usual” global variables with their (compile time) values. You lose the ability to change the values of these variables, but who redefines CAR or CDR? (And if you do, just remove the declare form or add CAR and CDR to the exception list.)

Now forms such as (null? x), which used to be syntaxed into this S-code:

[funcall
   [global 'null?]
   [argument 0]]

are now syntaxed into this S-code:

[funcall
   [quote [primitive null?]]
   [argument 0]]

Recall that S-code is a tree representation of Lisp code that is walked by the interpreter. The leaves of the tree are quote, global, argument, or lexical nodes, and each of these takes only one or two machine instructions to eval. The “overhead” of interpretation involves getting to the leaves.

The internal nodes of an S-code tree are the compound forms: function calls and special forms. IF and PROGN (BEGIN in Scheme) are the two most frequently evaluated special forms. When we construct the S-code for a compound form, we pattern match against the subforms and specialize the S-code. Specialized S-code for a compound form is called a “superoperator”. Superoperators eliminate the eval dispatch for the subforms. The machine code for interpreting a superoperator is close to the equivalent compiled code. Let us examine some of the most important superoperators.

Funcall Argument List Elimination

In general, a function call accumulates a list of arguments and jumps to APPLY to dispatch the function. We can eliminate the overhead of doing this. First, we can eliminate the call to APPLY by having applicable objects implement the applicable interface. We use virtual function dispatch to invoke the code that implements the applicable object. Applicable objects can be applied to any number of arguments passed as a list. We can eliminate the overhead of manipulating short argument lists by spreading the arguments and adding call0, call1, call2, and call3 methods to the applicable interface. The number of arguments at any particular call site is fixed, and it is usually a small number. We eliminate the overhead of accumulating the argument list for a function call by specializing the funcall S-code object on the number of arguments.

An S-code funcall is replaced by a funcall0, funcall1, funcall2, funcall3, or funcalln depending on the number of arguments. The argument accumulation loop is unrolled in the numbered cases, placing the arguments in local C# variables. The funcallx superoperators directly call the appropriate calln method in the function’s applicable interface.
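
Here is a minimal Go sketch of the idea (the actual interpreter is written in C#; Applicable, Funcall2, and the other names here are illustrative, not the interpreter's real identifiers):

type Environment struct{}

// Every S-code node evaluates itself against an environment.
type SCode interface {
	Eval(env *Environment) any
}

// Applicable objects expose one entry point per small, fixed arity,
// so no argument list is consed at the call site.
type Applicable interface {
	Call0() any
	Call1(arg0 any) any
	Call2(arg0, arg1 any) any
}

// Funcall2 is the superoperator for a call site with exactly two
// arguments: the accumulation loop is unrolled into two locals.
type Funcall2 struct {
	rator, rand0, rand1 SCode
}

func (f *Funcall2) Eval(env *Environment) any {
	proc := f.rator.Eval(env).(Applicable) // virtual dispatch instead of APPLY
	arg0 := f.rand0.Eval(env) // arguments land in locals,
	arg1 := f.rand1.Eval(env) // not in an accumulated list
	return proc.Call2(arg0, arg1)
}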

Primitive Function Call

Function calls to primitive procedures are the foundation of Lisp and Scheme programs. We can look at a Lisp program as if it were a “black box” that makes a series of primitive function calls. If unoptimized, a Lisp program ought to make the same series of primitive function calls with the same values whether it is interpreted or compiled. The primitive function will run at the same speed no matter how it is called, so if compiled code is faster than interpreted, it is issuing the primitive function calls faster, taking less time between them. The time between issuing primitive function calls is the interpretation overhead, and we want to reduce that.

When we construct an S-code funcall (and funcall0, funcall1, etc.), we check if the thing being called is a literal quoted primitive function, and if it is, we replace the S-code with a primitive-funcall (or primitive-funcall0, primitive-funcall1, etc.). A primitive-funcall evaluates its arguments, but it can skip evaluating the function. It can skip the apply dispatch, too, and just jump right to the primitive entry point.

We construct (null? arg) as

[primitive-funcall1
  [quote [primitive null?]]
  [argument 0]]
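
Continuing the Go sketch from above (same caveats: the real interpreter is C# and the names are illustrative), the superoperator can hold the primitive's entry point directly, reusing the SCode and Environment stand-ins:

// A primitive's entry point is just a function known at syntax time,
// so there is nothing to evaluate and no apply dispatch.
type Primitive1 func(arg0 any) any

type PrimitiveFuncall1 struct {
	prim  Primitive1 // e.g. the entry point for null?
	rand0 SCode      // the single argument form
}

func (p *PrimitiveFuncall1) Eval(env *Environment) any {
	// Evaluate the argument and jump straight to the primitive.
	return p.prim(p.rand0.Eval(env))
}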

The MIT-Scheme interpreter and the MIT-Scheme hardware both special-case function calls with small numbers of arguments and calls to primitive procedures.

In the next post, we develop this idea beyond what MIT-Scheme already does.

Eliminating Deep Search

In general, when you look up a lexical variable, you have to do a deep search. You start with the innermost scope and work your way out until you find the variable you are looking for. This takes an absurd amount of time for each variable lookup, so we want to avoid actually doing it.

As mentioned in the previous post, we special case the two most common cases: when the variable is in the innermost scope and when the variable is in the global scope. In the other cases, when the variable is in an intermediate scope, we can flatten the scopes by copying the variable values from inner scopes to outer scopes whenever the variable is closed over. This only works because we have put the mutable variables in cells and we copy pointers to the cells.

When an S-code lambda object is created, we walk the code and find the variables that are closed over. We create a closure with a copy of the variable’s value (or a copy of its cell). In essence, we replace:

(lambda (x) (+ x y))

(assuming y is a lexical variable) with

(%make-closure '(lambda (x) (+ x y)) y)

Once we have done this, we no longer need to deep search for lexical variables. There is only one lexical frame, it contains a copy of the variable, and we know the offset of the variable in the frame. Lexical variable lookup is now just a vector-ref.

The drawback is that we have cell-converted the mutable variables, so calling a function that contains mutable variables will allocate memory. Use of SET! makes your code expensive. Furthermore, %make-closure is linear in the number of things being closed over. But typically, closures only close over one or two items.
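
Here is a rough Go sketch of how %make-closure might evaluate, again reusing the stand-in types from the earlier sketches (illustrative only, not the actual implementation):

// %make-closure copies the closed-over values out of the current
// environment into a flat frame. Mutable variables would contribute
// pointers to their cells instead of raw values.
type FlatClosure struct {
	body  SCode
	frame []any // one slot per closed-over variable
}

type MakeClosure struct {
	body     SCode
	captured []SCode // where each closed-over value currently lives
}

func (m *MakeClosure) Eval(env *Environment) any {
	frame := make([]any, len(m.captured)) // linear in the count,
	for i, c := range m.captured {        // but the count is small
		frame[i] = c.Eval(env)
	}
	// A later [lexical n] reference is just frame[n].
	return &FlatClosure{body: m.body, frame: frame}
}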

So between these specializations (this post and the two previous ones), we have removed the overhead of variable lookup. An expression like (lambda (x) (+ y x)) (where y is a lexical variable) is converted into S-code that looks like:

[%make-closure
  [quote (lambda '(x)
            [funcall
              [global '+]
              [lexical 0]
              [argument 0]])]
  [argument n]  ;; wherever 'y is in the outer scope
  ]

Evaluating variables is the bulk of the work in a Lisp program. Now both the interpreter and compiler can evaluate a variable with one or two machine instructions. Interpreted code is still slower than compiled code, but this isn’t a bottleneck anymore.

Now we look for other bottlenecks. Watch this space.

Monday, December 30, 2024

Specialized Variables

In general, when you look up a variable, you have to do a deep search through the lexical contours. This is because someone could have captured a first-class environment and evaled a define in it that shadows a variable.

However, this is a very rare case. In the vast majority of cases, the variable is either a global variable or lexical variable bound in the immediately enclosing environment. We can optimize for these cases.

Symbols refer to three different kinds of variable: if the variable is free, it refers to a global or “top-level” variable; if it is bound, it may be bound in the immediately enclosing environment (i.e., it is an argument), or it may be bound in an environment further up the lexical chain. We can determine which of these cases applies statically when we construct the S-code and select the appropriate subtype of S-code variable.

A global or top-level variable can contain a pointer to the value cell of the variable in the global environment, so variable lookup simply involves a dereference. If the variable is lexically bound, and the environment is a StaticEnvironment (or subtype), then the lexical address of the variable is known statically and cannot change (the variable cannot be shadowed), so we can store the lexical address in the S-code variable. Lookup is faster because it involves constant offsets.

The variable corresponding to argument 0 of the current lambda is by far the most commonly accessed variable, followed by argument 1. We can create special S-code types for each of these two variables in addition to the three types for global, lexical, and the other arguments. By classifying the variables into one of these five types at syntax time, we can greatly reduce the amount of work needed at runtime to access the variable.

So consider the form (lambda (n) (lambda (x) (+ x n))). n is a lexical variable bound one level deep. x is an argument and + is a global. The S-code representation of this form will have three different variable types. The S-code for n will be a LexicalVariable with a depth of 1 and offset of 0. The eval method for LexicalVariable can walk back one level in the lexical chain and then access the zeroth element of the environment.

On the other hand, the S-code for x will be an ArgumentVariable with an argument number of 0. The eval method for ArgumentVariable can call the GetArg0 method of the environment.

Finally, the S-code for + will be a GlobalVariable with a pointer to the global value cell for +. The eval method for this type of S-code variable can simply dereference the pointer.
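
Here is a minimal Go sketch of these variable subtypes (the real interpreter is C#; the environment layout and all names here are my assumptions):

// A stand-in environment: a frame of arguments plus a parent link.
type Environment struct {
	parent *Environment
	args   []any
}

func (e *Environment) GetArg0() any { return e.args[0] }

type ValueCell struct{ Value any }

// GlobalVariable: the pointer to the global value cell is captured
// at syntax time, so eval is a single dereference.
type GlobalVariable struct{ cell *ValueCell }

func (v *GlobalVariable) Eval(env *Environment) any { return v.cell.Value }

// Argument0Variable: the most commonly accessed variable of all.
type Argument0Variable struct{}

func (v *Argument0Variable) Eval(env *Environment) any { return env.GetArg0() }

// LexicalVariable: depth and offset are fixed statically, so eval
// walks back a known number of frames and indexes.
type LexicalVariable struct{ depth, offset int }

func (v *LexicalVariable) Eval(env *Environment) any {
	e := env
	for i := 0; i < v.depth; i++ {
		e = e.parent
	}
	return e.args[v.offset]
}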

Between specializing environments and S-code variables, we can greatly reduce the amount of time needed to access variables.

Specialized Environments

A Lisp program conceptually does a lookup every time it evaluates a symbol. Since evaluating symbols is the single most common operation, you want it to be as fast as possible. In compiled code, symbol values might be stored in registers, on the stack, or in a vector, but in interpreted code they are usually stored in a structure called an environment. In compiled code, the symbol value can often be retrieved by a move instruction, but in interpreted code there may be a function call and a vector-ref needed to get the value. This is a significant performance hit.

We can customize the layout of an environment to speed up symbol lookup. In the general case, an environment is a fairly heavyweight data structure. Because you can eval a define form at any time, you need an environment that supports incremental definition and you need to deep search each time you do a lookup because someone might have defined a shadowing variable.

But 99% of the time, you don’t need the general case. If no one captures a first-class environment, then you don’t need to support incremental definitions because no one can call eval to define a new variable. If no one mutates a variable, then you can store a copy of the variable’s value directly in the environment instead of indirectly through a ValueCell object. Most environments only close over a couple of variables, so you can make the variables fields of the environment object instead of storing them in a vector of values. You statically know the number of variables, so you can store them in a fixed-size class instance.

An Environment is an abstract base class.

A GlobalEnvironment is a singleton class that keeps its bindings in a hash table.

A StandardEnvironment is a concrete subclass of Environment that supports assignment and incremental definitions. The bindings are kept in a vector of ValueCell objects and the incremental bindings are kept in a hash table. This is a general purpose environment.

class StandardEnvironment : public Environment {
  Vector<ValueCell*> bindings;
  HashTable<ValueCell*> incrementalBindings;
};

A variable lookup in a StandardEnvironment involves fetching the vector of value bindings from the environment, looking up the variable in the vector of bindings, and if it is not found, looking it up in the hash table of incremental bindings, and finally fetching the value from the cell. Several levels of indirection are involved.
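
Rendered as a Go sketch (the real implementation is C#; the offsets table and field names are my assumptions), the slow path looks something like this:

type ValueCell struct{ Value any }

type StandardEnvironment struct {
	parent      *StandardEnvironment
	offsets     map[string]int        // symbol -> index into bindings
	bindings    []*ValueCell          // vector of value cells
	incremental map[string]*ValueCell // definitions eval'd in later
}

func (e *StandardEnvironment) Lookup(name string) any {
	if idx, ok := e.offsets[name]; ok {
		return e.bindings[idx].Value // vector fetch, then cell fetch
	}
	if cell, ok := e.incremental[name]; ok {
		return cell.Value // hash probe, then cell fetch
	}
	return e.parent.Lookup(name) // deep search continues outward
}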

A StaticEnvironment supports assignment, but not incremental definitions. Bindings are kept in a vector of ValueCell objects.

class StaticEnvironment : public Environment {
  Vector<ValueCell*> bindings;

  Object* GetArg0() { return bindings[0]->Value; }

};

Looking up a value in a StaticEnvironment is slightly quicker because there is no table of incremental bindings to search.

An ImmutableEnvironment holds bindings in a vector and does not support assignment or incremental definitions. Binding values are kept directly instead of indirectly through ValueCell objects.

class ImmutableEnvironment : public Environment {
  Vector<Object*> bindings;

  Object* GetArg0() { return bindings[0]; }

};

Looking up a variable in an ImmutableEnvironment is quicker because there are no ValueCell objects to dereference.

A SimpleEnvironment is an abstract subclass of ImmutableEnvironment that does not support assignment or incremental definition. Bindings are kept in class fields.

A SimpleEnvironment0 is a concrete subclass of SimpleEnvironment that holds no bindings — these are created when you invoke a thunk. A SimpleEnvironment1 is a concrete subclass of SimpleEnvironment that holds one binding, and so on up to SimpleEnvironment3.

class SimpleEnvironment2 : public SimpleEnvironment {
  Object* var0;
  Object* var1;

  Object* GetArg0() { return var0; }

};

Looking up a variable in a SimpleEnvironment is quicker because you don’t have to indirect through a vector of bindings. A method as simple as GetArg0(), which simply returns a field in the instance, can often be inlined.

Environments are created by applying closures. There are subtypes of closures that correspond to the different kinds of environments. For example, a SimpleClosure2 is a closure that creates a SimpleEnvironment2.

Closures are created by evaluating lambda expressions. There are subtypes of lambda expressions that correspond to the different kinds of closures. For example, a SimpleLambda3, when evaluated, will create a SimpleClosure3.
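
A rough Go sketch of the fast path (illustrative names again): applying a SimpleClosure2 just fills in a fixed-size, two-field environment and evaluates the body in it:

// The body statically closes over exactly two immutable values, so
// application allocates a two-field environment, never a vector.
type simpleEnv2 struct{ var0, var1 any }

type SimpleClosure2 struct {
	body func(env *simpleEnv2) any // stand-in for the S-code body
}

func (c *SimpleClosure2) Call2(arg0, arg1 any) any {
	return c.body(&simpleEnv2{var0: arg0, var1: arg1})
}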

When S-code lambda objects are created, they are analyzed to see if a first-class environment is captured, and if not, whether any of the variables are mutated. The appropriate subclass of lambda object is created based on this analysis.

Almost all environments end up being SimpleEnvironments and are quite a bit faster than the general case.

Sunday, December 29, 2024

Interpreting S-code

Lisp programs are not text but rather nested lists of symbols (atoms) and other lists. By default, a symbol represents a variable and a list represents a function call, but some lists are special forms that don’t follow the normal function call semantics.

S-code is an alternative representation for Lisp programs. Instead of nested lists, S-code is a set of nested data structures. Each S-code type corresponds to a Lisp form. There is an almost trivial isomorphism between S-code and the list representation of Lisp. S-code is a bit easier to interpret and compile than the list representation because the data structures representing the code can contain auxiliary information to aid in interpretation of the code. For example, a variable reference in S-code can contain the lexical depth and offset of the variable.

Converting from nested lists to S-code is a process called “syntaxing”. The nested lists are walked, the macros are expanded, auxiliary information is collected, and the S-code is generated. The S-code can be directly interpreted or compiled to machine code. “Unsyntaxing” is the process of converting S-code back to nested lists and it basically involves a walk of the data structure, discarding the auxiliary information, and collecting a tree of lists, thus recovering the original Lisp program (modulo macro expansion).

The MIT-Scheme hardware (Scheme 79, Scheme 81, and Scheme 86) directly executed S-code. MIT-CScheme is a C and assembly program that interprets S-code and runs compiled S-code. MzScheme and Racket also use a similar nested data structure representation of Scheme programs. I don’t think they call it “S-code”, but it is essentially the same thing. No doubt other Lisp and Scheme interpreters use this technique.

(The Lisp machine, by contrast, compiled Lisp programs to a byte-code that was interpreted by the microcoded Lisp machine hardware. I believe that MzScheme and Racket now include a JIT compiler.)

A few years back, for kicks, I wrote an S-code interpreter for Scheme. I decided to write it in C# so that I could leverage the .NET Common Language Runtime. I wouldn’t have to write a garbage collector, for example. I wrote it to interpret the S-code generated by MIT-Scheme. This way I could leverage the MIT-Scheme runtime as well. But this interpreter is not simply a port of the C implementation of MIT-Scheme. Many parts of the interpreter work completely differently from the MIT C implementation. Among them:

  • The C# call stack is used to hold the continuations. Subproblem continuations are C# continuations. Nested Scheme function calls appear as nested C# function calls, so the C# debugger can be used to debug Scheme code.
  • Tail recursion is handled by a trampoline at each function call boundary. To tail call a function, you return the function, the arguments, and a special marker to the caller’s trampoline, which will make the call on your behalf. (A sketch of the trampoline appears after this list.)
  • First class continuations are handled by lightweight stack inspection. In addition to a special marker for tail recursion, there is also a special marker for continuation capture. If this is returned, the caller’s trampoline will evacuate its current stack frame from the stack onto the heap and propagate the special marker up the stack.
  • Eval dispatch is handled through virtual method dispatch. Each type of S-code object has an eval method that is specialized for the type. I was curious if the virtual method dispatch was fast enough to be in the main interpreter loop.
  • The S-code interpreter is instrumented to discover the “hot” paths through the code. Certain patterns of S-code that are determined to be dynamically frequent can be replaced by S-code “superoperators” that provide specialized higher performance evaluation paths that avoid recursive evaluation steps.
  • Variable assignments are eliminated through cell conversion. All variables are ultimately immutable. Variables that were assigned to are converted to mutable cells that are allocated on the heap. This means we can use a flattened environment structure, but calls to procedures with mutable variables will cons.
  • The nested environment structure was replaced with flattened environment vectors. Shared immutable lexical variables are copied, and since mutable variables are stored in cells, their references are copied. Lexical variable lookup is constant independent of the lexical depth, but closure construction becomes linear in the number of variables being closed over.
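
Here is a minimal Go sketch of such a trampoline (the interpreter is actually C#; the marker representation here is my assumption):

// A callee requests a tail call by returning this marker instead of
// calling the target itself.
type TailCall struct {
	proc func(args ...any) any
	args []any
}

// Trampoline makes calls on the callee's behalf until something other
// than a TailCall marker comes back, so the stack never grows.
func Trampoline(proc func(args ...any) any, args ...any) any {
	result := proc(args...)
	for {
		tc, isTailCall := result.(TailCall)
		if !isTailCall {
			return result
		}
		result = tc.proc(tc.args...) // constant stack depth
	}
}

// countdown "tail calls" itself a million times in constant stack
// space: Trampoline(countdown, 1000000) returns "done".
func countdown(args ...any) any {
	n := args[0].(int)
	if n == 0 {
		return "done"
	}
	return TailCall{proc: countdown, args: []any{n - 1}}
}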

Limitations

C# does not provide a way to dump a heap snapshot, so it isn’t possible to save a Lisp image. Instead, the interpreter is “cold loaded” each time it is started. Indeed, the main problem with the interpreter is the startup time. It spends an inordinate amount of time initializing the Unicode tables.

The major version of MIT-Scheme has been incremented. The C# interpreter fails to reach the prompt when cold loading the latest version. I haven’t tracked down the problem.

The C# interpreter assumes that the S-code abstraction is completely opaque, i.e. that the Scheme code knows nothing about the binary layout of S-code objects and only manipulates them through the provided primitives. This isn’t always the case, however, so the C# interpreter sometimes has to recognize that certain idiomatic patterns of S-code are actually attempting to emulate a primitive operation. For example, system-pair-car is a primitive that extracts the first element of any system object that has two elements, i.e. objects that MIT-Scheme implements as a cons cell even though not tagged as one. But the point of using C# was to hide the representation of Scheme objects from the Scheme code, so many objects are no longer implemented as cons cells. The interpreter notices calls to system-pair-car and instead extracts whatever field of the object MIT-Scheme would have put in the car.

The interpreter is not complete enough to host the SF program, which creates Scheme .fasl files from Scheme source. You need MIT-Scheme to create the .fasl files, which can then be loaded into the C# interpreter.

Now what?

I got increasingly frustrated with the limitations. It was taking too long to do a full debugging cycle. Playing whack-a-mole with the bugs was getting tedious.

I wanted to understand fundamentally why interpreters are so slow. In theory, shouldn’t the interpreter be able to perform the same operations as the compiled code? What exactly is “interpreter overhead”? Shouldn’t it be able to run in the same ballpark as the compiled code? What would it take to make a high-performance interpreter?

I discovered some of the answers. A large part of the overhead comes from instruction dispatch. Compiled code has its instructions dispatched in hardware and often has dedicated hardware to read and process the instruction stream. Interpreters use software to do the dispatch. You can make the interpreter faster by combining instructions and dispatching instruction combinations. There is a combinatorial explosion of instruction combinations, however, so you only want to combine the most common instruction sequences.

I wrote a number of hand-coded instruction combinations, but it got tedious and I realized that I wanted an automatic way to generate instruction combinations for common instruction sequences. That is, I wanted a JIT compiler. I put the project on the shelf at this point because a JIT compiled S-code interpreter is another project in itself.

I decided to blog a bit about some of the more unusual features of this interpreter, so watch this space if you are interested.

Friday, December 27, 2024

Composing Binary Functions

Functions that take one argument and return one value are cool because you can make pipelines out of them. You just send the output of one function into the input of the next. Of course the output of the first function must be of a type that the second function can accept. Or you can just stick with functions that take and return the same type, and then you can compose these as desired.

But what if you want to compose binary functions that take two arguments and return two values? There are a couple of ways to take multiple arguments: you can pass them in a list (or vector), or you can pass the arguments one at a time by currying the function.

(defun curry (B)
  (lambda (l)
    (lambda (r)
      (funcall B l r))))

There are two lambdas here. The outer lambda accepts the left argument and returns the inner lambda. The inner lambda accepts the right argument and calls the original binary function with both arguments.

To call a curried function, you first call it with the left argument. This will return a lambda that you call with the right argument.

(funcall (funcall curried-binary-function left) right)

The two-argument identity function would be curried like this:

(defun curried-values (left)
  (lambda (right)
    (values left right)))

Composing two curried functions is a little complicated. Let us suppose that F and G are curried binary functions. Further assume that G has already been applied to its left argument, i.e., it is an inner lambda of a curried binary function. We want to compute the inner lambda of F composed with G.

(defun curried-compose (f g)
  (lambda (right)  ; return an inner lambda
    (multiple-value-bind (vall valr) (funcall g right)
      (funcall (funcall f vall) valr))))

Here is how it works. We return an inner lambda that accepts the right argument of the composition. We call the inner lambda G on this argument and collect the two values it returns. We then pass the two values one at a time to F.

If we did this right, we should end up with these identities:

(curried-compose #'F (curried-values left))
    ==> (F left)

and

(curried-compose #'curried-values G-inner)
    ==> G-inner

furthermore, curried-compose should be associative:

(curried-compose (curried-compose F G) H)
    ==> (curried-compose F (curried-compose G H))

Let’s try it out.

(defun curried-cons (left)
  (lambda (right)
    (cons left right)))

;; Check first identity.
(let ((x (curried-compose #'curried-cons (curried-values 'car))))
  (assert (equal (funcall x 'cdr) '(car . cdr))))

;; Check second identity.
(let ((x (curried-compose #'curried-values (curried-cons 'car))))
  (assert (equal (funcall x 'cdr) '(car . cdr))))

;; Check associativity.
(defun curried-sum-difference (left)
  (lambda (right)
    (values (+ left right) (- left right))))

(let ((x (curried-compose
           #'curried-cons
           (curried-compose
             #'curried-sum-difference
             (curried-values 7))))
      (y (curried-compose
           (lambda (v)
             (curried-compose #'curried-cons
               (curried-sum-difference v)))
           (curried-values 7))))
  (assert (equal (funcall x 3) (funcall y 3))))

If the binary function is closed over its right argument, then when you curry it, the inner lambda will be closed over its argument. We should be able to compose various inner lambdas to make a pipeline. The pipeline will daisy chain the right argument from one curried function in the pipeline to the next.

Pipelines like this have two properties we want to exploit: first, they force an order of execution on the functions in the pipeline. The earlier stages in the pipeline have to produce an answer before the later stages consume it, even if all the functions in the pipeline are lazy and purely functional. Second, we can abstract out the argument we are daisy chaining through the stages.

Let’s use this in a real problem. Our task is to build a binary tree where each node has a serial number. The serial number will be our right hand argument which is daisy chained through the computation.

(defun next-sn (sn)
   (values sn (1+ sn)))

(defun curried-make-node (depth children)
  (curried-compose
    (lambda (sn)
      (curried-values (cons (cons depth sn) children)))
    #'next-sn))

(defun curried-make-tree (depth)
  (if (zerop depth)
      (curried-make-node depth nil)
      (curried-compose
        (lambda (left-branch)
          (curried-compose
            (lambda (right-branch)
              (curried-make-node depth (list left-branch right-branch)))
            (curried-make-tree (1- depth))))
        (curried-make-tree (1- depth)))))

curried-make-tree returns a curried function. It hasn’t built a tree; it has built a function that builds the tree once it gets the serial number passed in.

Notice these two things: we have no global state or assignment, but we get a serial number that is incremented for each node. The serial number is passed around as the curried right hand argument and returned as the right hand value. Notice, too, that curried-make-tree has no mention of the serial number. It is a hidden variable.

curried-compose is reminiscent of a let expression, so we can create some analogous syntactic sugar:

(defmacro LetM (((var val) &rest more) &body body)
  `(curried-compose
     (lambda (,var)
       ,@(if more
            `((LetM ,more ,@body))
            body))
     ,val))

(defun curried-make-tree (depth)
  (if (zerop depth)
      (curried-make-node depth nil)
      (LetM ((left-branch (curried-make-tree (1- depth)))
             (right-branch (curried-make-tree (1- depth))))
        (curried-make-node depth (list left-branch right-branch)))))

There is a special name for the inner lambda that is expecting the right hand argument. It is a “monad”. curried-values is the monad “unit”, and curried-compose is the monad “bind”.

As I mentioned in a previous post, it makes more sense to use this approach in a lazy functional language. The chain of curried functions force sequential evaluation because the output of each curried function must be computed before it becomes the input of the next curried function. The right hand argument that is threaded through the curried calls can be used as a “store” that is (functionally) updated by each curried function before passing it along, giving you the illusion of mutable storage.

A monad is how you embed an imperative language within a pure, lazy, functional language. But in a mixed language like Lisp, you can force sequential evaluation by progn and the store is actually mutable, so there is less of a need for monads. Nonetheless, they provide a way to “hide” a variable that is threaded through the curried functions and that can come in handy. For example, if you are doing I/O, you can hide the stream variable so you don’t have to pass it along to every intermediate function.

Thursday, December 26, 2024

Dynamic Variables in Go

Dynamic binding is an evaluation rule for free variables. The rule is that the value of a free variable in a lambda expression is the value bound in the nearest dynamically enclosing scope. This is in contrast to static binding, where the value is the value bound in the nearest lexically enclosing scope. A naive interpreter that saves variable bindings in a stack frame will usually end up implementing dynamic binding.

In general, static binding is preferred because it is easier to reason about. However, dynamic binding is sometimes very useful in handling context. Lisp provides both static and dynamic binding, with the default being static binding.

Golang provides lexical binding, but not dynamic binding. You can mimic the effect of dynamic binding by saving the variable's value, assigning it a new value, then restoring the old value in a defer statement. This trick won't work in a multi-threaded environment, though: each thread has to manage its own value of the variable, separate from the values in other threads.

But what if the implementation doesn't tightly bind continuations to threads? What if the implementation uses a thread pool, and the same thread is used for different goroutines at different times? In that case, the trick of saving and restoring the value of a variable won't work. The value of the variable will be shared among different goroutines, and the value will be unpredictable. To make things work, your dynamic variables should be continuation-local.

But golang doesn't provide continuation-local variables, and it seems to go out of its way to keep code from inspecting the current continuation. There is a trick that you can use.

We need a way to allocate synthetic continuation IDs and bind them to the chain of pending calls. We also need a way that a subroutine can inspect the current chain of pending calls and determine the synthetic continuation ID.

Allocating a synthetic continuation ID is easy enough by maintaining a bitmap of free continuation IDs. When we need a continuation ID, we grab a mutex and search the bitmap for an unused continuation ID. We set the bit to allocate the ID and release the mutex. To free the continuation ID, we reset the bit. Any time you want to use dynamic variables, you need to allocate a synthetic continuation ID early on. This will be used as an index into a value table for each dynamic variable.

var (
	continuationIdInUse [MaxContinuations]bool = [MaxContinuations]bool{}
	continuationIdMutex sync.Mutex       = sync.Mutex{}
)

func AllocateContinuationId[T any](receiver func(tid uint8) T) T {
	continuationIdMutex.Lock()

	// Skip ID 0: ReadMark returns 0 on an unmarked stack.
	for tid, inUse := range continuationIdInUse {
		if (tid > 0) && !inUse {
			continuationIdInUse[tid] = true
			continuationIdMutex.Unlock()
			return DeallocatingContinuationId(uint8(tid), receiver)
		}
	}

	continuationIdMutex.Unlock() // don't hold the mutex while panicking
	panic("No available continuation ids")
}

func DeallocateContinuationId(tid uint8) {
	continuationIdInUse[tid] = false
}

func DeallocatingContinuationId[T any](tid uint8, receiver func(tid uint8) T) T {
	defer DeallocateContinuationId(tid)
	return receiver(tid)
}

Golang doesn’t offer much in the way of stack introspection or manipulation, but you can easily get a backtrace of the names of the pending function calls. So we’ll encode the synthetic continuation ID as a nested set of function calls. When we need to read the current continuation ID, we get a backtrace of the stack and look for the special nested set of function calls and decode them.

// Mark0 and Mark1 have identical bodies; only their names matter,
// because ReadMark recovers the bits by finding those names in the
// backtrace. Each nested call encodes one bit of the value.
func Mark0[T any](count int, value uint8, thunk func() T) T {
	if count == 0 {
		return thunk()
	} else if value%2 == 0 {
		return Mark0(count-1, value/2, thunk)
	} else {
		return Mark1(count-1, (value-1)/2, thunk)
	}
}

func Mark1[T any](count int, value uint8, thunk func() T) T {
	if count == 0 {
		return thunk()
	} else if value%2 == 0 {
		return Mark0(count-1, value/2, thunk)
	} else {
		return Mark1(count-1, (value-1)/2, thunk)
	}
}

func MarkByte[T any](value uint8, thunk func() T) T {
	if value%2 == 0 {
		return Mark0(7, value/2, thunk)
	} else {
		return Mark1(7, (value-1)/2, thunk)
	}
}

When we call MarkByte on a byte, we’ll make a series of eight nested function calls to Mark0 or Mark1 depending on the bits in the byte. We can use the golang backtrace mechanism to find the nested function calls and determine what the byte was:

func ReadMark() int {
	pc := make([]uintptr, 256)
	n := runtime.Callers(0, pc)

	frames := runtime.CallersFrames(pc[:n])
	value := 0
	for {
		frame, more := frames.Next()
		if !more {
			break
		}

		if strings.Contains(frame.Function, "Mark0") {
			value *= 2
		} else if strings.Contains(frame.Function, "Mark1") {
			value = value*2 + 1
		}
	}
	return value
}
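
WithThread itself is not shown here. A minimal sketch of what it presumably does, assuming it simply allocates a continuation ID and encodes it on the stack before running the body:

func WithThread[T any](body func() T) T {
	return AllocateContinuationId(func(tid uint8) T {
		// Eight nested Mark0/Mark1 calls encode tid where
		// ReadMark can recover it.
		return MarkByte(tid, body)
	})
}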

Now when we spin off a goroutine that will use dynamic variables, we must be careful to call WithThread some time early on in the goroutine:

func MakeMyGoroutineEntry() func() {
	return func() {
		_ = WithThread(func() int {
			// There are 8 pending calls to Mark[0,1],
			// and we can use ReadMark() to get the
			// represented value.

			// ... body of callback ...
		})
	}
}

  // ... some code somewhere ...
  myGoroutine := MakeMyGoroutineEntry()
  go myGoroutine()

Now we can use the continuation ID to index into a table of values:

type GlVal[T any] struct {
	values [MaxContinuations]T // one slot per synthetic continuation ID
}

func NewGlVal[T any]() *GlVal[T] {
	return &GlVal[T]{}
}

We read the goroutine-local variable by indexing into the table:

func GlValGet[T any](g *GlVal[T]) T {
	return g.values[ReadMark()]
}

We write the goroutine-local variable by writing to the table. We return the old value so that we can restore it later. Note that no mutex is needed because the value is continuation-local:

func GlValSet[T any](g *GlVal[T], value T) T {
	mark := ReadMark()
	oldValue := g.values[mark]
	g.values[mark] = value
	return oldValue
}

When we want to dynamically bind the variable, we save the old value to a frame-local variable, set the new value, and defer a function to restore the old value:

func GlValBind[T, U any](g *GlVal[T], value T, thunk func() U) U {
	oldValue := GlValSet(g, value)
	defer GlValSet(g, oldValue)
	return thunk()
}

This technique seems reasonably robust despite using the debugging support for its implementation. It will fail if they ever make golang tail-recursive, but I doubt that will happen any time soon.

I Don’t Use Monads

I consider myself a “mostly functional” programmer. I think about programs in terms of data flow and transformations and mappings from input to output. I avoid side effects and mutable state where it isn’t necessary. But I’m not a purist. I’m happy to setf an instance slot when it makes sense.

Monads are the darling of the functional programming community. To hear it, they are the key to understanding the universe. They provide a way to encapsulate side effects and mutable state in a pure functional way.

But I don’t often use them.

There are a few reasons for this. I don’t work in Haskell, where monads are a ubiquitous part of the language. If you use Haskell, you’re going to use monads. But what about the languages that I do use? Monads of course “work” in any language that supports higher-order functions, but they are not always idiomatic or the obvious way to accomplish a task.

The primary use of monads is to mimic the look and feel of an imperative language, with its sequential execution, side effects and mutable state, in a language that is purely functional. Monads are a way to implement progn and setf in Haskell. But Lisp already has progn and setf.

And an imperative programming style is largely inferior to a functional programming style. I prefer not to think imperatively and write imperative code. I've seen many Haskell programs that used monads just so they could use do notation and write code that looked like C. That seems like a step backwards.

Monads are complicated. They essentially come down to functional composition of curried functions hidden behind some syntactic sugar. So long as you don’t have to look under the hood, they are reasonably easy to use. But when you do have to look under the hood, you’ll find a maze of twisty, turny, higher-order procedures. In Lisp, a well placed setf can cut through this functional Gordian knot.

Monday, December 16, 2024

Does CONS take its arguments in the wrong order?

Object-oriented languages often come with a “collections” library that provides a variety of data structures for storing collections of objects. Methods on collections usually pass the collection as the first argument for two reasons: (1) the collection is the primary object being operated on, and (2) in many languages, method dispatch only works on the first argument.

In Lisp, however, lists are a collection type that is derived from a more primitive data type, the cons cell. Lists are not fully abstracted from cons cells and many list operations are actually defined as operations on cons cells. Rather than the list argument always coming first, there is little rhyme or reason to the order of arguments in list operations. This becomes evident when you try to use higher-order operations on lists.

cons constructs a cons cell, with two fields, a car and a cdr. The car is traditionally written to the left of the cdr, and cons takes the car as the first argument.

In Lisp, there is no special constructor for lists. The user directly constructs the underlying cons cell and simply “declares in his head” that the result is a list. Hence when we are extending a collection with an item, the collection is the second argument to cons rather than the first. In a “modern” curly-brace, object-oriented language, we would extend mylist by writing mylist.cons(item).

Here’s where it makes a difference:

The fold-left function iterates down a list and calls a combining function on each element of the list. The first argument to the combining function is the accumulated answer, the second is the element being accumulated. If we want to accumulate into a list, the accumulated list is passed first and the new item is passed second. This means that cons is not suitable as the combining function in fold-left because the arguments come in the wrong order:

(defun bad-reverse (list)
  (fold-left #'cons '() list)) ;; does not reverse a list

(defun xcons (cdr car)  ;; takes original list first
  (cons car cdr))

(defun reverse (list)
  (fold-left #'xcons '() list)) ;; correctly reverses a list

Many list operations, remove and adjoin for example, take their arguments in an inconvenient order for fold-left.

Thursday, December 12, 2024

How to Blow an Interview

I recently had to interview a candidate. He had passed earlier interviews, had done well, and was in the final running. My only task was to determine his coding ability.

I checked his resume and saw that he listed Python among his computer language skills, so I decided to pitch him a Python program. I wrote this simple skeleton of a Tic-Tac-Toe program:

class TicTacToeBoard:
    def __init__(self):
        pass

    def __str__(self):
        pass

    def make_move(self, row, col, player):
        pass

    def is_winner(self, player) -> bool:
        return False

    def is_full(self) -> bool:
        return False

if __name__ == '__main__':
    board = TicTacToeBoard()
    print(board)
    board.make_move(0, 0, 'X')
    board.make_move(0, 1, 'O')
    board.make_move(1, 1, 'X')
    board.make_move(1, 0, 'O')
    board.make_move(2, 2, 'X')
    print(board)
    print(board.is_winner('X'))
    print(board.is_winner('O'))
    print(board.is_full())

I invited him to “go to town” on the code. I didn’t care what he implemented or how he implemented it, I just wanted to watch him code.

Alas, he didn’t. He studied the code for a few minutes, moved the cursor around, and then admitted that it had been a while since he had written any significant Python code and he was a bit rusty. He was more used to Java.

That’s fine. I asked him to write the code in Java. I apologized for not having a Java skeleton, but I was sure he could write something.

He didn’t get very far. He wrote a class declaration, but was stumped on any method definitions. He admitted that he was pretty rusty in Java, too, and wanted to switch back to Python.

I didn’t care, I just wanted to see some code. He asked if the job was going to require a lot of coding. I said that it would definitely require some coding as part of the job was analyzing legacy code and identifying and fixing security bugs.

After half an hour, he still hadn’t written even a single line of working code in any language. I felt bad for him and asked him about the other experiences on his resume. They looked good, but a little digging showed that his experience was a bit shallow. I thanked him for his time and told him that the recruiter would be in touch.

I gave him a solid “no”. He just would not have been a good fit. The job required coding skills that he simply did not have.

I don’t understand it. I would have thought that the ability to write some code was a basic requirement for any job involving computers. But I guess not.


Nutanix is hiring. It’s a pretty good gig, so if you see something that interests you, apply. But be prepared to write some code.