Monday, December 30, 2024

Specialized Variables

In general, when you lookup a variable, you have to do a deep search through the lexical contours. This is because someone could have captured a first class environment and evaled a define in it that shadows a variable.

However, this is a very rare case. In the vast majority of cases, the variable is either a global variable or lexical variable bound in the immediately enclosing environment. We can optimize for these cases.

Symbols refer to three different kinds of variable: if the variable is free, it refers to a global or “top-level” variable, if it is bound, it may be bound in the immediately enclosing environment (i.e., it is an argument), or it may be bound in a environment further up the lexical chain. We can determine which of these cases applies statically when we construct the S-code and select the appropriate subtype of S-code variable.

A global or top-level variable can contain a pointer to the value cell of the variable in the global environment, so variable lookup simply involves a dereference. If the variable is lexically bound, and the environment is a StatcEnvironment (or subtype), then the lexical address of the variable is known statically and cannot change (the variable cannot be shadowed), so we can store the lexical address in the S-code variable. Lookup is faster because it involves constant offsets.

The variable corresponding to argument 0 of the current lambda is by far the most commonly accessed variable, followed by argument 1. We can create special S-code types for each of these two variables in j addition to the three types for global, lexical, and the other arguments. By classifying the variables into one of these five types at syntax time, we can greatly reduce the amount of work needed at runtime to access the variable.

So consider the form (lambda (n) (lambda (x) (+ x n))). n is a lexical variable bound one level deep. x is an argument and + is a global. The S-code representation of this form will have three different variable types. The S-code for n will be a LexicalVariable with a depth of 1 and offset of 0. The eval method for LexicalVariable can walk back one level in the lexical chain and then access the zeroth element of the environment.

On the other hand, the S-code for x will be an ArgumentVariable with an argument number of 0. The eval method for ArgumentVariable can call the GetArg0 method of the environment.

Finally, the S-code for + will be a GlobalVariable with a pointer to the global value cell for +. The eval method for this type of S-code variable can simply dereference the pointer.

Between specializing environments and S-code varibles, we can greatly reduce the amount of time needed to access variables.

No comments: