Why can’t Pharo have constructors?

The other day we were talking with Pablo about pre-tenuring objects while discussing a couple of papers like this one and this other one. These papers describe the idea of pre-tenuring objects for Java. The main idea is to identify allocation sites in the code (e.g., new MyClass(17, true, someObject)), gather statistics about the average size and lifetime of the objects created at those allocation sites, and then decide to directly tenure objects (i.e., move them to a region of memory that is garbage collected less often) at those allocation sites when convenient. Now, garbage collection is actually not the point of this post, but this story serves as a kick-off for it.

So, one of the points of the papers above was to identify allocation sites. This popped a question in my mind:

Hey! How can we identify allocation sites in Pharo?

In Pharo, allocations happen on the execution of the method basicNew. The method new calls basicNew to create an instance and then initializes it with the initialize method. Then, in our own classes, we create our own "constructors" (instance creation methods) by defining class methods that call new.

Behavior >> basicNew [
  "This is a simplified version for the blog"
  <primitive: 70 error: ec>
  self primitiveFailed
]

Behavior >> new [
  ^ self basicNew initialize
]

Person class >> newInstanceWithName: aName age: anAge [
  ^ self new
    name: aName;
    age: anAge;
    yourself
]

Now, the problem here is that from an execution perspective the real allocation point is the expression self basicNew inside the method new. But that is also the same allocation site for most objects in the entire runtime. If we want to distinguish allocation sites, we need to make them more explicit.

So let’s add constructors

This morning, coffee in hand, baby napping, I decided to have a go at implementing constructors. I wanted to avoid explicitly calling new or basicNew, and to avoid extra yourself sends (which are always hard to follow for newcomers). I have seen several people in the past doing similar things to avoid having getter/setter methods in their classes. For example, they implemented instance creation methods that received dictionaries, used runtime reflection, and/or used long initializeMyObjectWithX:andY:thenZ: messages on the instance side.

I decided I wanted to have syntax sugar on my side :). I wanted to transform my newInstanceWithName: aName age: anAge above into something like this:

Person class >> newInstanceWithName: aName age: anAge [
  name := aName.
  age := anAge
]

And I decided to do it with a compiler plugin.

Compiler Plugins in Pharo

Pharo supports a couple of features that allow us to customize the language in sometimes funny, sometimes useful, and most of the time *very easy* ways. One of those features is compiler plugins.

Compiler plugins in Pharo are objects we can register with the compiler. The compiler will call our plugin in the middle of the compilation, allowing us to insert our own AST (Abstract Syntax Tree) transformations in the compilation chain.

To define our plugin we need to define a subclass of OCCompilerASTPlugin and define the transform method.

OCCompilerASTPlugin subclass: #ConstructorPlugin
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'ConstructorPlugin'

ConstructorPlugin >> transform [
  "We will work in here"
]

Our compiler plugin needs to ensure two things. First, the method we wrote as a constructor needs to be executed on an instance of the class, and not on the class itself. Second, it needs to somehow create the instance and execute that method on it. What we want is for the code we wrote above to be automatically (and transparently) translated into:

Person >> newInstanceWithName: aName age: anAge [
  name := aName.
  age := anAge
]

Person class >> newInstanceWithName: aName age: anAge [
  ^ self new newInstanceWithName: aName age: anAge
]

Compiling the code on the instance side

As you see above, we should be able to compile the constructor method directly on the instance side, without much manipulation. In our compiler plugin, we can ask the AST for the class where we are compiling the method, get the instance side, and simply compile it there.

ConstructorPlugin >> transform [
  | classToCompileIn |
  classToCompileIn := ast methodClass.
  classToCompileIn instanceSide compiler compile: ast sourceCode.
]

Note several things in the example above. First, I compile a new method by asking the AST for its entire source code. This is because we cannot (for now) start a compilation from an AST. Second, I'm using the compiler of the class to compile, rather than compiling through the class itself, meaning the method we are compiling will not be installed in the class. This will be useful to hide the generated method, as we will see later.

Generating the new real class-side method

Now that we created an instance side method to initialize our instance, we need to create the class side method that will create the instance, and call this method. I’ve decided to not install the instance-side method in the class, to keep generated code hidden (and easy to discard). So what I want to generate is now a method that takes the instance-side method and executes it on a new instance. That is, I want to generate some code like:

Person class >> newInstanceWithName: aName age: anAge [
  ^ self new
    withArgs: { aName . anAge }
    executeMethod: TheInstanceSideMethod
]
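
To make the two ingredients above more concrete, here is a minimal playground sketch, independent of the plugin. It assumes that compiling through a class's compiler answers a compiled method without installing it, and that withArgs:executeMethod: runs that method with the receiver as self. The selector answerPlus: and the literal values are made up for the illustration.

| hiddenMethod |
"Compile a method against Object without installing it"
hiddenMethod := Object compiler compile: 'answerPlus: n ^ 41 + n'.

"The method should not be visible in the class"
Object includesSelector: #answerPlus:.

"Run the hidden method on some receiver, passing the arguments as an array"
3 withArgs: { 1 } executeMethod: hiddenMethod.

If both assumptions hold, the second expression answers false and the last one answers 42, which is exactly the behaviour the generated class-side method relies on.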

It turns out that generating code in Pharo can easily be done with ASTs too. And since ASTs are our means of communication with the compiler, we just need to replace the current AST with a new AST reflecting the code we want. We can create an AST for the method above with the following expression, where ast is the original AST and hiddenInstanceSideMethod is the method we compiled on the instance side:

RBMethodNode
    selector: ast selector
    arguments: ast arguments
    body: (RBSequenceNode statements: { 
        RBReturnNode value: (RBMessageNode
            receiver: (RBMessageNode
                receiver: RBSelfNode new
                selector: #new)
            selector: #'withArgs:executeMethod:'
            arguments: { 
                RBArrayNode statements: ast arguments.
                (RBLiteralValueNode value: hiddenInstanceSideMethod) })
    })

In this expression we create a new method node, with the same selector as before and the same arguments, but with a single statement. This statement is a return node with a message send: our self new withArgs: { aName . anAge } executeMethod: TheInstanceSideMethod. Note also that we are not hardcoding the arguments aName and anAge anywhere in the method. We are reusing the arguments coming from the original method, so this transformation will work for any constructor.
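
A quick way to check hand-built nodes like these is to print them back as source. The snippet below is only a sketch: it builds the inner self new send with the same node classes used above and asks it for its formatted code. I am assuming formattedCode is available on these nodes in your image.

| node |
"Build the AST for: ^ self new"
node := RBReturnNode value: (RBMessageNode
    receiver: RBSelfNode new
    selector: #new).

"Should print something like: ^ self new"
node formattedCode.

The same trick applied to the full method node is handy for comparing what the plugin generates against the method we wrote by hand earlier.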

Putting it together

Finally, we can put the two things together in a final transform method. In this (almost) final version, I’ve added two extra details to our plugin. So far, if we installed it as-is, this plugin would have tried to compile all possible class-side methods in a class. We don’t want that, we want to scope this transformation to constructors only. In this first iteration, I decided to scope the plugin to work on methods that are marked with the <constructor> pragma and are on class-side.

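ConstructorPlugin >> transform [
    | classToCompileIn hiddenInstanceSideMethod |
    classToCompileIn := ast methodClass.

    "Leave untouched anything that is not a class-side method marked <constructor>"
    ((ast hasPragmaNamed: #constructor) not or: [ classToCompileIn isInstanceSide ])
        ifTrue: [ ^ ast ].

    "Compile the original source on the instance side, without installing it"
    hiddenInstanceSideMethod := classToCompileIn instanceSide compiler compile: ast sourceCode.

    "Replace the AST with: ^ self new withArgs: { args } executeMethod: hiddenInstanceSideMethod"
    ast := RBMethodNode
        selector: ast selector
        arguments: ast arguments
        body: (RBSequenceNode statements: { 
            RBReturnNode value: (RBMessageNode
                receiver: (RBMessageNode
                    receiver: RBSelfNode new
                    selector: #new)
                selector: #'withArgs:executeMethod:'
                arguments: { 
                    RBArrayNode statements: ast arguments.
                    (RBLiteralValueNode value: hiddenInstanceSideMethod) })
        }).

    "Answer the new AST, as the early exit above does"
    ^ ast
]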

Finally, we can install this plugin to work on our class by redefining the classSideCompiler method.

Person class >> classSideCompiler [
  ^ super classSideCompiler addPlugin: ConstructorPlugin
]
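
With the plugin installed, a constructor would be written and used roughly like this. This is a sketch that assumes Person declares the instance variables name and age; the values passed in the last line are just sample data.

Person class >> newInstanceWithName: aName age: anAge [
  <constructor>
  name := aName.
  age := anAge
]

"In a playground: answers a new, initialized Person"
Person newInstanceWithName: 'Ada' age: 36.

Class-side methods without the <constructor> pragma are left untouched, so ordinary class methods of Person keep compiling as usual.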

A rough remaining detail

If you've followed along until here and tried the code above in a method marked as <constructor>, you have probably noticed some weird behaviour. Depending on how you defined the class where your constructor is, you may have seen different things: a pop-up asking you about undeclared variables, or the fact that accepting that pop-up defines the variable on the class side. This happens because the compiler performs a semantic analysis on the AST before giving the plugins a chance to do their work. In other words, it is analysing our constructor method on the class side, where the instance variables we use are not visible.

To solve this issue, I extended the compiler plugin mechanism to also call the plugins **before** the semantic analysis. To make this work properly, I made compiler plugins have two hooks: transformPreSemanticAnalysis and transformPostSemanticAnalysis.

The offending code in the compiler now looks like:

self pluginsDo: [ :each |
    ast := each transformPreSemanticAnalysis: ast ].
self doSemanticAnalysis.
self pluginsDo: [ :each |
    ast := each transformPostSemanticAnalysis: ast ].

What’s next?

Implementing this plugin leaves several open doors for improvement in the compiler infrastructure. I’ve made here a list of potential improvements in the compiler.

  • I should push my pre/post semantic analysis hooks to the mainstream compiler, after discussion with the compiler guys 🙂
  • Compiler plugins only know the AST, but it would be good to get the current compiler too, for example to ask for compilation options. In our case, we had to access the compilation class from the AST, which I find confusing to say the least
  • The compiler should accept ASTs as input too, otherwise going back and forth AST -> source -> AST is a waste of time and error prone.
  • Should we have a DSL to create ASTs? Maybe lisp-like macros? They could be implemented as a compiler plugin too

Also, you may have noticed that the syntax-highlighter does not realize that the variables we type are on the instance side, although the code compiles right. I was experimenting with having per-class customisable parsers to solve this issue some time ago. I think this is a nice idea to explore.

What else can we do with compiler plugins? String interpolation, literal objects? Optional arguments? Take your pick.

And what about the pre-tenuring? That’s a story for another day…

Published by Guille Polito

Pharo dev. Researcher, engineer and father. > If it ain't tested, it does not exist.
