Implementing Indexes – Who uses all my memory

This is the second entry of the series about implementing full-text search indexes in Pharo. The first entry is Implementing Indexes – A Simple Index. You can start reading from this post, but there are things that we have presented in the previous one. For example, how to install the Trie implementation.

In the first entry, we have seen how we can create a Trie, today we are going to analyze the drawbacks of this first implementation and how to improve it to get an acceptable ratio between used space and access time.

Let’s start with a simple example, let’s create a Trie with all the class names, so we can query them.

behaviors := CTTrie new. 
SystemNavigation default allBehaviorsDo: [ :aBehavior | 
       behaviors at: aBehavior name put: aBehavior name ].

In this example, we are creating a new Trie that indexes the names of all the behaviors in the system and put into the values of the tries the name of each behavior. This looks silly, but it will be clear later, as we want to analyze the performance of the data structure, so we put the simplest payload we can have.

This simple index allows us, as we have already seen before, to find all the classes that start with a given prefix.

behaviors allValuesBeginningWith: 'T'. 
behaviors allValuesBeginningWith: 'Obj'. 
behaviors allValuesBeginningWith: 'Spt'.

Now, we are exactly where we have been before. Let’s now analyze the memory footprint of our trie and where we can improve it.

Analyzing a Graph of Objects

As we all know, the programs and data in Pharo are represented by graphs of objects. Each object has a series of references to other objects, and they collaborate to fulfill a mission. One possible, way of looking at it is a static analysis. Let’s start with a class diagram.

This diagram does not help at all, as we only can see that the trie has a node, and the nodes have a value, a collection of children and a character. This does not help to see the memory occupied by the graph.

We need to have a dynamic view of the graph.

The first approach that we can try is to use the inspector, and use all the advantages of the Pharo live environment.

This is a really powerful tool to explore and to understand data structures and complex object graphs. Although, we are facing a problem where our trie has 18.814 elements. So the task to analyze the size of the Trie with all its instances is not easy. We need another tool. Then, we are going to use the inspector, because it is so powerful to navigate the graph with tip of the fingers.

For this, we are going to analyze the statistics of a given graph of objects. We want to know:

Number of instances
How much space do they occupy
And to see how these numbers are related.

A custom tool for this particular problem

A nice way of working that you see a lot in machinists and carpenters is that they build custom tools for very specific problems. Investing a little time in a small tool that serves a particular problem provides better results than trying to apply a tool that tries to solve all problems in the world.

So we are going to use a little tool, 3 classes, that I have implemented in an hour or so. This tool takes a root object, and traverse the graph, taking all the objects reachable in the graph and then calculates some statistics.

You can get it from github.com/tesonep/spaceAndTime.

So we are going to do:

stats := GraphSpaceStatistics new
	rootObject: behaviors;
	yourself.

stats totalSizeInBytes.
stats totalInstances.
stats statisticsPerClass.	
stats statisticsPerClassCSV.

And get statistics about memory usage. With the last expression, we can generate a CSV of the results of the analysis. I got this table with the results:

Class name	# Instances	Memory (Bytes)	Memory(%)
Array	159.568	7.694.592	36.46%
CTTrieNode	159.568	5.106.176	24.20%
IdentityDictionary	159.568	3.829.632	18.15%
Association	159.567	3.829.608	18.15%
ByteString	9.242	349.872	1.66%
ByteSymbol	9.242	293.928	1.39%
CTTrie	1	16	0.00%
UndefinedObject	1	16	0.00%
Character	64	0	0.00%
SmallInteger	90	0	0.00%
Total		21.103.840	100.00%

Nice data… but we need to extract some knowledge from it.

First, we can start reducing the noise of the table, let’s analyze why we have the last four lines that do not affect the result. UndefinedObject is an easy task. This is the class of nil, and as we only have one instance of it so it is easy to identify. We can see that this single instance occupies 16 bytes, which is the smallest size an object can occupy in Pharo. The same happens with our sole instance of CTTrie.

Then we have, Character and SmallInteger, those have 0 bytes occupied even if we have a lot of instances. That should be wrong. But no, this is not an error. These objects are immediate. An immediate object is stored encoded in the reference. When you have an object with references only to immediate objects, the only occupied space is the space where the reference is. You can read more about it in Section 6.10 of Pharo by Example (Page 112).

Now, let’s see the classes that have more impact. If we see the first 4 rows in the table, we can see the number of instances of Array, CTTrieNode, IdentityDictionary, and Association. We have the same number of instances of each of them, the only difference is that we have one less for Association. So, this is not a chance, we have something there. If we inspect the nodes in the Trie, we can see that each node has an instance of IdentityDictionary to hold its children. That explains the relationship between each node and each dictionary. But the instances of Array and Association, where they came from?

Inspecting an IdentityDictionary — With the Inspector we understand how the IdentityDictionary is built

Doing a deeper analysis, using the inspector, we can see that an instance of IdentityDictionary has inside an array of associations. So we, have an array per IdentityDictionary and an Association per node. As each entry In the Array is a child node represented with an association.

If our theory is correct, where is the missing Association, as we have 159.568 nodes but one Association less. The explanation to this came when we inspect the CTTrie. The CTTrie instance has a direct reference to the root, so that explains the missing association. As the root is not contained in a Dictionary, so it is not referenced by an association.

Finally, to end our analysis, we can see the instances of ByteSymbol and ByteString. We have exactly 9.242 instances of each of them. Why? We need to remember that our trie contains exactly 18.484 entries and we are storing there the name of the classes. But the question is why some are instances of ByteSymbol and others are instances of ByteString. Again, analyzing the graph with the inspector. We see that for example, we have the following elements in the Trie: #Object and ‘Object class’. So we see that the name of the classes are Symbols and the names of the metaclasses are Strings. And of course, we know that for each class in the system we have a metaclass associated with them.

Can we do better?

One of the more interesting numbers that we have avoided so long, is the ratio of nodes and values stored. As we can see we have 18.484 values stored in the trie, but for those, we need 159.568 node instances; a ratio of 9 nodes per value. Why do we have so many nodes?

If we inspect the nodes, we see that there are lots of long chains of nodes with a single child and without node value. Again, we need to take some measures to understand the problem. We are going to measure something that we will call the occupation rate of the trie. Again, I couldn’t find this is something studied or not, but it is helpful for my problem and to optimize in the future. We defined like this:

occupationRate = (nodes - intermediateNodes) / nodes

The Nodes value is the number of nodes in the trie, and the intermediateNodes value is the number of nodes that only have a child and does not contain a value. This rate gives us the idea of how much of the trie is holding information. In our case, the value is very small 13%.

Thinking about this problem, we got to the root of it (or in this case the branches 🙂 ). The problem is how the trie is structured to store the data. Let’s see a small example with the keys “aaaaa” and “aaaab”. The resulting trie is something like the following picture

In this diagram, it is clear that all the colored nodes are internal nodes. They have only one child and they don’t hold a value. And it is clear that this structure should be compressed in way that those nodes are reduced to a single entity. With the desired result of reducing the 4 of them to a single one.

Another elephant in the room is the space occupied by the use of an IdentityDictionary to hold the children of a node. Around 72% of the memory impact is related to the decision of using an IdentityDictionary. Can it be done using a better strategy?

Conclusion

We are in a great moment of the series, it looks like we have spent a lot of time to understand the problem, but still, we don’t have anything.

This is not true, we have a lot, we have a way of measuring the performance of our solution, we understand where these numbers came from and now every step that we can take we can validate it with the original baseline we have.

The first thing to improve the performance of any solution is to measure if we don’t measure we cannot compare if we cannot compare we make modifications blindly. Optimizing without measuring is like developing without testing… and you know what we think about that.

A Floating GCC Optimization

A couple of months ago, while debugging the build of the VM, I found a problem with the optimization of GCC 8.3 (I am not sure what other versions may show it). Basically, this GCC version does not compile one of our virtual machines functions well, because it unexpectedly removes what looks like valid code. The affected code in the VM was the following:

Spur64BitMemoryManager >> fetchLong32: fieldIndex ofFloatObject: oop
    "index by word size, and return a pointer as long as the word size"
    | bits |

    (self isImmediateFloat: oop) ifFalse:
        [^self fetchLong32: fieldIndex ofObject: oop].
    bits := self smallFloatBitsOf: oop.
    ^self
        cCode: [(self cCoerceSimple: (self addressOf: bits) to: #'int *') at: fieldIndex]
        inSmalltalk: [
            self flag: #endian.
            fieldIndex = 0
                ifTrue: [bits bitAnd: 16rFFFFFFFF]
                ifFalse: [bits >> 32]]

This method is used to extract the two 32 bits words that are stored in a Float. It handles the cases when the Float number is an immediate value and when it is not. The problem is in the part that handles the immediate value. In the Smalltalk simulator, for extracting the first Int32 it performs and a #bitAnd: and a bit shift (#>>) for the second one.

As you can see, in the last part of the method, there is a conditional that handles a special case differently in the Pharo simulated code and in the translated C code. The C code was being generated form this then:

(self cCoerceSimple: (self addressOf: bits) to: #'int *')      at: fieldIndex

And the generated C code was the following:

 ((int *)(&bits))[fieldIndex];

This is accessing the memory of the bits variable using indirect memory access. Somehow, this pattern of code was marking all usages of the variable bits as dead code, and so all dead code was being removed. To detect the problem it was not easy, basically, I have used the Pharo Tests to detect the failing function and then to understand the generation of code in GCC, we need to debug its generation. After some digging, I understood this was a problem related to some undefined behavior and aliasing of variables in the stack, which you can read more about in here and here. However, in the middle I learned how to debug GCC optimizations, so the post is about that, I hope somebody finds it interesting and useful.

Starting from a simple reproducible case

I managed to create first a simpler case showing the problem:

long __attribute__ ((noinline)) myFunc(long i, int index){
  long v;
  long x = i >> 3;
  v = x;
  return ((int*)(&v))[index];
}

#include <stdio.h>

int main(){
   long i;
   int x;

   scanf("%ld", &i);
   scanf("%d", &x);

   printf("%ld",myFunc(i,x));
}</pre>

This small function is just taking a long value (64bit integer), then it shifts the value and stores it in a temporary variable (a variable that is in the stack of the function). Finally, it performs the same indirect access that we have seen in the VM code showing the error.

So, to continue the investigation let’s compile the function with different optimization levels, as bigger the number of the level bigger the number of optimizations the compiler will do.

Understanding the Level 1 generated code

When using the optimization level 1 (-01) it generates the following code:

myFunc:
    sarq $3, %rdi
    movq %rdi, -8(%rsp)
    movslq %esi, %rsi
    movslq -8(%rsp,%rsi,4), %rax
    ret

Before continuing let’s analyze this piece of code. We see only the code of the function myFunc. To understand it, we need to have some little background on how the parameters are passed in the calling convention in System V x86_64 (in this case a Linux 64bits). Basically, the register rdi is used for the first parameter, the register rsi is used for the second parameter and the register rax is used for holding the return value. The other important register here is the rsp, that holds the address to the top of the execution stack.

However, using the optimization level 2 (-02) generates a much compact version:

myFunc:
    movslq %esi, %rsi
    movslq -8(%rsp,%rsi,4), %rax
    ret

As you can see, the more optimized version is losing the bit shift and
the first mov. It keeps only the final access to the memory and the return.

Generating the snippets

The first snippet is generated using:

$ gcc -O2 prb.c -S -o prb.s -Wall

and the second using:

$ gcc -O1 prb.c -S -o prb.s -Wall

Debugging GCC steps

The GCC compiler is implemented as a pipeline of transformations. Each transformation takes the previous result and applies one of the steps of the process. When you configure different optimization levels and compilation options different pipelines are executed.

The GCC pipeline has different steps, so I centered myself in the
tree optimizations. To allow debugging of GCC, using the correct options it dumps each of the intermediate states of the compilation. The intermediate transformations work with C like trees. For example to get the optimized version of the program, before generating the Assembler we have to do:

$ gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout

This will generate:

__attribute__((noinline))
myFunc (long int i, int index)
{
 long int v;
 long unsigned int _1;
 long unsigned int _2;
 int * _3;
 int _4;
 long int _8;

<bb 2> [local count: 1073741825]:
 _1 = (long unsigned int) index_7(D);
 _2 = _1 * 4;
 _3 = &v + _2;
 _4 = *_3;
 _8 = (long int) _4;
 v ={v} {CLOBBER};
 return _8;
}

This is the optimized SSA (static single assign) version of the function. As you can see in this version the code is already optimized, and our operations not correctly generated. I expected to see something like:

__attribute__((noinline))
myFunc (long int i, int index)
{
 long int x;
 long int v;
 long unsigned int _1;
 long unsigned int _2;
 int * _3;
 int _4;
 long int _10;

<bb 2> [local count: 1073741825]:
 x_6 = i_5(D) >> 3;
 v = x_6;
 _1 = (long unsigned int) index_9(D);
 _2 = _1 * 4;
 _3 = &v + _2;
 _4 = *_3;
 _10 = (long int) _4;
 v ={v} {CLOBBER};
 return _10;

}

In the first SSA version, we are lacking the shift operation:

x_6 = i_5(D) >> 3;
v = x_6;

So, we need to see which of the optimizations and transformations is losing our code. To see all the steps we should execute:

$ gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all

This will generate a lot, really a lot, of files in the same directory with the name prb.c.xxx.yyyy. Where xxx is the number of the step, and yyyy is the name of the step. Each of the files contains the result of applying the changes, so what I have done is looking in a binary search where my code was lost.

I have found that the problem was in the first dead code elimination
step (prb.c.041t.cddce1). GCC does not like that we are copying a stack variable and then accessing directly as memory:

v = x;
return ((int*)(&v))[index];

So, basically, it was assuming that we were only using v and not x, so all the code modifying x is discarded (this optimization is called “dead store code deletion”). Basically, it tries to remove all the code that is affecting stack (temporary) variables that are never used.

Solving the problem

I have fixed the problem by writing the code in a different way. Actually, it was only necessary to remove the special C case in the original code, and let the Slang C code translator do his thing.

Spur64BitMemoryManager >> fetchLong32: fieldIndex ofFloatObject: oop
    "index by word size, and return a pointer as long as the word size"
    | bits |

    (self isImmediateFloat: oop) ifFalse:
        [^self fetchLong32: fieldIndex ofObject: oop].
    bits := self smallFloatBitsOf: oop.
    ^ fieldIndex = 0
          ifTrue: [bits bitAnd: 16rFFFFFFFF]
          ifFalse: [bits >> 32]]

Conclusion

I had published a smaller version of this post in the Pharo mailing list when commenting on the bug.

I wanted to store this knowledge and propagate it before I eventually flush it from my mind. I like these things, but I am not debugging the GCC optimization pipelines every day.

All in all, be careful about undefined behaviors!

Implementing an object-centric breakpoint with Reflectivity

In this post, we describe how we implement a breakpoint that affects only one specific object, and how we implement it using the Reflectivity framework.

What is an object-centric breakpoint?

Let’s take a simple example: imagine two Point instances p1 and p2.

p1 := 0@2.
p2 := 1@3.

Each of these points has two instance variables, x and y, that we can change by calling the setX:setY: method. Imagine that we have a bug related to point p1, and that we want to halt the execution when this object executes the setX:setY: method.

We definitely do not want to put a breakpoint directly in the setX:setY: of class Point method. Points are used all over the system: putting a breakpoint in class Point will halt whenever any point calls that method and the image will freeze.

What we really need is a breakpoint that halts the execution only when setX:setY: is called on p1.

The halt-on-call breakpoint

This breakpoint is an operator called halt-on-call, defined by Jorge Ressia in his Object-Centric debugger. It halts whenever one specific object receives a given message. This is what we need! Ideally, we would like to use an API like this:

p1 haltOnCall: #setX:setY

This installs a breakpoint that halts the method setX:setY: for the point p1 exclusively. By extension, all objects should benefit from that API, so that we can install object-centric breakpoints on any kind of object. Now let’s implement it.

Implementing the halt-on-call breakpoint API

Let’s define an interface for this operator in the Object class, to make it available for all objects. Let’s call it haltOnCall:. It takes a method selector as parameter, which defines the selector of the method to halt. We delegate the definition and the installation of the breakpoint instrumentation to another object named ObjectCentricInstrumenter:

Object >> haltOnCall: methodSelector
  ^ ObjectCentricInstrumenter new halt: methodSelector for: self

This interface installs a halt-on-call breakpoint on its receiver, and returns the object modeling that instrumentation. It is very important to keep a reference to that instrumenter object if we want to uninstall our breakpoint later. This would typically be handled by a higher level tool such as a real debugger.

This method is now our top-level debugging API, available for all objects in the system. Using this API, we can now ask any object to halt when it receives a particular message. Now, we have to create the ObjectCentricInstrumenter class and implement the halt:for: method that is used to instrument the receiver in the code above.

Implementing an object-centric instrumenter using Reflectivity

We use the Reflectivity framework as a support to implement object-centric breakpoints.

What is Reflectivity?

Reflectivity is a reflective framework shipped in the base Pharo distribution. It features annotation objects named Metalink that apply reflective operations at the sub-method level (i.e., at the level of sub-expressions of a method).

A metalink is an annotation of a method AST. It is an object that defines a message selector, a receiver named meta-object, and an optional list of arguments. At run time, when the code corresponding to the annotated AST is reached, the metalink is executed. The message corresponding to the selector is sent to the meta-object, with the previously computed argument list: the corresponding method is executed in the meta-object.

For example, adding logging to Point>>setX:setY:with Reflectivity goes as follows:

Instantiate a metalink
Define a meta-object, for example, a block:
[Transcript show: 'Hello World']
Define a message selector that will be sent to the block at run time, for example: value
Attach the metalink to the ast node of a method, for example the method setX:setY: of Point

At run time, each time a point will execute setX:setY:, the metalink will first execute and send value to the block object [Transcript show: 'Hello World'], resulting in the execution of the block. Then the execution of setX:setY: will continue.

Now, back to our original problem, Reflectivity supports object-centric metalinks. This means we can actually scope a metalink to a single, specific object. We will use this feature to implement our object-centric instrumenter and define our object-centric breakpoint.

More details about Reflectivity are available in the latest Reflectivity paper.

Implementing the object-centric instrumenter

Now, we have to create the ObjectCentricInstrumenter class and implement the halt:for: method. This class has three instance variables:

targetObject: the target object affected by instrumentation
metalink: the instrumentation per se, that is, a metalink
methodNode: the AST node of the method we instrument

Object subclass: #ObjectCentricInstrumenter
  instanceVariableNames: 'targetObject metalink methodNode'
  classVariableNames: ''
  package: 'Your-Pharo-Package'

In this class, we have to define how we install the halt-on-call breakpoint on our object. This is done through the halt:for: method. This method takes two parameters: the message selector of the method that will halt and the target object that the breakpoint will affect.

ObjectCentricInstrumenter >> halt: methodSelector for: anObject
  targetObject := anObject.
  metalink := MetaLink new
    metaObject: #object;
    selector: #halt.
  targetObject link: metalink toMethodNamed: methodSelector

First, we store the target object (line 2). We need to keep a reference to that object to uninstall the breakpoint later. Then, we configure a metalink to send the #halt message to #object (lines 3-5). At run time, #object represents the receiver of the current method that is executing. This instrumentation is equivalent to insert a self halt instruction at the beginning of the instrumented method. Finally, we use ourReflectivity’s object-centric API (line 6) to install the breakpoint metalink on the target method, but only for the target object.

When that is done, targetObject will halt whenever it receives the message methodSelector. All other objects from the system remain unaffected by that new breakpoint.

Using our object-centric breakpoint

Now that we have implemented our object-centric breakpoint, we can use it. We instrument our point p1 with an object-centric breakpoint on the setX:setY: method (line 3). We store the instrumenter object in the instrumenter variable so that we can reuse it later. Calling setX:setY: on p1 will now halt the system, while calling it on p2 or any other point will not halt.

p1 := 0@2.
p2 := 1@3.
instrumenter := p1 haltOnCall: #setX:setY:.
p1 setX: 4 setY: 2. "<- halt!"
p2 setX: 5 setY: 3. "<- no halt"

After debugging, we will probably need to uninstall our breakpoint. As we kept a reference to the instrumenter object, we can use it to change or to remove the instrumentation it defines.

Let’s first define an uninstall method in the ObjectCentricInstrumenter class. This method just calls the uninstall behavior of the metalink, removing all instrumentation from the target object.

ObjectCentricInstrumenter >> uninstall
  metalink uninstall

Our little example script now becomes:

point p1 := 0@2.
point p2 := 1@3.

instrumenter := p1 haltOnCall: #setX:setY:.
p1 setX: 4 setY: 2. "<- halt!"
p2 setX: 5 setY: 3. "<- no halt"

instrumenter uninstall.
p1 setX: 4 setY: 2. "<- no halt"
p2 setX: 5 setY: 3. "<- no halt"

More object-centric debugging tools!

We showed how to implement a breakpoint that halts when one single, specific object receives a particular message. And it was simple!

But why stopping there? The Pharo reflective tools provide us with much more power! In our next blog posts, we’ll show how we can implement a breakpoint that halts when the state of one specific object is accessed.

[Pharo features] The fusion of a developed program and IDE

In this series of blog-posts, we are explaining in more detail the Pharo features described in https://pharo.org/features.

Pharo follows several design principles, and one of them says that the interaction between the human and computer should be modeless. The interface needs to be as direct and intuitive as possible. If you try to think for a moment, what are the sources of modality in current user interfaces, you may easily overlook the worst one. Applications. The applications are giant modes, and our goal should be to avoid them.

Most developers prefer to use integrated development environments over the separate tools but, traditionally, they still have several modes of operation. You write code; then you debug it in a debugger and finally, you run the complete program. Every mode swapping is inconvenient and decreases effectivity, so the modern development environments try to blend these modes to allow one to modify the program in the debugger, change values of instance variables and so on. Some allow writing of custom extensions to help visualize the state of the program during debugging in some way. It is great for developers, but it makes the debuggers and whole development environments very complicated beasts that are hard to maintain and understand.

Pharo uses an elegant, straightforward approach. In Pharo, there isn’t any visible difference between code writing, debugging and running of a program. Pharo is almost completely modeless in this regard. There is just one program in Pharo, Pharo itself. When you start to write your own code, you are actually extending Pharo, teaching it a new thing while it runs. If you want to interrupt the program to inspect its state or perform stepping, it does not stop the program. Remember, Pharo is your program. If you would stop your program entirely, Pharo, the whole IDE, would stop. Your program is never really interrupted, which is great because it probably contains a lot of useful code. It may contain code that allows you to print your data structures to a console while debugging or maybe even to do some interactive visualization. To restart a lost connection to a database or open a form to correct broken data that caused an error.

The fact that writing a program means extending the development system means that you never start on an empty field. From the very beginning, you may use all the power and libraries that Pharo has. And when you are finished, you just cut Pharo features you do not want. You may unload the code of the development tools or directly bootstrap a new Pharo system without it, and the users of your program will not be able to use them anymore if you want.
The fusion of your program and the development environment has many positive consequences for the everyday workflow of a developer. It allows creating visualizations embedded into a debugger so you may see your data in a more accessible form. You may embed whole custom editors of your data into the debugger. You may select some expressions in the debugger window and let them inspect in a separate window. You can write the entire program in the debugger while it is being executed.

There is one feature that I particularly like, and that is extremely rare in other development environments. A feature that is very helpful and that I use on a daily basis. Everything that I told about the modeless approach is valid even for the debugger. In Pharo, a debugger is just a tool that uses exceptions and process control to hook into existing execution contexts to step them for a while using a bytecode simulator instead of letting them run at full speed. There is no single reason why there should be only one debugger. You may have dozens of opened debuggers at once and, moreover, in one debugger, you can select a piece of code and let it debug in another debugger. So you can debug a code that the running thread had not yet reached or, the other way around, has already executed.

The short video we created for this feature demonstrates it. It shows a debugger that was opened on a simple circuits designer program.

We look at the content of an instance variable that contains a collection of standalone circuit regions.
We select some of them and look at the custom visualization of these regions – this visualization with the full interactive circuit editor that displays all circuit cells that do not belong to the selected region as darker.
Then we select a piece of code in the debugger and run another debugger on top of it.
After a few steps in this new debugger, we just close it and return to the original one. There, we use the embedded circuit editor inside the debugger to actually change the circuit. It, of course, uses a lot of code of the application we are just debugging.
After this change, we continue to stepping.

This fusion of a developed program and development environment is optional. In Pharo, you still can work in the old-fashioned way where you modify in an external editor the source files and then let them load and execute in plain Pharo without any development tools. But you should not do it if you do not need to – you will lose one of the greatest advantages that Pharo provides. Moreover, the other developers do not use Pharo this way, so do not expect polished support for such workflow.

Every Pharo feature that we will mention has some disadvantages. This one is not an exception. First, it forces newcomers to learn to use the IDE that is probably significantly different from others that they know. To try to use some third-party editor and tools that the programmer probably mastered before, does not bring any advantage. That raises the primary barrier for entering the Pharo environment.

Next, the merging of the developed program with the environment makes the whole more fragile. Pharo has a very long tradition in dealing with the unwanted consequences of mistakes the programmers can do so only very small percents of issues lead to errors you cannot recover from or crashes of the entire environment. Pharo is built with presumptions that things can go wrong, so it is implausible you will lose some code you wrote without an option to recover it.

Getting a handle on names

There are only two hard things in Computer Science: cache invalidation and naming things.
Phil Karlton

Two hard things, but they are not the same kind of hard. Naming things we do it every day. And we read and re-read the things we name all day long.

This is a post about names. And ambiguity. And pain. And how to iterate on that to get something better at the end of the day.

And I’ll illustrate some ideas with the OSWindow project.

An OSWindow has a handle…

So, an OSWindow has a handle.

But let’s first start by saying what an OSWindow is. An OSWindow is an object representing an operating system window in an uniform way. Behind the scenes, an OSWindow uses some backend like WinForms in Windows or Cocoa in OSX. The main backend implementation we have in Pharo uses SDL2, a cross-platform library, mainly designed for game dev, that can be used for windowing.

So, let’s start again. An OSWindow has a handle. From the explanation above, you may think the handle is a pointer to a real window implemented in the backend framework. Actually, handle is indeed the name we use in pharo to designate a pointer to some externally managed memory region. For those used to FFI, a handle usually designates an ExternalAddress or pointer (And for those who aren’t you may take a look here).

Maybe the handle is an ExternalAddress as we use to do in uFFI?

However, regardless of our first impressions on this, reading the code shows another thing. The code manipulating this handle does not use it as an ExternalAddress, because the only thing we can do with external addresses is to read or write them (or operate with them, they are addresses after all). For example, let’s take the following piece of code from the OSWindow class:

handle show

or even:

handle newOpenGLRenderer

So the handle is finally not an FFI thingy as we thought! It’s more like a backend object representing the real window. Of course, the different native windows are manipulated through different sets of functions. And it seems it was decided to manage these differences as a composition of objects instead of inheritance, which we will not discuss nor argue today. Today we focus on names.

It turns out that these backend objects are also called handles. We have for example OSSDL2WindowHandle, OSVMWindowHandle, OSNullWindowHandle. This seems consistent with what we saw before. So, an OSWindow actually has an OSWindowHandle and not an FFI handle. I think we now understand enough to keep reading more code…

Let’s now dive in the OSSDL2WindowHandle class.

WAT? An OSSDL2WindowHandle also has a handle!

Wait! What!? WAT? The OSSDL2WindowHandle has another handle. But this cannot be an OSWindowHandle, right?

Is the handle of the handle a handle?

Or maybe now we did arrive to an ExternalAddress finally.

Or maybe the handle is an external address (an FFI handle?)

Again, reading the code tells us it is neither… For example the following pieces of code sending messages to the handle that are implemented in none of the classes we expect:

OSSDL2WindowHandle >> toggleBorderOn [
  handle toggleBorder: true.
]

OSSDL2WindowHandle >> getWMInfo [
  | wmInfo |
  wmInfo := SDL_SysWMinfo new version: SDL_Version bindingVersion.
  handle getWMInfo: wmInfo.
  ^ wmInfo
]

Following a bit the code, we can see this second handle is actually an SDL2Window implemented as an FFIExternalObject. And then I, used to uFFI conventions, know that FFIExternalObjects have a handle themselves, the real handle we were looking for from the beginning.

The SDL2Window is an FFIExtenralObject has a handle, an ExtenralAddress, finally

So the handle of the handle is an external object, which has a handle. the handle has a handle with a handle. What could go wrong here? 🙂

The full picture , to get a handle on the handles.

I don’t know you, but I felt pretty lost when reading this code, because of this overloaded terminology. The same name is used to designate three different things. And those different things are in overlapping domains! So when reading handle we need to actively think where we are and what it is designating here and now. And for worse, this problem of overloaded terminology does not stop at the name handle, I’ve found also that window could designate an OSWindow or an OSWindowHandle. So what do you think the following expression returns?

window handle

Take your pick 🙂

Getting the names on the handles

I decided to fix this name confusiology in this Pull Request by doing a couple of renames. I chose new names, that you may not agree with btw, following a couple of guidelines:

avoid ambiguity: three different names for three different non-interchangeable concepts
use generic names when possible, but staying in the domain terminology
use concrete names when variables will only reference objects of a single type/class

Reading the code and understanding the domain helped a bit in choosing the names, which now make OSWindow domain read as:

An OSWindow has a backend window, instance of OSBackendWindow. An OSSDL2BackendWindow has an sdl2Window (no need to say the class ;)). Moreover, all objects designate OSWindows as window, and OSBackendWindow as backend windows .

The final overview after renaming

Name the next step!

Do you agree with the names I choose? Please share yours! Discussing names is the only way to find better ones :). And we can always iterate and improve more. No design should be carved in stone.

In the meantime, I’ll keep improving OSWindow, we should have good support for multiple native windows soon.

I hope you enjoyed and that you give good names in your programs. For your future self. Or at least for me, if I have to read your code 🙂

Implementing Indexes – A Simple Index

In Pharo, we have an excellent tool to query, navigate and discover classes and methods in the system. The tool is Spotter, this tool allows us to search the image for different elements:

Classes
Methods
Menu entries
Help topics
and many more…

To access to it is as easy as doing Shift + Enter,

Spotter gives us access to all the classes in the system, giving us the power to learn, discover and increase our productivity.

However, it has a problem. If the image is a large image (it contains a couple of millions of objects and/or thousand of classes — one company has just 30 millions line of code) the lookup of the elements is slow. We need to implement a text search index.

A simple “Begins With” index

Spotter allows us to do full-text searches, looking up the text we want (the needle) in any part of the system. Although, we are going to start with an easier index, one that allows us to look for the entries that begin with the needle.

If we look up for Obj, the index will answer for example Object, and ObjectLayout; but it will not include DisplayableObject. We will see how to implement a full-text search and even fuzzy logic searches in future entries!

For implementing the index we are going to build a Trie. From Wikipedia:

A trie also called a digital tree or prefix tree, is a kind of search tree—an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Keys tend to be associated with leaves, though some inner nodes may correspond to keys of interest. Hence, keys are not necessarily associated with every node [Full Article].

In this case, we are going to start with the implementation that is available in Github Pharo-Containers/Containers-Trie. We thanks Benoit St-Jean for his implementation.

This implementation is based on the first iteration of Benoît St-Jean.

To install it is as easy as executing the following Metacello expression:

Metacello new
      baseline: 'ContainersTrie';
      repository: 'github://pharo-containers/Containers-Trie/src';
      load.

Once installed we will have the CTTrie class to create a new trie and use it.

Creating a new Trie is as easy as creating a new instance of CTTrie

aNewTrie := CTTrie new

Once we have an instance we can start to add elements in it and then performing a search on it. The instances of CTTrie understand a series of messages to access, add and update the nodes in the trie.

A trie is a collection of associations. Each association has a key and a value. The Key should always be a String. The value can be of any type.

at: aKey – returns the stored value for this key, throws an error if not found.
at: aKey put: aValue – store the association of the key and the value in the trie.
at: aKey update: aBlock – updates the existing element in the trie with the result of executing the block passed as parameter. The block has a single argument that is the previous value associated with the key.
at: aKey update: anUpdateBlock initial: anInitialBlock – if the trie contains a value associated with the key, the block anUpdateBlock is used to update the value receiving as optional argument the previous value. If the trie does not contain an association for the key, a new association is created with the result of executing anInitialBlock.
removeKey: aKey – removes the association with the given key from the trie, throws an error if not found
removeKey: aKey ifAbsent: aBlock – removes the association with the given key from the trie. If the key is not found, the block is executed.

These methods allow us to access and update the trie using the full key.

As an example:

aTrie := CTTrie new.
aTrie at: #cat put: 'Meow'.
aTrie at: #dog put: 'Guau'.
aTrie at: #cat.
aTrie at: #fox put: '???'.
aTrie at: #fox.
aTrie at: #fox update: [ 'What does the fox say?' ].
aTrie at: #fox

As far as we are, the Trie is not more powerful than the Dictionary instances in Pharo. So let’s see, which is the API to do queries to the trie.

allValues – it returns a collection with all the values stored in the trie.
allValuesBeginningWith: aString – it returns all the values that are stored in the trie and which keys start with the string passed as parameter.
withAllValuesDo: aBlock – it iterates all the trie and executes the block on each of the values in the trie. This method does not create the intermediate collection, so it is faster than using allValues and then iterating the collection.
withAllValuesBeginningWith: aString do: aBlock – it iterates all the elements that have a key starting with the passed string and executes the block on each of them. This method does not create the intermediate collection, so it is faster than using allValuesBeginningWith: and then iterating the collection.

So now, we are able to query the trie and get all the elements that have keys that begin with a given string.

Let’s index all the classes in the system.

"Let's create a new Trie"
behaviors := CTTrie new.

"We will iterate all the behaviors (classes and traits) and store
them in the trie with their name as key"
SystemNavigation default allBehaviorsDo: [ :aBehavior | 
	behaviors at: aBehavior name put: aBehavior].

"We can access the trie to make queries!"
behaviors at: #Object.
behaviors allValuesBeginningWith: 'Obj'.

What is the advantage?

Using a trie provides a fast way of performing the lookup of keys including a given prefix. Without the use of a trie, we need to iterate the full collection to compare each of the keys (for example using the message beginsWith: that is present in String).

This operation is very expensive when we have thousands and thousands of elements to iterate, in the case of Spotter, thousands and thousands of methods and classes.

Using a trie the lookup effort does not depend on the size of the indexed collection but in the size of the prefix that we are looking for. More precisely, looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string).

How is it implemented?

A trie is an unbalanced tree. We have implemented the trie using two classes CTTrie and CTTrieNode.

CTTrie is the frontend of the implementation and it is used by the user to access the associations in it. It contains a single node that is the root of the tree. This class implements all the messages that the user wants to use and hides the implementation details in the nodes of the tree.

Each node of the tree is an instance of CTTrieNode, and each of these instances has:

The node value
The children of the node
And the first character of the subtree that this node starts.

In a Trie, the key of an entry is encoded in the path from the root to the node containing the value.

In the following picture, we can see an example. The code to generate this example is the following:

aTrie := CTTrie new
aTrie at:'bat' put:'bat'.
aTrie at:'cat' put:'cat'.
aTrie at:'cow' put:'cow'.

trie

This example has 3 entries, for the clarity of the example, the keys and the values are the same, but you can store anything in the values.

As we can see from the picture, the keys of the associations are encoded in the path from the root of the tree. For the key bat, we start looking up the first letter of the string to select the path to get the answer, then the second letter and finally the third one.

The same lookup procedure can be executed to find the keys cow and cat.

If we perform the lookup of a key and we do not end in a leaf of the tree (a leaf node is one node with a value), we have found that the key is not present in the trie.

If we want to have all the entries for a given prefix, we will get a subtree. In this case if we want all the keys starting with $c we will get all the right subtree.

With this structure, it is only needed to perform n accesses to memory to get to the result, where n is the length of the prefix or key that we are looking for.

This is just a small introduction, for better understanding, why don’t you try with the Pharo implementation, adding elements to it and inspecting the structure of the resulting trie.

Still, we have problems…

The implementation we have just seen presents two problems:

It is not space-efficient, as a lot of nodes are creating and many of them do not contain nor children nor values.
It does not allow us to perform searches in the middle of the keys.

We will address both of these problems in the next entry of this series. Where we are going to see the optimized version of this trie and how to implement a suffix trie.

In the meanwhile, you can read the implementation, check the tests and understand how it is implemented. It is a really straight forward implementation, and why not… start reading the implementation of the optimized version!!!

Complishon: efficient heuristics for code completion

A couple of weeks ago I started prototyping a new code completion backend. The one available in Pharo 8.0 is good-enough to complete variables and messages to literal objects. However, the accuracy of the completion was pretty low because auto completion candidates were sorted alphabetically, and this was even producing completion pauses for large images. This code completion prototype engine, that I originally baptised as Complishon, was based on two ideas: generators and heuristics.

Generators are objects that provide a stream-like interface to consume values generated sequentially. Better with an example.

stream := Generator on: [ :generator |
  generator yield: 1.
  generator yield: 'hello'.
].
stream next.
>>> 1
stream next.
>>> 'hello'

One cool thing about generators is that the generator code will block until we call next. This is particularly interesting when iterating several big collections: we only really iterate them if we need them. And they come by default in Pharo :).

Heuristics, on the other hand, are not funky Pharo objects that come in a library. Heuristics are objects that represent rules. In our case rules for auto completion. Here you have some example of heuristics:

chances are I want to autocomplete first the methods defined in my package
chances are that a variable called anOrderedCollection will have effectively an OrderedCollection instance

And so on.

If we encode enough of these little rules on objects, we could have a smart enough auto-completion engine that does not require complex algorithms or machine learning models. Not saying that these may not be good, just saying that I wanted a quick solution to make our life easier.

This post talks about the architecture inside the completion engine. What parts are inside it and how they relate. Turns out this prototype grew up and is making it to the big leagues in Pharo 9.0, where it will be called HeuristicsCompletion.

An architecture for lazy code completion

The heuristics completion engine is done out of three main components:

lazy fetchers implemented using generators,
a completion object that works as a result-set
and several completion heuristics that are chosen from the current AST node.

Fetchers

Fetchers are subclasses of CoFetcher. They need to define the method #entriesDo: aBlock iterating a collection. The fetcher framework will take care of creating a streaming API for it using generators. For example, a simple fetcher can be created and used with:

CoFetcher subclass: #MyFetcher
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'Complishon-Core'
  
MyFetcher >> entriesDo: aBlock [
  1 to: 10000000000 do: aBlock
]

MyFetcher new next >>> 1.

Basic Fetchers

A simple generic fetcher works on a collection.

CoCollectionFetcher onCollection: #( a b b a c )

And an empty fetcher is useful to provide no results

CoEmptyFetcher new

Fetcher combinators

Fetchers can be combined by using simple combinators. A sequence of fetchers, created with the message #,, fetches first the elements in the first fetcher, then in the second one. This also means that the second fetcher will never be executed until the first one is completely consumed, which is an interesting property if it is expensive to calculate.

CoInstanceVariableFetcher new, CoGlobalVariableFetcher new

A filter fetcher, created with the message #select:, decorates another fetcher and filters it with a block.

CoInstanceVariableFetcher new select: [ :e | "a condition..." ]

A no-repetition fetcher, created with the message #withoutRepetition, decorates another fetcher and filters those elements that were already returned previously by himself.

aFetcher withoutRepetition

The Juice: Fetchers for code completion

In addition, the engine provides fetchers to iterate the following:

CoInstanceVariableFetcher: instance variables of a class
CoClassVariableFetcher: class variables of a class
CoClassImplementedMessagesFetcher: messages implemented by a class
CoGlobalVariableFetcher: global variables in an environment
CoMethodVariableFetcher: variables accessible to an AST node (in order)
CoPackageImplementedMessagesFetcher: messages sent in a package

For code completion, another combinator shows useful to iterate a fetcher up a class hierarchy: CoHierarchyFetcher. This hierarchy fetcher decorates a ClassBasedComplishonFetcher (i.e., a CoClassImplementedMessagesFetcher, a CoInstanceVariableFetcher or a CoClassImplementedMessagesFetcher), and can be created with the message #forHierarchy.

Completion ResultSet

The CoResultSet object stores a fetcher and lazily fetches and internally store objects on demand:

c := CoResultSet fetcher: (CoInstanceVariableFetcher new
    completionClass: aClass;
    yourself).

"The first time it will fetch 20 elements from the fetcher and store them"
c first: 20.

"The second time it will just return the already stored elements"
c first: 20.

"Now it will fetch some more objects to have its 25"
c first: 25.

The CoResultSet object allows one to:

explicit fetch using fetch: and fetchAll
retrieve the results using first, first: and results
filter those results using filterByString:
query it with hasMoreElements and notEmpty

Heuristics

When the autocompletion is invoked, it calculates how to autocomplete given the AST node where the caret is in the text editor. The following piece of code shows examples of heuristics implemented in the prototype.

Heuristics for messages

If the AST node is a message whose receiver is self, autocomplete all messages implemented in the hierarchy.

(CoClassImplementedMessagesFetcher new
    completionClass: aClass;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is super, autocomplete all messages implemented in the hierarchy starting from the superclass.

(CoClassImplementedMessagesFetcher new
    completionClass: aClass superclass;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is a class name like OrderedCollection, autocomplete all messages in the class-side hierarchy of that class.

(ClassImplementedMessagesComplishonFetcher new
    completionClass: (completionContext environmentAt: aRBMessageNode receiver name) classSide;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is a variable that has type information in its name name, like anOrderedCollection, autocomplete all messages in the instance-side hierarchy of guessed class. Then continue with normal completion. There are two cases: variables starting with a such as aPoint and variables starting with an such as anASTCache.

completionContext
  environmentAt: aRBMessageNode receiver name allButFirst asSymbol
  ifPresent: [ :class |
    ^ (ClassImplementedMessagesComplishonFetcher new
        completionClass: class;
        forHierarchy), PackageImplementedMessagesComplishonFetcher new  ].

completionContext
  environmentAt: (aRBMessageNode receiver name allButFirst: 2) asSymbol
  ifPresent: [ :class |
    ^ (ClassImplementedMessagesComplishonFetcher new
        completionClass: class;
        forHierarchy), PackageImplementedMessagesComplishonFetcher new  ]

If all the above fail, autocomplete all messages used in the current package. Chances are we are going to send them again.

 ^ CoPackageImplementedMessagesFetcher new
    	complishonPackage: aPackage;
	yourself

Heuristics for Method signature

When autocompleting the name of a new method, chances are we want to override a method in the hierarchy, or to reimplement a method polymorphic with the ones existing in the current package.

self newSuperMessageInHierarchyFetcher,
    (CoPackageImplementedMessagesFetcher new
        complishonPackage: aPackage;
	yourself)
            withoutRepetition

Heuristics for variables

Variables accessed by an instance are first the ones in the method, then the ones in the hierarchy.

instanceAccessible := (CoMethodVariableFetcher new
        complishonASTNode: anAST;
	yourself),
            (CoInstanceVariableFetcher new
                completionClass: aClass)
			forHierarchy ].

Then, variables accessed by an instance are also the ones in the class variables and globals.

globallyAccessible := (CoClassVariableFetcher new
    completionClass: complishonContext complishonClass)
        forHierarchy,
            (CoGlobalVariableFetcher new
	        complishonEnvironment: anEnvironment;
		yourself).

What’s next?

Try it in the lastest Pharo 9.0 dev build!

We would like to have new ideas for heuristics.

For example, I’ve started integrating it with a fast type inferencer that looks at initialize methods, or with a fast type inferencer like roel typer which latest version for Pharo can be found on github.

Why not machine learning?

…

Statistical models?

…

How a modal window could break the event cycle: A debugging story

A couple of weeks ago I started an aggressive cleaning on Morphic, the graphics framework we currently use in Pharo, searching for a cheap and simple path to move the Pharo IDE to HiDPI screens. The cleanups involved mostly the removal of global state. For example, I transformed code like the following:

ActiveHand doSomething.

into something like:

self activeHand doSomething.

Morph >> activeHand [
  ^ self activeWorld activeHand
]

Turns out in the middle I generated (or actually unveiled?) a couple of bugs :). Yet, I was actually surprised that I did not generate MORE bugs. So this post is a story about debugging. Because I think we don’t discuss debugging enough, and inexperienced developers often have problems to find good strategies to debug.

The bug(z)

About two weeks ago I did two pull requests to Pharo cleaning a bit the mess on Morphic’s global state. These were PR 5916 and PR 5926. A week later a couple of bugs started to pop up. This story is about issue 5945 and issue 6003.

The insect in question was showing itself as a problem managing modal windows and events. When trying to rename a method protocol a pop-up opened up, and then no matter where you clicked it kept in popping-up again and again and again.

Original screenshot showing the problem by @Myroslava.
Taken from issue https://github.com/pharo-project/pharo/issues/5945

Finding the bug in the Dark

If you follow the discussion in the issue, you’ll find my first approach for it was to try reproduce it. And I couldn’t. Having an unreproducible bug is kind of a hazzle, because it makes me ask lots of questions, and not necessarily in the good direction. Is the bug dependent on @Myroslava’s environment? Is it Windows? Because I was on OSX. Was it that she was using an old(ish) Pharo version? Because I was in the latest dev one. A couple of ping-pong of questions gave me the information I needed, but I felt that this kind of information is needed for all bugs, and it’s really a time saver when you’re chasing a nasty cochroach.

That day (was it that day? I don’t remember, but allow me take some poetical licence here) I set-up a couple of github’s issue templates. And I’m happy a couple of weeks later to see that issue reports have got better 🙂

Now, coming back to the story, I had to explore under the carpet to find the bug, and I searched in many different places. First I tried comparing the behavior between the button in the status bar and the context menu. Then I tried to compare the behavior between the calypso browser and the message browser. Then within calypso, I compared the class search dialog with the offending dialog. This looked like a good approach, because the class search dialog was not showing the issue. But it turned out a dead end, because both dialogs are implemented in a **completely different* way… In the middle of the exploration, I saw the bug a couple of times, but I could not grasp what was causing it in some cases, and not in others.

I needed a reliable way to trigger it.

The epiphany

Then I got the “House MD” moment. The “Ah” that pops up in your head. But no vicodin involved. It turned out, that the bug only appeared from the control in the status bar.

And that control has three sub controls. A delete icon button, an edit icon button, and a label. The offending popup was being created when clicking the edit icon button, and when clicking on the label. The bug always happened when clicking on the label. Even more! The bug only happened when clicking on the label.

Gotcha!

A strategy to quickly catch the bug

Now I knew how to make the mouse go out of the wall. However, what we were seeing until now was just a bug symptom. I was not able to tell for sure if this bug was specific to this specific control in the calypso browser that was unveiled by my refactoring, if this was a more generalized problem, or if I did something wrong during my early refactorings. I did not know. So I had to find the underlying cause of the bug.

But I had two weapons at hand. I had two cases opening the same modal pop-up, one showing the bug an another one not showing it. So I proceeded to dissect them, and compare them. First attempt: I found the **single** place where the pop-up was being open, and I logged the stack at that point, to see if and how executions differ.

thisContext stack traceCr.

Turns out this first attempt gave me enough information to hypothetize about the issue! Let’s see the two stacks, first the one showing the bug (see it starts from LabelMorph >> click:), the second one not showing the bug.

ClyQueryBrowserContext(ClySystemBrowserContext)>>requestSingleMethodTag:suggesting: ClyMethodTagsAndPackageEditorMorph>>requestTag [
		self isExtensionActive 
			ifTrue: [ self requestPackage]
			ifFalse: [ self requestTag ]
	] in ClyMethodTagsAndPackageEditorMorph>>openEditor
BlockClosure>>on:do:
ClyMethodTagsAndPackageEditorMorph>>requestChangeBy:
ClyMethodTagsAndPackageEditorMorph>>openEditor
MorphEventSubscription>>notify:from: [ :s | result := result | ((s notify: anEvent from: sourceMorph) == true) ] in MorphicEventHandler>>notifyMorphsOfEvent:ofType:from: Set>>do: MorphicEventHandler>>notifyMorphsOfEvent:ofType:from:
MorphicEventHandler>>click:fromMorph:
LabelMorph(Morph)>>click:
MouseClickState>>click
MouseClickState>>handleEvent:from:
HandMorph>>handleEvent:

ClyQueryBrowserContext(ClySystemBrowserContext)>>requestSingleMethodTag:suggesting: ClyMethodTagsAndPackageEditorMorph>>requestTag [
		self isExtensionActive 
			ifTrue: [ self requestPackage]
			ifFalse: [ self requestTag ]
	] in ClyMethodTagsAndPackageEditorMorph>>openEditor
BlockClosure>>on:do:
ClyMethodTagsAndPackageEditorMorph>>requestChangeBy:
ClyMethodTagsAndPackageEditorMorph>>openEditor
[target perform: actionSelector withArguments: arguments] in IconicButton(SimpleButtonMorph)>>doButtonAction
BlockClosure>>ensure:
CursorWithMask(Cursor)>>showWhile:
IconicButton(SimpleButtonMorph)>>doButtonAction
IconicButton(SimpleButtonMorph)>>mouseUp:
IconicButton(Morph)>>handleMouseUp:
MouseButtonEvent>>sentTo:
IconicButton(Morph)>>handleEvent:
IconicButton(Morph)>>handleFocusEvent: [
		result := focusHolder handleFocusEvent: 
			(anEvent transformedBy: (focusHolder transformedFrom: self)).
	] in HandMorph>>sendFocusEvent:to:clear:
BlockClosure>>on:do:
WorldMorph>>becomeActiveDuring:
HandMorph>>sendFocusEvent:to:clear:
HandMorph>>sendEvent:focus:clear:
HandMorph>>sendMouseEvent:
HandMorph>>handleEvent:
MouseClickState>>handleEvent:from:
HandMorph>>handleEvent:

Notice that both stacks start from handleEvent:, but one ends up sending a mouseUp: while the other one sends a click: event, and this MouseClickState guy in the middle orchestrating everything with the HandMorph

Some background: how mouse events work

To fix this bug, I had to get my hands dirty and get some details about how mouse events work in Morphic. For those not interested on such details, skip this section, although I think it’s pretty educative ^^.

Morph worlds have hand objects, objects responsible of (among others) dispatching the events to the other morphs in the world.
In each UI cycle, a hand takes the events from a queue of pending OS events, generates Morphic events for the OS event, and dispatches them to the right guy.

Now, the mouse events as they come from the OS have two main flavours: mouseDown and mouseUp. There is no click, nor double click event. They are simulated by the hand.

The process of simulating a click works roughly as follows. When a morph is interested on a click it subscribes to mouse down events. Then, when it receives a mouse down, it has to ask the hand “Hey! I would like clicks!”.
Usually this is done by some code like this:

mouseDown: anEvent
  ...
  anEvent hand waitForClicksOrDrag: h
    event: (anEvent transformedBy: tfm)
    selectors: { nil. nil. nil. #dragTarget:. }
    threshold: 5
  ...

When that happens, the hand changes its state, and stores a MouseClickState object that implements a kind of state machine to detect clicks and double clicks.

Finally, after the mouseDown the morph receives a mouseUp (that should be transformed into a click). The hand, before sending the event to the interested morph, it makes it pass through the state machine which has some code like this:

"Oh, the guy was interested in clicks, and gave me a mouse up following a mouse down in some small enough time interval. Let's click!"
self click.    
aHand handleEvent: firstClickUp.

Where

click will send a click event to the morph and
handleEvent: will send the mouseUp

Fixing the bug

Finding the real fix took more time than the couple of minutes you will spend reading this post. My first fix for it was to invert handleEvent: and click. This fixed the issue in question, but lead to other problems like this one, because now the order of events was altered (and some apps were expecting clicks before mouse ups. I’ll just go to the juicy details.

The problem was caused because during the click event, calypso was opening a modal window. Modal windows do suspend the current execution and do not return the control to the caller until the window is closed. Something like this:

openModal
  self open.
  "Start a sub UI loop that will not return the control to the caller until I'm closed."
  [ self isOpen ] whileTrue: [ self doAnotherUIIteration ].

This meant that the call from click never returned. Thus the mouse up was never dispatched using handleEvent:. And when the sub UI loop starts, the morph with the focus starts receiving new events while the mouse up has never been sent, and the MouseClickState state machine thinks it is still in a click state…

I proposed a solution for this in this pull request. The main idea behind is that the event loop should always dispatch a single event per cycle, like that modal sub-cycles will start with a correct state. So, if like in this case an event (mouse up) generates several events (a click, then a mouseUp), only the first one is dispatched, and the second one is returned to the event queue, for later treatment in a subsequent cycle.

Some conclusion?

Please take care of your bug reports, they are life savers :).

The idea behind comparing executions is super powerful. We have a super smart guy in the team doing a PhD around this topic. This post just shows how to exploit this idea without other tool support than stack access and a log (thisContext stack traceCr).

MouseClickState probably needs more testing and love.

And I hope you had fun ^^

Why can’t Pharo have constructors?

The other day we were talking about Pablo about pre-tenuring objects when discussing about a couple of papers like this one and this other one. These papers describe the idea of pre-tenuring objects for Java. The main idea being to identify allocation sites in the code (e.g., new MyClass(17, true,someObject)), take statistics about the average size and lifetime of the objects created at those allocation sites, and then decide to directly tenure objects (i.e., move them to a region of memory garbage collected less often) at those allocation sites if convenient. Now, garbage collection is actually not the point of this post, but this story serves as a kick-off for it.

So, one of the points of the papers above was to identify allocation sites. This popped a question in my mind:

Hey! How can we identify allocation sites in Pharo?

In Pharo allocations happen on the execution of the method basicNew. The method new calls basicNew to create an instance and then initializes it with the initialize method. Then, in our own classes we create our own ~~constructors~~ creation methods by defining class methods calling new.

Behavior >> basicNew [
  "This is a simplified version for the blog"
  <primitive: 70 error: ec>
  self primitiveFailed
]

Behavior >> new [
  ^ self basicNew initialize
]

Person class >> newInstanceWithName: aName age: anAge [
  self new
    name: aName;
    age: anAge;
    yourself
]

Now, the problem here is that from an execution perspective the real allocation point is the expression self basicNew inside the method new. But that is also the same allocation site for most objects in the entire runtime. If we want to distinguish allocation sites, we need to make them more explicit.

So let’s add constructors

This morning, coffee in hand, baby napping, I decided to give a try at implementing constructors. I wanted to avoid explicitly calling new or basicNew, avoid extra yourself sends (that are always complex to follow for new people). I have seen several people in the past doing similar stuff to avoid having getter/setter methods in their classes. They implemented instance creation methods that received dictionaries for example, used runtime reflection, and/or used long initializeMyObjectWithX:andY:thenZ: messages on the instance side.

I decided I wanted to have syntax sugar on my side :). I wanted to transform my newInstanceWithName: aName age: anAge above into something like this:

Person class >> newInstanceWithName: aName age: anAge [
  name := aName.
  age := anAge
]

And I decided to do it with a compiler plugin.

Compiler Plugins in Pharo

Pharo supports a couple of features that allow us to customize the language in sometimes funny, sometimes useful, and most of the times *very easy* ways. One of those features are compiler plugins.

Compiler plugins in Pharo are objects we can subscribe in the compiler. The compiler will call our plugin in the middle of the compilation, allowing us to insert our own AST (Abstract Syntax Tree) transformations in the compilation chain.

To define our plugin we need to define a subclass of OCCompilerASTPlugin and define the transform method.

OCCompilerASTPlugin subclass: #ConstructorPlugin
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'ConstructorPlugin'

ConstructorPlugin >> transform [
  "We will work in here"
]

Our compiler plugin needs to ensure two things. First, the method we wrote as a constructor needs to be executed on an instance of the class, and not on the class itself. Second, it needs to somehow create the instance and execute that method on our instance. What we want, is that the code we wrote about is automatically (and transparently) translated into:

Person >> newInstanceWithName: aName age: anAge [
  name := aName.
  age := anAge
]

Person class >> newInstanceWithName: aName age: anAge [
  ^ self new newInstanceWithName: aName age: anAge
]

Compiling the code on the instance side

As you see above, we should be able to compile the constructor method directly on the instance side, without much manipulation. In our compiler plugin, we can ask the AST for the class where we are compiling the method, get the instance side, and simply compile it there.

ConstructorPlugin >> transform [
  | classToCompileIn |
  classToCompileIn := ast methodClass.
  classToCompileIn instanceSide compiler compile: ast sourceCode.
]

Note several things on the example above. First, I get to compile a new method by asking the AST its entire source code, this is because we cannot (for now) initiate a compilation from an AST. Second, I’m using the compiler of the class to compile, meaning the method we are compiling will not be installed in the class. This will be useful to hide the generated method as we will see later.

Generating the new real class-side method

Now that we created an instance side method to initialize our instance, we need to create the class side method that will create the instance, and call this method. I’ve decided to not install the instance-side method in the class, to keep generated code hidden (and easy to discard). So what I want to generate is now a method that takes the instance-side method and executes it on a new instance. That is, I want to generate some code like:

Person class >> newInstanceWithName: aName age: anAge [
  ^ self new
    withArgs: { aName . anAge }
    executeMethod: TheInstanceSideMethod
]

Turns out generating code in pharo can be easily done with ASTs too. And since ASTs are our communication with the compiler, we just need to replace the current AST by a new AST reflecting the code we want. We can create an AST for the method above with the following expression, where ast is the original AST:

RBMethodNode
    selector: ast selector
    arguments: ast arguments
    body: (RBSequenceNode statements: { 
        RBReturnNode value: (RBMessageNode
            receiver: (RBMessageNode
                receiver: RBSelfNode new
                selector: #new)
            selector: #'withArgs:executeMethod:'
            arguments: { 
                RBArrayNode statements: ast arguments.
                (RBLiteralValueNode value: hiddenInstanceSideMethod) })
    })

In this expression we create a new method node, with the same selector as before, the same arguments, but with a single statement. This statement is a return node, with a message send: our self new withArgs: { aName . anAge } executeMethod: TheInstanceSideMethod. Note also, that we are not hardcoding the arguments aName and anAge anywhere in the method. We are reusing the arguments coming from the original method, so this transformation will work for any constructors.

Putting it together

Finally, we can put the two things together in a final transform method. In this (almost) final version, I’ve added two extra details to our plugin. So far, if we installed it as-is, this plugin would have tried to compile all possible class-side methods in a class. We don’t want that, we want to scope this transformation to constructors only. In this first iteration, I decided to scope the plugin to work on methods that are marked with the <constructor> pragma and are on class-side.

transform
    | classToCompileIn hiddenInstanceSideMethod |
    classToCompileIn := ast methodClass.
    ((ast hasPragmaNamed: #constructor) not or: [ classToCompileIn isInstanceSide ])
        ifTrue: [ ^ ast ].

    hiddenInstanceSideMethod := classToCompileIn instanceSide compiler compile: ast sourceCode.
    ast := RBMethodNode
        selector: ast selector
        arguments: ast arguments
        body: (RBSequenceNode statements: { 
            RBReturnNode value: (RBMessageNode
                receiver: (RBMessageNode
                    receiver: RBSelfNode new
                    selector: #new)
                selector: #'withArgs:executeMethod:'
                arguments: { 
                    RBArrayNode statements: ast arguments.
                    (RBLiteralValueNode value: hiddenInstanceSideMethod) })
        })

Finally, we can install this plugin to work on our class by redefining the classSideCompiler method.

Person class >> classSideCompiler [
  ^ super classSideCompiler addPlugin: ConstructorPlugin
]

A rough remaining detail

If you’ve followed until here and tried the code above in a method marked as <constructor> you probably have noticed a weird behaviour. Depending on how you defined the class where your constructor is, you may have noticed different things. A pop-up asking you for undeclared variables. Or the fact that accepting that pop-up defines the variable on the class-side. This happens because the compiler performs a semantic analysis on the AST before giving the plugins a chance to do their work. In other words, it is analysing our constructor method on the class side, which does not make much sense for us.

To solve this issue, I extended the compiler plugin mechanism to call also the plugins **before** the semantic analysis. So to make this work properly, I made compiler plugins have two hooks: transformPreSemanticAnalysis and transformPostSemanticAnalysis.

The offending code in the compiler now looks like:

self pluginsDo: [ :each |
    ast := each transformPreSemanticAnalysis: ast ].
self doSemanticAnalysis.
self pluginsDo: [ :each |
    ast := each transformPostSemanticAnalysis: ast ].

What’s next?

Implementing this plugin leaves several open doors for improvement in the compiler infrastructure. I’ve made here a list of potential improvements in the compiler.

I should push my pre/post semantic analysis hooks to the mainstream compiler, after discussion with the compiler guys 🙂
Compiler plugins do only know the AST, but it would be good to get the current compiler too, for example to ask for compilation options. In our case, we had to access the compilation class from the AST, which I find confusing at the least
The compiler should accept ASTs as input too, otherwise going back and forth AST -> source -> AST is a waste of time and error prone.
Should we have a DSL to create ASTs? Maybe lisp-like macros? They could be implemented as a compiler plugin too…

Also, you may have noticed that the syntax-highlighter does not realize that the variables we type are on the instance side, although the code compiles right. I was experimenting with having per-class customisable parsers to solve this issue some time ago. I think this is a nice idea to explore.

What else can we do with compiler plugins? String interpolation, literal objects? Optional arguments? Take your pick.

And what about the pre-tenuring? That’s a story for another day…