Complishon: efficient heuristics for code completion

A couple of weeks ago I started prototyping a new code completion backend. The one available in Pharo 8.0 is good-enough to complete variables and messages to literal objects. However, the accuracy of the completion was pretty low because auto completion candidates were sorted alphabetically, and this was even producing completion pauses for large images. This code completion prototype engine, that I originally baptised as Complishon, was based on two ideas: generators and heuristics.

Generators are objects that provide a stream-like interface to consume values generated sequentially. Better with an example.

stream := Generator on: [ :generator |
  generator yield: 1.
  generator yield: 'hello'.
].
stream next.
>>> 1
stream next.
>>> 'hello'

One cool thing about generators is that the generator code will block until we call next. This is particularly interesting when iterating several big collections: we only really iterate them if we need them. And they come by default in Pharo :).

Heuristics, on the other hand, are not funky Pharo objects that come in a library. Heuristics are objects that represent rules. In our case rules for auto completion. Here you have some example of heuristics:

  • chances are I want to autocomplete first the methods defined in my package
  • chances are that a variable called anOrderedCollection will have effectively an OrderedCollection instance

And so on.

If we encode enough of these little rules on objects, we could have a smart enough auto-completion engine that does not require complex algorithms or machine learning models. Not saying that these may not be good, just saying that I wanted a quick solution to make our life easier.

This post talks about the architecture inside the completion engine. What parts are inside it and how they relate. Turns out this prototype grew up and is making it to the big leagues in Pharo 9.0, where it will be called HeuristicsCompletion.

An architecture for lazy code completion

The heuristics completion engine is done out of three main components:

  • lazy fetchers implemented using generators,
  • a completion object that works as a result-set
  • and several completion heuristics that are chosen from the current AST node.

Fetchers

Fetchers are subclasses of CoFetcher. They need to define the method #entriesDo: aBlock iterating a collection. The fetcher framework will take care of creating a streaming API for it using generators. For example, a simple fetcher can be created and used with:

CoFetcher subclass: #MyFetcher
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'Complishon-Core'
  
MyFetcher >> entriesDo: aBlock [
  1 to: 10000000000 do: aBlock
]

MyFetcher new next >>> 1.

Basic Fetchers

A simple generic fetcher works on a collection.

CoCollectionFetcher onCollection: #( a b b a c )

And an empty fetcher is useful to provide no results

CoEmptyFetcher new

Fetcher combinators

Fetchers can be combined by using simple combinators. A sequence of fetchers, created with the message #,, fetches first the elements in the first fetcher, then in the second one. This also means that the second fetcher will never be executed until the first one is completely consumed, which is an interesting property if it is expensive to calculate.

CoInstanceVariableFetcher new, CoGlobalVariableFetcher new

filter fetcher, created with the message #select:, decorates another fetcher and filters it with a block.

CoInstanceVariableFetcher new select: [ :e | "a condition..." ]

no-repetition fetcher, created with the message #withoutRepetition, decorates another fetcher and filters those elements that were already returned previously by himself.

aFetcher withoutRepetition

The Juice: Fetchers for code completion

In addition, the engine provides fetchers to iterate the following:

  • CoInstanceVariableFetcher: instance variables of a class
  • CoClassVariableFetcher: class variables of a class
  • CoClassImplementedMessagesFetcher: messages implemented by a class
  • CoGlobalVariableFetcher: global variables in an environment
  • CoMethodVariableFetcher: variables accessible to an AST node (in order)
  • CoPackageImplementedMessagesFetcher: messages sent in a package

For code completion, another combinator shows useful to iterate a fetcher up a class hierarchy: CoHierarchyFetcher. This hierarchy fetcher decorates a ClassBasedComplishonFetcher (i.e., a CoClassImplementedMessagesFetcher, a CoInstanceVariableFetcher or a CoClassImplementedMessagesFetcher), and can be created with the message #forHierarchy.

Completion ResultSet

The CoResultSet object stores a fetcher and lazily fetches and internally store objects on demand:

c := CoResultSet fetcher: (CoInstanceVariableFetcher new
    completionClass: aClass;
    yourself).

"The first time it will fetch 20 elements from the fetcher and store them"
c first: 20.

"The second time it will just return the already stored elements"
c first: 20.

"Now it will fetch some more objects to have its 25"
c first: 25.

The CoResultSet object allows one to:

  • explicit fetch using fetch: and fetchAll
  • retrieve the results using firstfirst: and results
  • filter those results using filterByString:
  • query it with hasMoreElements and notEmpty

Heuristics

When the autocompletion is invoked, it calculates how to autocomplete given the AST node where the caret is in the text editor. The following piece of code shows examples of heuristics implemented in the prototype.

Heuristics for messages

If the AST node is a message whose receiver is self, autocomplete all messages implemented in the hierarchy.

(CoClassImplementedMessagesFetcher new
    completionClass: aClass;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is super, autocomplete all messages implemented in the hierarchy starting from the superclass.

(CoClassImplementedMessagesFetcher new
    completionClass: aClass superclass;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is a class name like OrderedCollection, autocomplete all messages in the class-side hierarchy of that class.

(ClassImplementedMessagesComplishonFetcher new
    completionClass: (completionContext environmentAt: aRBMessageNode receiver name) classSide;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is a variable that has type information in its name name, like anOrderedCollection, autocomplete all messages in the instance-side hierarchy of guessed class. Then continue with normal completion. There are two cases: variables starting with a such as aPoint and variables starting with an such as anASTCache.

completionContext
  environmentAt: aRBMessageNode receiver name allButFirst asSymbol
  ifPresent: [ :class |
    ^ (ClassImplementedMessagesComplishonFetcher new
        completionClass: class;
        forHierarchy), PackageImplementedMessagesComplishonFetcher new  ].

completionContext
  environmentAt: (aRBMessageNode receiver name allButFirst: 2) asSymbol
  ifPresent: [ :class |
    ^ (ClassImplementedMessagesComplishonFetcher new
        completionClass: class;
        forHierarchy), PackageImplementedMessagesComplishonFetcher new  ]

If all the above fail, autocomplete all messages used in the current package. Chances are we are going to send them again.

 ^ CoPackageImplementedMessagesFetcher new
    	complishonPackage: aPackage;
	yourself

Heuristics for Method signature

When autocompleting the name of a new method, chances are we want to override a method in the hierarchy, or to reimplement a method polymorphic with the ones existing in the current package.

self newSuperMessageInHierarchyFetcher,
    (CoPackageImplementedMessagesFetcher new
        complishonPackage: aPackage;
	yourself)
            withoutRepetition

Heuristics for variables

Variables accessed by an instance are first the ones in the method, then the ones in the hierarchy.

instanceAccessible := (CoMethodVariableFetcher new
        complishonASTNode: anAST;
	yourself),
            (CoInstanceVariableFetcher new
                completionClass: aClass)
			forHierarchy ].

Then, variables accessed by an instance are also the ones in the class variables and globals.

globallyAccessible := (CoClassVariableFetcher new
    completionClass: complishonContext complishonClass)
        forHierarchy,
            (CoGlobalVariableFetcher new
	        complishonEnvironment: anEnvironment;
		yourself).

What’s next?

Try it in the lastest Pharo 9.0 dev build!

We would like to have new ideas for heuristics.

For example, I’ve started integrating it with a fast type inferencer that looks at initialize methods, or with a fast type inferencer like roel typer which latest version for Pharo can be found on github.

Why not machine learning?

Statistical models?

Published by Guille Polito

Pharo dev. Researcher, engineer and father. > If it ain't tested, it does not exist.

Leave a comment