Nicolas Martyanoff – Brain dump

Interactive Common Lisp development

Sun, 19 Nov 2023 18:00:00 +0000

Common Lisp programming is often presented as “interactive”. In most languages, modifications to your program are applied by recompiling it and restarting it. In contrast, Common Lisp lets you incrementally modify your program while it is running.

While this approach is convenient, especially for exploratory programming, it also means that the state of your program during execution does not always reflect the source code. You do not just define new constructs: you look them up, inspect them, modify them or delete them. I had to learn a lot of subtleties the hard way. This article is a compendium of information related to the interactive nature of Common Lisp.

Variables

In Common Lisp variables are identified by symbols. Evaluating (SETQ A 42) creates or updates a variable with the integer 42 as value, and associates it to the A symbol. After the call to SETQ, (BOUNDP 'A) will return T and (SYMBOL-VALUE 'A) will return 42.

You do not delete a variable: instead, you remove the association between the symbol and the variable. You do so with MAKUNBOUND. Following the previous example, (MAKUNBOUND 'A) will remove the association between the A symbol and the variable. And (BOUNDP 'A) returns NIL as expected. As for (SYMBOL-VALUE 'A), it now signals an UNBOUND-VARIABLE error as mandated by the standard.
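
The sequence described above can be replayed at the REPL. This is only a sketch: it assumes the symbol A is not used for anything else in the current package, and it uses (SETF SYMBOL-VALUE) rather than SETQ so that the compiler does not warn about an undeclared variable.

```lisp
;; Associate the value 42 with the symbol A (assumed to be otherwise unused).
(setf (symbol-value 'a) 42)

(boundp 'a)        ; true
(symbol-value 'a)  ; 42

;; Remove the association between the symbol and its value.
(makunbound 'a)

(boundp 'a)        ; NIL

;; Reading the value now signals UNBOUND-VARIABLE; the condition carries
;; the name of the offending variable.
(handler-case (symbol-value 'a)
  (unbound-variable (condition)
    (cell-error-name condition)))  ; A
```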

What about DEFVAR and DEFPARAMETER? They are also used to declare variables (globally defined ones), associating them with symbols. Both define “special” variables (i.e. variables for which all bindings are dynamic; see CLtL2 9.2). The difference is that DEFVAR does not evaluate its initial value form if the variable already has a value, while DEFPARAMETER always evaluates it and assigns the result. MAKUNBOUND will work on variables declared with DEFVAR or DEFPARAMETER as expected.
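
A small sketch of what this difference means when you reevaluate a file during development. The variable names *COUNTER* and *LIMIT* are made up for the example.

```lisp
(defvar *counter* 0)
(defparameter *limit* 10)

;; Simulate some state accumulated while the program was running.
(setf *counter* 5
      *limit* 50)

;; Reevaluating the definitions, as happens when a file is reloaded:
(defvar *counter* 0)       ; ignored: *COUNTER* keeps the value 5
(defparameter *limit* 10)  ; applied: *LIMIT* is reset to 10
```

This is why DEFVAR suits state that must survive a reload, and DEFPARAMETER suits settings you want to pick up from the source each time.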

DEFCONSTANT is a bit more complicated. CLtL2¹ 5.3.2 states that “once a name has been declared by defconstant to be constant, any further assignment to or binding of that special variable is an error”, but does not clearly define whether MAKUNBOUND should or should not be able to be used on constants. However, CLtL2 5.3.2 also states that “defconstant […] does assert that the value of the variable name is fixed and does license the compiler to build assumptions about the value into programs being compiled”. If the compiler is allowed to rely on the value associated with the variable name, it would make sense not to allow the deletion of the binding. Thus it is recommended to only use constants for values that are guaranteed to never change, e.g. mathematical constants. Most of the time you want DEFPARAMETER.

Note that MAKUNBOUND does not apply to lexical variables.

Functions

Common Lisp is a Lisp-2, meaning that variables and functions are part of two separate namespaces. Despite this clear separation, functions behave similarly to variables.

Using DEFUN will either create or update the global function associated with a symbol. SYMBOL-FUNCTION returns the globally defined function associated with a symbol, and FMAKUNBOUND deletes this association.
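
As an illustration, here is the function equivalent of the variable example, using a throwaway function named TWICE (a name chosen for this sketch only).

```lisp
(defun twice (x)
  (* 2 x))

(fboundp 'twice)                       ; true
(funcall (symbol-function 'twice) 21)  ; 42

;; Remove the association between the symbol and the function.
(fmakunbound 'twice)

(fboundp 'twice)                       ; NIL

;; Calling through the symbol now signals UNDEFINED-FUNCTION.
(handler-case (funcall 'twice 21)
  (undefined-function (condition)
    (cell-error-name condition)))      ; TWICE
```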

Let us point out a common mistake when referencing functions: (QUOTE F) (abbreviated as 'F) yields a symbol while (FUNCTION F) (abbreviated as #'F) yields a function. The function argument of FUNCALL and APPLY can be either a symbol or a function (see CLtL2 7.3). This has two consequences:

First, one can write a function referencing F as (QUOTE F) with the expectation that F will later be bound to a function. The following function definition is perfectly valid even though F has not been defined yet:

(defun foo (a b)
  (funcall 'f a b))

Second, redefining the F function will update its association (or binding) to the F symbol, but the previous function will still be available if it has been referenced somewhere before the update. For example:

(setf (symbol-function 'foo) #'1+)
(let ((old-foo #'foo))
  (setf (symbol-function 'foo) #'1-)
  (funcall old-foo 42))

What about macros? Since macros are a specific kind of functions (CLtL2 5.1.4 “a macro is essentially a function from forms to forms”), it is not surprising that they share the same namespace and can be manipulated in the same way as functions with FBOUNDP, SYMBOL-FUNCTION and FMAKUNBOUND.
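
A quick way to convince yourself, using a toy macro named SWAP-ARGS invented for this sketch. MACRO-FUNCTION is used here instead of SYMBOL-FUNCTION because the object returned by SYMBOL-FUNCTION for a macro is implementation-dependent.

```lisp
(defmacro swap-args (function a b)
  "Expand to a call to FUNCTION with its two arguments swapped."
  `(,function ,b ,a))

(fboundp 'swap-args)                 ; true, as for a function
(macro-function 'swap-args)          ; the expansion function
(macroexpand-1 '(swap-args - 1 10))  ; (- 10 1)

;; FMAKUNBOUND removes macro definitions too.
(fmakunbound 'swap-args)

(fboundp 'swap-args)                 ; NIL
(macro-function 'swap-args)          ; NIL
```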

Symbols and packages

While functions and variables are familiar concepts to developers, Common Lisp symbols and packages are a bit more peculiar.

A symbol is interned when it is part of a package. The most explicit way to create an interned symbol is to use INTERN, e.g. (INTERN "FOO"). INTERN interns the symbol in the current package by default, but one can pass a package as second argument. After that, (FIND-SYMBOL "FOO") will return our interned symbol as expected.

More surprisingly, the reader automatically interns symbols. You can test it by evaluating (READ-FROM-STRING "BAR"). After evaluation, BAR is a symbol interned in the current package. This also means that it is very easy to pollute a package with symbols in ways you did not necessarily expect. To clean up, simply use UNINTERN. Remember to refer to the right symbol: to remove the symbol BAR from the package FOO, use (UNINTERN 'FOO::BAR "FOO").

A symbol is either internal or external. EXPORT will make a symbol external to its package while UNEXPORT will make it internal. As with UNINTERN, confusion usually arises around which symbol is affected. (UNEXPORT 'FOO:BAR "FOO") correctly refers to the external symbol in the FOO package and makes it internal again. (UNEXPORT 'BAR "FOO") will signal an error since the BAR symbol is not part of the FOO package (unless of course the current package happens to be FOO).
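
The whole life cycle of a symbol can be sketched in a scratch package. The package name SCRATCH-DEMO is an assumption made for the example; symbols are looked up with FIND-SYMBOL so that the reader never interns anything behind our back.

```lisp
;; A throwaway package that uses no other package.
(make-package "SCRATCH-DEMO" :use '())

(intern "BAR" "SCRATCH-DEMO")
(find-symbol "BAR" "SCRATCH-DEMO")  ; second value is :INTERNAL

(export (find-symbol "BAR" "SCRATCH-DEMO") "SCRATCH-DEMO")
(find-symbol "BAR" "SCRATCH-DEMO")  ; second value is :EXTERNAL

(unexport (find-symbol "BAR" "SCRATCH-DEMO") "SCRATCH-DEMO")
(find-symbol "BAR" "SCRATCH-DEMO")  ; second value is :INTERNAL again

;; Both arguments name the same package: the symbol and where to remove it.
(unintern (find-symbol "BAR" "SCRATCH-DEMO") "SCRATCH-DEMO")
(find-symbol "BAR" "SCRATCH-DEMO")  ; NIL, NIL

(delete-package "SCRATCH-DEMO")
```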

Packages themselves can be created with MAKE-PACKAGE and destroyed with DELETE-PACKAGE. Developers are usually more familiar with DEFPACKAGE, a macro allowing the creation of a package and its configuration (package use list, imported and exported symbols, etc.) in a declarative way. A surprising and frustrating behavior is that evaluating a DEFPACKAGE form for a package that already exists will result in undefined behavior if the new declaration “is not consistent” (CLtL2 11.7) with the current state of the package. As an example, adding symbols to the export list is perfectly fine. Removing one will result in undefined behavior (usually an error) due to the inconsistency of the export list. Fortunately, Common Lisp offers all the necessary functions to manipulate packages and their symbols: use them!

Classes

The Common Lisp standard includes CLOS, the Common Lisp Object System. Unsurprisingly it provides multiple ways to interact with classes and objects dynamically.

Like variables or functions, classes are identified by symbols and FIND-CLASS returns the class associated with a symbol. Class names are part of a separate namespace shared with structures and types.

The DEFCLASS macro is the only way to define or redefine a class. Redefining a class means that instances created afterward with MAKE-INSTANCE will use the new definition. Existing instances are updated: newly added slots are added (either unbound or using the value associated with :INITFORM) and slots that are not defined anymore are deleted. UPDATE-INSTANCE-FOR-REDEFINED-CLASS is particularly interesting: developers can define methods for this generic function in order to control how instances are updated when their class is redefined.
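
Here is a minimal sketch of such a method. The POINT class, its slots and the *POINT* variable are invented for the example: the class is redefined from cartesian coordinates to a single RADIUS slot, and the method computes the new slot from the values of the discarded ones, which are passed in the property list.

```lisp
(defclass point ()
  ((x :initarg :x :accessor point-x)
   (y :initarg :y :accessor point-y)))

(defvar *point* (make-instance 'point :x 3 :y 4))

;; PROPERTY-LIST contains the names and values of the discarded slots.
(defmethod update-instance-for-redefined-class :after
    ((instance point) added-slots discarded-slots property-list &key)
  (declare (ignore added-slots discarded-slots))
  (let ((x (getf property-list 'x))
        (y (getf property-list 'y)))
    (when (and x y)
      (setf (slot-value instance 'radius)
            (sqrt (+ (* x x) (* y y)))))))

;; Redefine the class: X and Y disappear, RADIUS is added.
(defclass point ()
  ((radius :initarg :radius :accessor point-radius)))

;; The existing instance is updated no later than its next slot access.
(point-radius *point*)  ; 5.0 on SBCL
```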

Defining classes may imply implicitly defining methods: the :ACCESSOR, :READER and :WRITER slot keyword arguments will lead to the creation of generic functions. When a class is redefined, the accessor methods created by the previous DEFCLASS form are removed and those of the new form are added, but the generic functions created for slots that have been removed will live on.

A limitation of CLOS is that classes cannot be deleted. FIND-CLASS can be used as a place, and (SETF (FIND-CLASS 'FOO) NIL) will remove the association between the FOO symbol and the class, but the class itself and its instances will not disappear. While this limitation may seem strange, ask yourself how an implementation should handle instances of a class that has been deleted.

The class of an instance can be changed with CHANGE-CLASS: slots that exist in the new class will be conserved while those that do not are deleted. New slots are either unbound or set to the value associated with :INITFORM in the new class. In a way similar to UPDATE-INSTANCE-FOR-REDEFINED-CLASS, UPDATE-INSTANCE-FOR-DIFFERENT-CLASS lets developers control precisely the process.
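
A short sketch with two hypothetical classes, CIRCLE and DISC, sharing a RADIUS slot:

```lisp
(defclass circle ()
  ((radius :initarg :radius :accessor shape-radius)))

(defclass disc ()
  ((radius :initarg :radius :accessor shape-radius)
   (color :initform :black :accessor shape-color)))

(defvar *shape* (make-instance 'circle :radius 2))

;; The instance keeps its identity but changes class.
(change-class *shape* 'disc)

(class-name (class-of *shape*))  ; DISC
(shape-radius *shape*)           ; 2, conserved from the old class
(shape-color *shape*)            ; :BLACK, from the :INITFORM of the new class
```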

Generics and methods

Generics are functions which can be specialized based on the class (and not type as one could expect) of their arguments and which can have a method combination type.

Generics can be created explicitly with DEFGENERIC or implicitly when DEFMETHOD is called for a function name that does not yet designate a generic function. Since generics are functions, FBOUNDP, SYMBOL-FUNCTION and FMAKUNBOUND will work as expected.

Methods themselves are either defined as part of the DEFGENERIC call or separately with DEFMETHOD. Discovering the different methods associated with a generic function is a bit more complicated. There is no standard way to list the methods associated with a generic, but it is at least possible to look up a method with FIND-METHOD. Do remember to pass a function (and not a symbol) as the generic, and to pass classes (and not symbols naming classes) in the list of specializers.

Redefinition is not as obvious as for non-generic functions. When redefining a generic with DEFGENERIC all methods defined as part of the previous DEFGENERIC form are removed and methods defined in the redefinition are added. However, methods defined separately with DEFMETHOD are not affected.

For example, in the following code, the second call to DEFGENERIC will replace the two methods specialized on INTEGER and FLOAT respectively by a single one specialized on a STREAM, but the method specialized on STRING will remain unaffected.

(defgeneric foo (a)
  (:method ((a integer))
    (format nil "~A is an integer" a))
  (:method ((a float))
    (format nil "~A is a float" a)))

(defmethod foo ((a string))
  (format nil "~S is a string" a))

(defgeneric foo (a)
  (:method ((a stream))
    (format nil "~A is a stream" a)))

Note that trying to redefine a generic with a different parameter lambda list will cause the removal of all previously defined methods since none of them can match the new parameters.

Removing a method will require you to find it first using FIND-METHOD and then use REMOVE-METHOD. With the previous example, removing the method specialized on a STRING argument is done with:

(remove-method #'foo (find-method #'foo nil (list (find-class 'string)) nil))

Working with methods is not always easy, and two errors are very common.

First, remember that changing the qualifier in a DEFMETHOD will define a new method. If you realize that your :AFTER method should use :AROUND and reevaluate the DEFMETHOD form, remember to delete the method with the :AFTER qualifier or you will end up with two methods being called.
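
The first mistake is easy to reproduce. In this sketch the GREET generic and the *CALLS* variable are made up; *CALLS* simply records which auxiliary methods ran.

```lisp
(defvar *calls* '())

(defgeneric greet (name))

(defmethod greet ((name string))
  (format nil "Hello ~A" name))

(defmethod greet :after ((name string))
  (push :after *calls*))

;; Later on, the same form is reevaluated with :AROUND instead of :AFTER.
;; This adds a second method, it does not replace the first one.
(defmethod greet :around ((name string))
  (push :around *calls*)
  (call-next-method))

(greet "Bob")  ; both methods run: *CALLS* is now (:AFTER :AROUND)

;; Find the stale method by its qualifier and specializers, then remove it.
(remove-method #'greet
               (find-method #'greet '(:after) (list (find-class 'string))))
```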

Second, when defining a method for a generic from another package, remember to correctly refer to the generic. If you want to define a method on the BAR generic from package FOO, use (DEFMETHOD FOO:BAR (…) …) and not (DEFMETHOD BAR (…) …). In the latter case, you will define a new BAR generic in the current package.

Meta Object Protocol

While CLOS is already quite powerful, various interactions are impossible. One cannot create classes or methods programmatically, introspect classes or instances for example to list their slots or obtain all their superclasses, or list all the methods associated with a generic function.

In addition to an example of a CLOS implementation, The Art of the Metaobject Protocol² defines multiple extensions to CLOS including metaclasses, metaobjects, dynamic class and generic creation, class introspection and much more.

Most Common Lisp implementations implement at least part of these extensions, usually abbreviated as “MOP”, for “MetaObject Protocol”. The well-known closer-mop system can be used as a compatibility layer for multiple implementations.

Structures

Structures are record constructs defined with DEFSTRUCT. At a glance they may seem very similar to classes, but they have a fundamental limitation: the results of redefining a structure are undefined (CLtL2 19.2).

While this property allows implementations to handle structures in a more efficient way than classes, it makes structures unsuitable for incremental development. As such, they should only be used as a last resort, when a regular class has been proved to be a performance bottleneck.

Conditions

While conditions look very similar to classes, the Common Lisp standard does not define them as classes. This is one of the few differences between the standard and CLtL2 which clearly states in 29.3.4 that “Common Lisp condition types are in fact CLOS classes, and condition objects are ordinary CLOS objects”.

This is why one uses DEFINE-CONDITION instead of DEFCLASS and MAKE-CONDITION instead of MAKE-INSTANCE. This also means that one should not use slot-related functions (including the very useful WITH-SLOTS macro) with conditions.

In practice, most modern implementations follow CLtL2 and the CLOS-CONDITIONS:INTEGRATE X3J13 Cleanup Issue and implement conditions as CLOS classes, meaning that conditions can be manipulated and redefined as any other classes. And the same way as any other classes, they cannot be deleted.
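
Sticking to the portable interface is not much of a constraint. In this sketch the PARSE-FAILURE condition is invented for the example, and its slot is only ever read through the reader function declared in DEFINE-CONDITION.

```lisp
(define-condition parse-failure (error)
  ((line :initarg :line :reader parse-failure-line))
  (:report (lambda (condition stream)
             (format stream "Parse failure on line ~D"
                     (parse-failure-line condition)))))

;; Signal the condition and read its slot through the reader.
(handler-case (error 'parse-failure :line 12)
  (parse-failure (condition)
    (parse-failure-line condition)))  ; 12

;; MAKE-CONDITION creates a condition object without signaling it.
(princ-to-string (make-condition 'parse-failure :line 3))
;; "Parse failure on line 3"
```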

Types

Types are identified by symbols and are part of the same namespace as classes (which should not be surprising since defining a class automatically defines a type with the same name).

Types are defined with DEFTYPE, but documentation is surprisingly silent on the effects of type redefinition. This can lead to interesting situations. On some implementations (e.g. SBCL and CCL), if a class slot is defined as having the type FOO, redefining FOO will not be taken into account and the type checking operation (which is not mandated by the standard) will use the previous definition of the type. Unfortunately Common Lisp does not mandate any specific behavior on slot type mismatches (CLtL2 28.1.3.2).

Thus developers should not expect any useful effect from redefining types. Restarting the implementation after substantial type changes is probably best.

In the same vein, interactions with types are very limited. You cannot find a type by its symbol or even check whether a type exists or not. Calling TYPE-OF on a value will return a type this value satisfies, but the nature of the type is implementation-dependent (CLtL2 4.9): it could be any supertype. In other words, TYPE-OF could absolutely return T for all values but NIL. At least SUBTYPEP lets you check whether a type is a subtype of another type.
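
What you can do with a type is therefore limited to a few operators. A sketch with a hypothetical OCTET type:

```lisp
(deftype octet ()
  '(integer 0 255))

(typep 200 'octet)          ; T
(typep 300 'octet)          ; NIL

;; SUBTYPEP returns two values: the answer, and whether it is certain.
(subtypep 'octet 'integer)  ; T, T
(subtypep 'octet 'string)   ; NIL, T

;; The result of TYPE-OF is implementation-dependent; all we know is that
;; the value satisfies the returned type.
(typep 200 (type-of 200))   ; T
```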

Going further

Common Lisp is a complex language with a lot of subtleties, way more than what can be covered in a blog post. The curious reader will probably skip the standard (not because you have to buy it, but because it is a low quality scan of a printed document) and jump directly to CLtL2 or the Common Lisp HyperSpec. The Art of the Metaobject Protocol is of course the normative reference for the CLOS extensions usually referred to as “MOP”.

¹ Guy L. Steele Jr. Common Lisp the Language, 2nd edition. 1990.
² Gregor Kiczales, Jim des Rivieres and Daniel G. Bobrow. The Art of the Metaobject Protocol. 1991.

Reduce vs fold in Common Lisp

Fri, 02 Jun 2023 18:00:00 +0000

Introduction

If you have already used functional languages, you are probably familiar with fold, a higher-order function used to iterate on a collection of values to combine them and return a result. You may be surprised that Common Lisp does not have a fold function, but provides REDUCE which works a bit differently. Let us see how they differ.

Understanding `REDUCE`

In its simplest form, REDUCE accepts a function and a sequence (meaning either a list or a vector). It then applies the function to successive pairs of sequence elements.

You can easily check what happens by tracing the function:

CL-USER> (trace +)
CL-USER> (reduce #'+ '(1 2 3 4 5))
  0: (+ 1 2)
  0: + returned 3
  0: (+ 3 3)
  0: + returned 6
  0: (+ 6 4)
  0: + returned 10
  0: (+ 10 5)
  0: + returned 15
15

In this example, the call to REDUCE evaluates (+ (+ (+ (+ 1 2) 3) 4) 5).

You can reverse the order using the :from-end keyword argument:

CL-USER> (trace +)
CL-USER> (reduce #'+ '(1 2 3 4 5) :from-end t)
  0: (+ 4 5)
  0: + returned 9
  0: (+ 3 9)
  0: + returned 12
  0: (+ 2 12)
  0: + returned 14
  0: (+ 1 14)
  0: + returned 15
15

In which case you will evaluate (+ 1 (+ 2 (+ 3 (+ 4 5)))). The result is of course the same since the + function is associative.

You can of course provide an initial value, in which case REDUCE will behave as if this value has been present at the beginning (or the end with :from-end) of the sequence.

The surprising aspect of REDUCE is its behaviour when called on a sequence with fewer than two elements:

If the sequence contains a single element:
- if there is no initial value, the function is not called and the element is returned directly;
- if there is one, the function is called on both the initial value and the single element.
If the sequence is empty:
- if there is no initial value, the function is called without any argument;
- if there is one, the function is not called and the initial value is returned directly.

As a result, any function passed to REDUCE must be able to handle being called with either zero or two arguments. Most examples found on the Internet use + or append, and these functions happen to handle it (e.g. (+) returns the identity element of the addition, zero). If you write your own functions, you will have to deal with it using the &OPTIONAL lambda list keyword.

This can lead to unexpected behaviours. If you compute the sum of a sequence of floats using (reduce #'+ floats), you may find it logical to obtain a float. But if FLOATS is an empty sequence, you will get 0 which is not a float. Something to keep in mind.
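
These special cases are easy to observe with a function that records how it was called. CALL-RECORD is a helper written for this sketch; it returns its arguments prefixed with :CALLED, so any result that is not such a list means the function was never called.

```lisp
(defun call-record (&rest arguments)
  "Return ARGUMENTS prefixed with :CALLED to make each call visible."
  (cons :called arguments))

;; Single element
(reduce #'call-record '(1))                    ; 1 (not called)
(reduce #'call-record '(1) :initial-value 0)   ; (:CALLED 0 1)

;; Empty sequence
(reduce #'call-record '())                     ; (:CALLED), no argument
(reduce #'call-record '() :initial-value 0)    ; 0 (not called)

;; The float problem, and a way around it with an explicit initial value.
(reduce #'+ '())                               ; 0
(reduce #'+ '() :initial-value 0.0)            ; 0.0
```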

Differences with fold

The fold function is traditionally defined as accepting three arguments: a function, an initial value — or accumulator — and a list. The function is called repeatedly with both the accumulator and a list element, using the value returned by the function as next accumulator.

For example in Erlang:

lists:foldl(fun(X, Sum) -> Sum + X end, 0, [1, 2, 3, 4, 5]).

An interesting consequence is that fold functions are always called with the same type of arguments (the list value and the accumulator), while REDUCE functions can be called with zero or two list values. This makes it harder to write functions when the accumulated value has a different type from sequence values.

Fold is also simpler than REDUCE since it does not have any special case, making it easier to reason about its behaviour.

It would be interesting to know why a function as fundamental as fold was not included in the Common Lisp standard.

Implementing `FOLDL`

We can of course implement a fold function in Common Lisp. We will concentrate on the most common (and most efficient) left-to-right version. Let us start with a simple implementation for lists:

(defun foldl/list (function value list)
  (declare (type (or function symbol) function)
           (type list list))
  (if list
      (foldl/list function (funcall function value (car list)) (cdr list))
      value))

As clearly visible, the recursive call to FOLDL/LIST is in tail position and SBCL will happily perform tail-call elimination.

For vectors we use an iterative approach:

(defun foldl/vector (function value vector)
  (declare (type (or function symbol) function)
           (type vector vector))
  (do ((i 0 (1+ i))
       (accumulator value))
      ((>= i (length vector))
       accumulator)
    (setf accumulator (funcall function accumulator (aref vector i)))))

Finally we write the main FOLDL function which operates on any sequence:

(defun foldl (function value sequence)
  (declare (type (or function symbol) function)
           (type sequence sequence))
  (etypecase sequence
    (list (foldl/list function value sequence))
    (vector (foldl/vector function value sequence))))

At this point we can already use FOLDL for various operations. We could of course improve it with the addition of the usual :START, :END and :KEY keyword arguments for more flexibility.
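
To give an idea of what such operations look like, here are a few calls where the accumulator does not have the same type as the sequence elements. To keep the snippet self-contained it uses FOLD-LEFT, a compact stand-in written for this sketch with the same argument order as FOLDL; it is not the implementation presented above.

```lisp
(defun fold-left (function value sequence)
  "Stand-in for FOLDL: call FUNCTION with the accumulator and each element."
  (map nil (lambda (element)
             (setf value (funcall function value element)))
       sequence)
  value)

;; Sum the lengths of strings: integer accumulator, string elements.
(fold-left (lambda (sum string) (+ sum (length string)))
           0 '("foo" "hello" ""))     ; 8

;; Reverse a vector into a list: list accumulator, integer elements.
(fold-left (lambda (list element) (cons element list))
           '() #(1 2 3))              ; (3 2 1)

;; Empty sequence: the initial value is returned, the function is not called.
(fold-left #'+ 0.0 '())               ; 0.0
```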

Counting lines with Common Lisp

Fri, 17 Mar 2023 18:00:00 +0000

A good line counting program has two features: it only counts non-empty lines to get a fair estimate of the size of a project, and it groups line counts by file type to help see immediately which languages are used.

A long time ago I got frustrated with two well known line counters. Sloccount spits out multiple strange Perl warnings about locales, and most of the output is a copyright notice and some absurd cost estimations. Cloc has fourteen Perl packages as dependencies. Writing a simple line counter is an interesting exercise; at the time I was discovering Common Lisp, so I wrote my own version.

I made a few changes year after year, but most of the code stayed the same. I thought it would be interesting to revisit this program and present it part by part as a demonstration of how you can use Common Lisp to solve a simple problem.

We are going to write the program bottom-up, starting with the smallest building blocks and progressively building upon them.

The program

The program is written in Common Lisp. The most convenient way of storing and executing it is a single executable file stored in a directory that is part of the PATH environment variable. In my case, the script will be called locc, for “line of code counter”, and will be stored in the ~/bin directory.

We start the file with a shebang indicating how to execute the file. We use the SBCL implementation because it is stable and actively developed. It also makes it easy to execute a simple file:

#!/usr/bin/sbcl --script

Finding files

Our line counter will operate on directories, so it has to be able to list files in them. Path handling functions are very disconcerting at first. Common Lisp was designed a long time ago, and operating systems were different at the time. Let us dig in!

First let us write a simple function to check if a pathname object is a directory pathname:

(defun directory-path-p (path)
  "Return T if PATH is a directory or NIL else."
  (declare (type (or pathname string) path))
  (and (not (pathname-name path))
       (not (pathname-type path))))

Then we write a function to identify hidden files and directories since we do not want to include them:

(defun hidden-path-p (path)
  "Return T if PATH is a hidden file or directory or NIL else."
  (declare (type pathname path))
  (let ((name (if (directory-path-p path)
                  (car (last (pathname-directory path)))
                  (file-namestring path))))
    (and (plusp (length name))
         (char= (char name 0) #\.))))

As you can see we use DIRECTORY-PATH-P to extract the basename of the path, then check if it starts with a full stop (only if it is not empty of course).
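
If pathnames are new to you, it helps to look at what the standard accessors return for a few paths. The paths below are made up and do not need to exist; the results shown are what I would expect from SBCL on a Unix-like system, since pathname parsing is implementation-dependent.

```lisp
;; A directory pathname has neither a name nor a type.
(pathname-name #p"/tmp/project/")       ; NIL
(pathname-type #p"/tmp/project/")       ; NIL

;; Its last directory component is what we treat as the basename.
(pathname-directory #p"/tmp/project/")  ; (:ABSOLUTE "tmp" "project")
(car (last (pathname-directory #p"/tmp/project/")))  ; "project"

;; A file pathname has a name and possibly a type.
(pathname-name #p"/tmp/project/main.lisp")        ; "main"
(pathname-type #p"/tmp/project/main.lisp")        ; "lisp"
(file-namestring #p"/tmp/project/main.lisp")      ; "main.lisp"
(file-namestring #p"/tmp/project/.gitignore")     ; ".gitignore"
```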

Finally we can write the function to actually list files in a directory recursively:

(defun directory-path (path)
  "If PATH is a directory pathname, return it as it is. If it is a file
pathname or a string, transform it into a directory pathname."
  (declare (type (or pathname string) path))
  (if (directory-path-p path)
      path
      (make-pathname :directory (append (or (pathname-directory path)
                                            (list :relative))
                                        (list (file-namestring path)))
                     :name nil :type nil :defaults path)))

(defun find-files (path)
  "Return a list of all files contained in the directory at PATH or any of its
subdirectories."
  (declare (type (or pathname string) path))
  (flet ((list-directory (path)
           (directory
            (make-pathname :defaults (directory-path path)
                           :type :wild :name :wild))))
    (let ((paths nil)
          (children (list-directory (directory-path path))))
      (dolist (child children paths)
        (unless (hidden-path-p child)
          (if (directory-path-p child)
              (setf paths (append paths (find-files child)))
              (push child paths)))))))

We use the DIRECTORY standard function with a path containing a wildcard component to list the files in a directory, and do so recursively.

Counting lines

Now that we have files, we can start counting lines. Let us first write a function to count the number of non-empty lines in a file.

(defun count-file-lines (path)
  "Count the number of non-empty lines in the file at PATH. A line is empty if
it only contains space or tabulation characters."
  (declare (type pathname path))
  (with-open-file (stream path :element-type '(unsigned-byte 8))
    (do ((nb-lines 0)
         (blank-line t))
        (nil)
      (let ((octet (read-byte stream nil)))
        (cond
          ((or (null octet) (eql octet #.(char-code #\Newline)))
           (unless blank-line
             (incf nb-lines))
           (when (null octet)
             (return-from count-file-lines nb-lines))
           (setf blank-line t))
          ((and (/= octet #.(char-code #\Space))
                (/= octet #.(char-code #\Tab)))
           (setf blank-line nil)))))))

We open the file to obtain a stream of octets, and read it octet by octet, keeping track of whether the current line is blank or not. Note how we make sure to count the last line even if it does not end with a newline character.

Reading a file one octet at a time could be a disaster for performance. Fortunately SBCL file streams are buffered, something we can easily check by running our program with strace -e trace=openat,read. We would not rely on this property if we wanted our program to work on multiple Common Lisp implementations, but this is a non issue here.

Identifying the file type

Counting lines is one thing, but we need to identify their content. The simplest way is to do so based on the file extension.

Obviously we will want to ignore various files which are known not to contain text content, so we start by building a hash table containing these extensions:

(defparameter *ignored-extensions*
  (let ((extensions '("a" "bin" "bmp" "cab" "db" "elc" "exe" "gif" "gz"
                      "jar" "jpeg" "jpg" "o" "pcap" "pdf" "png" "ps" "rar"
                      "svg" "tar" "tgz" "tiff" "zip"))
        (table (make-hash-table :test 'equal)))
    (dolist (extension extensions table)
      (setf (gethash extension table) t)))
  "A hash table containing all file extensions to ignore.")

We then create another hash table to associate a type symbol to each known file extension:

(defparameter *extension-types*
  (let ((pairs '(("asm" . assembly) ("s" . assembly)
                 ("adoc" . asciidoc)
                 ("awk" . awk)
                 ("h" . c) ("c" . c)
                 ("hpp" . cpp) ("cpp" . cpp) ("cc" . cpp)
                 ("css" . css)
                 ("el" . elisp)
                 ("erl" . erlang)
                 ("go" . go)
                 ("html" . html) ("htm" . html)
                 ("ini" . ini)
                 ("hs" . haskell)
                 ("java" . java)
                 ("js" . javascript)
                 ("json" . json)
                 ("tex" . latex)
                 ("lex" . lex)
                 ("lisp" . lisp)
                 ("mkd" . markdown) ("md" . markdown)
                 ("rb" . ruby)
                 ("pl" . perl) ("pm" . perl)
                 ("php" . php)
                 ("py" . python)
                 ("sed" . sed)
                 ("sh" . shell) ("bash" . shell) ("csh" . shell)
                 ("zsh" . shell) ("ksh" . shell)
                 ("scm" . scheme)
                 ("sgml" . sgml)
                 ("sql" . sql)
                 ("texi" . texinfo)
                 ("texinfo" . texinfo)
                 ("vim" . vim)
                 ("xml" . xml) ("dtd" . xml) ("xsd" . xml)
                 ("yaml" . yaml) ("yml" . yaml)
                 ("y" . yacc)))
        (table (make-hash-table :test 'equal)))
    (dolist (pair pairs table)
      (setf (gethash (car pair) table) (cdr pair))))
  "A hash table containing a symbol identifying the type of a file for
  each known file extension.")

With these hash tables, the function identifying the type of a file is trivial:

(defun identify-file-type (path)
  "Return a symbol identifying the type of the file at PATH, or UNKNOWN if the
file extension is not known."
  (declare (type pathname path))
  (let ((extension (pathname-type path)))
    (unless (gethash extension *ignored-extensions*)
      (gethash extension *extension-types* 'unknown))))
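
Two details make this lookup work: the tables use the EQUAL test because extensions are strings, and the third argument of GETHASH is the value returned when the key is missing. The following sketch isolates them with a tiny table; LOOKUP-TYPE and the file names are made up for the example.

```lisp
(defun lookup-type (path)
  "Look up the extension of PATH in a one-entry table, defaulting to UNKNOWN."
  (let ((types (make-hash-table :test 'equal)))
    (setf (gethash "lisp" types) 'lisp)
    (gethash (pathname-type path) types 'unknown)))

(lookup-type #p"src/main.lisp")  ; LISP
(lookup-type #p"notes.xyz")      ; UNKNOWN
;; No extension: PATHNAME-TYPE returns NIL, which is not in the table.
(lookup-type #p"Makefile")       ; UNKNOWN
```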

Collecting file information

Up to this point, we wrote several functions without connecting them. But we now have all the building blocks we need. Let us use them to accumulate information about the files in a list of directories.

(defun collect-line-counts (directory-paths)
  "Collect the line count of all files in the directories located at one of the
paths in DIRECTORY-PATHS and return them grouped by file type as an
association list."
  (declare (type list directory-paths))
  (let ((line-counts (make-hash-table)))
    (dolist (directory-path directory-paths)
      (dolist (path (find-files directory-path))
        (handler-case
            (let ((type (identify-file-type path)))
              (when (and type (not (eq type 'unknown)))
                (let ((nb-lines (count-file-lines path)))
                  (incf (gethash type line-counts 0) nb-lines))))
          (error (condition)
            (format *error-output* "~&error while reading ~A: ~A~%"
                    path condition)))))
    (let ((line-count-list nil))
      (maphash (lambda (type nb-lines)
                 (push (cons type nb-lines) line-count-list))
               line-counts)
      line-count-list)))

We get to use all previous functions. We iterate through directory paths to find non-hidden files using FIND-FILES, then we use IDENTIFY-FILE-TYPE to obtain a type symbol and COUNT-FILE-LINES to count the number of non-empty lines in each file. Results are accumulated by file type in the LINE-COUNTS hash table. During this process, we handle errors that may occur while reading files with a message on the error output. Finally we transform the hash table into an association list and return it.

Presenting results

Executing all these functions in the Lisp REPL is quite practical during development, but the default pretty printer is not really what you expect for the final command line tool.

So let us write a function to format this association list:

(defun format-line-counts (line-counts &key (stream *standard-output*))
  "Format the line counts in the LINE-COUNTS association list to STREAM."
  (declare (type list line-counts)
           (type stream stream))
  (dolist (entry (sort line-counts '> :key 'cdr))
    (let ((type (car entry))
          (nb-lines (cdr entry)))
      (format stream "~12A  ~8@A~%"
              (string-downcase (symbol-name type)) nb-lines))))

We print the list sorted in descending order, meaning that the file type with the largest number of lines comes first. Of course the output is padded to make sure numbers are all aligned.
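
The padding comes from the two FORMAT directives: ~12A pads its argument on the right to 12 columns, and ~8@A pads on the left to 8 columns, which right-aligns the numbers. A small sketch, with FORMAT-ROW being a helper written for the occasion:

```lisp
(defun format-row (name count)
  "Format one row the way FORMAT-LINE-COUNTS does, without the newline."
  (format nil "~12A  ~8@A" name count))

;; Every row has the same width: 12 columns, 2 spaces, 8 columns.
(length (format-row "lisp" 1234))      ; 22
(length (format-row "javascript" 7))   ; 22

;; The name is left-aligned, the count is right-aligned.
(subseq (format-row "lisp" 1234) 0 4)  ; "lisp"
(subseq (format-row "lisp" 1234) 18)   ; "1234"
```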

Finalizing the program

The only thing left to do is the entry point of the script. This is the only place where we need to call a non-standard function in order to access command line arguments. If no command line argument was passed to the script, we look for files in the current directory.

(let ((paths (or (cdr sb-ext:*posix-argv*) '("."))))
  (format-line-counts
   (collect-line-counts paths)))

Much better!

There does not seem to be any performance issue: on the Linux kernel source tree, this program is almost 11 times faster than sloccount. It would be interesting to profile the program to make sure IO is the bottleneck and improve inefficient parts, but this code is fast enough for my needs.

As you can see, it is not that hard to use Common Lisp to solve problems.

Custom Font Lock configuration in Emacs

Fri, 24 Feb 2023 18:00:00 +0000

Font Lock is the builtin Emacs minor mode used to highlight textual elements in buffers. Major modes usually configure it to detect various syntactic constructions and attach faces to them.

I ended up deep in Font Lock because I was not satisfied with the way it is configured for lisp-mode, the major mode used for both Common Lisp and Emacs Lisp code. This forced me to get acquainted with various aspects of Font Lock in order to change its configuration. If you want to change highlighting for your favourite major mode, you will find this article useful.

Common Lisp highlighting done wrong

The core issue of Common Lisp highlighting in Emacs is that a lot of it is arbitrary and inconsistent:

The mode highlights what it calls “definers” and “keywords”, but it does not really make sense in Common Lisp. Why would WITH-OUTPUT-TO-STRING be listed as a keyword, but not CLASS-OF?
SIGNAL uses font-lock-warning-face. Why would it be a warning? Even stranger, why would you use this warning face for CHECK-TYPE?
Keywords and uninterned symbols are all highlighted with font-lock-builtin-face. But they are not functions or variables. They are not even special in any way, and their syntax already indicates clearly their nature. Having so many yellow symbols everywhere is really distracting.
All symbols starting with & are highlighted using font-lock-type-face. But lambda list arguments are not types, and symbols starting with & are not always lambda list arguments.
All symbols preceded by ( whose name starts with DO- or WITH- are highlighted as keywords. There is even a comment by RMS stating that it is too general. He is right.

Beyond these issues, the mode sadly uses default Font Lock faces instead of defining semantically appropriate faces and mapping them to existing ones as default values.

The chances of successfully driving this kind of large and disruptive change directly into Emacs are incredibly low. Even if it were accepted, the result would not be available until the next release, which could mean months. Fortunately, Emacs is incredibly flexible and we can change all of this ourselves.

Note that you may not agree with the list of issues above, and this is fine. The point of this article is to show you how you can change the way Emacs highlights content in order to match your preferences. And you can do that for all major modes!

Font Lock configuration

Font Lock always felt a bit magic and it took me some time to find the motivation to read the documentation. As it turned out, it can be used for very complex highlighting schemes, but basic features are not that hard to use.

The main configuration of Font Lock is stored in the font-lock-defaults buffer-local variable. It is a simple list containing the following entries:

A list of symbols containing the value to use for font-lock-keywords at each level, the first symbol being the default value.
The value used for font-lock-keywords-only. If it is nil, it enables syntactic highlighting (strings and comments) in addition to search-based (keywords) highlighting.
The value used for font-lock-keywords-case-fold-search. If true, highlighting is case insensitive.
The value used for font-lock-syntax-table, the association list controlling syntactic highlighting. If it is nil, Font Lock uses the syntax table configured with set-syntax-table. In lisp-mode this would mean lisp-mode-syntax-table.
All remaining values are bindings using the form (VARIABLE-NAME . VALUE) used to set buffer-local values for other Font Lock variables.

The part we are interested in is search-based highlighting, which uses regular expressions to find specific text fragments and attach faces to them.

Values used for font-lock-keywords are also lists. Each element is a construct used to specify one or more keywords to highlight. While these constructs can have multiple forms for more complex use cases, we will only use the two simplest ones:

(REGEXP . FACE) tells Font Lock to use FACE for text fragments which match REGEXP. For example, you could use ("\\_<-?[0-9]+\\_>" . font-lock-constant-face) to highlight integers as constants (note the use of \_< and \_> to match the start and end of a symbol; see the regexp documentation for more information).
(REGEXP (GROUP FACE)…) is a bit more advanced. When REGEXP matches a subset of the buffer, Font Lock assigns faces to the capture groups identified by their number. You could use this construction to detect a complex syntactic element and highlight some of its parts with different faces.

Simplified Common Lisp highlighting

We are going to configure keyword highlighting for the following types of values:

Character literals, e.g. #\Space.
Function names in the context of a function call for standard Common Lisp functions.
Standard Common Lisp values such as *STANDARD-OUTPUT* or PI.

Additionally, we want to keep the default syntactic highlighting configuration which recognizes character strings, documentation strings and comments.

Faces

Let us start by defining new faces for the different values we are going to match:

(defface g-cl-character-face
  '((default :inherit font-lock-constant-face))
  "The face used to highlight Common Lisp character literals.")

(defface g-cl-standard-function-face
  '((default :inherit font-lock-keyword-face))
  "The face used to highlight standard Common Lisp function symbols.")

(defface g-cl-standard-value-face
  '((default :inherit font-lock-variable-name-face))
  "The face used to highlight standard Common Lisp value symbols.")

Nothing complicated here, we simply inherit from default Font Lock faces. You can then configure these faces in your color theme without affecting other modes using Font Lock.

Keywords

To detect standard Common Lisp functions and values, we are going to need a regular expression. The first step is to build a list of strings for both functions and values. Easy to do with a bit of Common Lisp code!

(defun standard-symbol-names (predicate)
  (let ((symbols nil))
    (do-external-symbols (symbol :common-lisp)
      (when (funcall predicate symbol)
        (push (string-downcase (symbol-name symbol)) symbols)))
    (sort symbols #'string<)))
    
(standard-symbol-names #'fboundp)
(standard-symbol-names #'boundp)

The STANDARD-SYMBOL-NAMES function builds a list of the names of symbols exported from the :COMMON-LISP package which satisfy a predicate. The first call gives us the names of all symbols bound to a function, and the second the names of all symbols bound to a value.

The astute reader will immediately wonder about symbols which are bound to both a function and a value. They are easy to find by calling INTERSECTION on both sets of names: +, /, *, -. It is not really a problem: we can highlight function calls by matching function names preceded by (, making sure that these symbols will be correctly identified as either function symbols or value symbols depending on the context.
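The same set can also be computed in a single pass instead of intersecting two lists. This is a minimal sketch; FUNCTION-AND-VALUE-NAMES is a hypothetical name introduced for illustration. Keep in mind that FBOUNDP is also true for macros and special operators, not only for functions:

```lisp
(defun function-and-value-names ()
  "Return the sorted, downcased names of the external symbols of the
COMMON-LISP package which are both FBOUNDP and BOUNDP."
  (let ((names nil))
    (do-external-symbols (symbol :common-lisp)
      ;; Keep only symbols with both a global function (or macro)
      ;; definition and a global value.
      (when (and (fboundp symbol) (boundp symbol))
        (push (string-downcase (symbol-name symbol)) names)))
    (sort names #'string<)))

(function-and-value-names)
```

The exact result depends on which variables the implementation has bound at the time of the call, so it is worth running in the implementation you care about.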

We store these lists of strings in the g-cl-function-names and g-cl-value-names variables (the associated code is not reproduced here: these lists are quite long; but I posted them as a Gist).

With these lists, we can use the regexp-opt Emacs Lisp function to build optimized regular expressions matching them:

(defvar g-cl-font-lock-keywords
  (let* ((character-re (concat "#\\\\" lisp-mode-symbol-regexp "\\_>"))
         (function-re (concat "(" (regexp-opt g-cl-function-names t) "\\_>"))
         (value-re (regexp-opt g-cl-value-names 'symbols)))
    `((,character-re . 'g-cl-character-face)
      (,function-re
       (1 'g-cl-standard-function-face))
      (,value-re . 'g-cl-standard-value-face))))

Character literals are reasonably easy to match.

Functions are a bit more complicated since we want to match the function name when it is preceded by an opening parenthesis. We use a capture group (see the last argument of regexp-opt) for the function name and highlight it separately.

Values are always matched as full symbols: we do not want to highlight parts of a symbol, for example MAP in a symbol named MAPPING.

Final configuration

Finally we can define the variable which will be used for font-lock-defaults in the initialization hook; we copy the original value from lisp-mode, and change the keyword list for what is going to be our own configuration:

(defvar g-cl-font-lock-defaults
  '((g-cl-font-lock-keywords)
    nil                                 ; enable syntactic highlighting
    t                                   ; case insensitive highlighting
    nil                                 ; use the lisp-mode syntax table
    (font-lock-mark-block-function . mark-defun)
    (font-lock-extra-managed-props help-echo)
    (font-lock-syntactic-face-function
     . lisp-font-lock-syntactic-face-function)))

To configure font-lock-defaults, we simply set it in the initialization hook of lisp-mode:

(defun g-init-lisp-font-lock ()
  (setq font-lock-defaults g-cl-font-lock-defaults))
  
(add-hook 'lisp-mode-hook 'g-init-lisp-font-lock)

Comparison

Let us compare highlighting for a fragment of code before and after our changes:

The differences are subtle but important:

All standard functions are highlighted, helping to distinguish them from user-defined functions.
Standard values such as *ERROR-OUTPUT* are highlighted.
Character literals are highlighted the same way as character strings.
Keywords are not highlighted anymore, avoiding the confusion with function names.

Conclusion

That was not easy; but as always, the effort of going through the documentation and experimenting with different Emacs components was very rewarding. Font Lock does not feel like a black box anymore, opening the road for the customization of other major modes.

In the future, I will work on a custom color scheme to use more subtle colors, with the hope of reducing the rainbow effect of so many major modes, including lisp-mode.

Common Lisp implementations in 2023

Wed, 22 Feb 2023 18:00:00 +0000

Much has been written on Common Lisp; there is rarely one year without someone proclaiming the death of the language and how nobody uses it anymore. And yet it is still here, so something must have been done right.

Common Lisp is not a piece of software, it is a language described by the ANSI INCITS 226-1994 standard; there are multiple implementations available, something often used as an argument for how alive and thriving the language is.

Let us see what the 2023 situation is.

General information

Implementation	License	Target	Last release
SBCL	Public domain	Native	2023/01 (2.3.1)
CCL	Apache 2.0	Native	2021/05 (1.12.1)
ECL	LGPL 2.1	Native (C translation)	2021/02 (21.2.1)
ABCL	GPL2	Java bytecode	2023/02 (1.9.1)
CLASP	LGPL 2.1	Native (LLVM)	2023/01 (2.1.0)
CMUCL	Public domain	Native	2017/10 (21c)
GCL	LGPL2	Native (C translation)	2023/01 (2.6.14)
CLISP	GPL	Bytecode	2010/07 (2.49)
Lispworks	Proprietary	Native	2022/06 (8.0.1)
Allegro	Proprietary	Native	2017/04 (10.1)

Note that all projects may have small parts with different licenses. This is particularly important for CLASP which contains multiple components imported from other projects.

I was quite surprised to see so many projects with recent releases. Clearly a good sign. Let us look at each implementation.

Implementations

SBCL

Steel Bank Common Lisp was forked from CMUCL in December 1999 and has since massively grown in popularity; it is currently the most used implementation by far. Unsurprisingly given its popularity, SBCL is supported by pretty much all Common Lisp libraries and tools out there. It is well known for generating fast native code compared to other implementations.

The most important aspect of SBCL is that it is actively maintained: its developers release new versions on a monthly basis, bringing each time a small list of improvements and bug fixes. Activity has actually increased these last years, something uncommon in the Common Lisp world.

CCL

Clozure Common Lisp has a long and complex history and has been around for decades. It is a mature implementation; it has two interesting aspects compared to SBCL:

The compiler is much faster.
Error messages tend to be clearer.

This is why I currently use it to test my code alongside SBCL. And according to what I have heard, this is a common choice among developers.

The main issue with CCL is that the project is almost completely abandoned. Git activity has slowed down to a crawl in the last two years, and none of the original maintainers from Clozure seem to be actively working on it. It remains nonetheless a major implementation.

ECL

Embeddable Common Lisp is a small implementation which can be used either as a library or as a standalone program. It contains a bytecode interpreter, but can also translate Lisp code to C to be compiled to native code.

While development is slow, improvements and bug fixes are still added on a regular basis. Clearly an interesting project: I could see myself using ECL to write plugins into an application able to call a C library.

ABCL

Armed Bear Common Lisp is quite different from other implementations: it produces Java bytecode and targets the Java Virtual Machine, making it a useful tool in Java ecosystems.

While it has not found the same success as Clojure, ABCL is still a fully featured Common Lisp implementation which passes almost the entire ANSI Common Lisp test suite.

Development is slow nowadays but there are still new releases with lots of bug fixes. Also note that two of the developers are able to provide paid support.

CLASP

CLASP is a newcomer in the Common Lisp world (new meaning it is less than a decade old). Developed by Christian Schafmeister for his research work, this implementation has been used as an example of how alive and kicking Common Lisp is, mainly due to two excellent presentations.

While very promising, CLASP suffers from its young age: trying to run the latest release on my code resulted in a brutal error with no details and no backtrace. However I have no doubt that CLASP will get a lot better: it is actively maintained and used in production, two of the necessary ingredients for a piece of software to stay relevant.

GCL

GNU Common Lisp is described as the official Common Lisp implementation for the GNU project. While it clearly does not have the popularity of other implementations, it is still a maintained project.

Trying to use it, I quickly realized it is not fully compliant with the standard. For example it will fail when evaluating a call to COMPILE-FILE with the :VERBOSE key argument.

Hopefully development will continue.

CLISP

CLISP is almost as old as I am; it was the first implementation I used a long time ago, and it still works. While it has all the usual features (multithreading, FFI, MOP, etc.), there is no real reason to use it compared to other implementations.

Even if it were to have any specific feature, CLISP is almost completely abandoned. While there has been a semblance of activity a few years ago, active development pretty much stopped around 2012; the last release was more than 12 years ago.

Lispworks

Moving to proprietary implementations; Lispworks has been around for more than 30 years and the company producing it still releases new versions on a regular basis.

While Lispworks supports most features you would expect from a commercial product (native compiler, multithreading, FFI, GUI library, various graphical tools, a Prolog implementation…), it is hampered by its licensing system.

The free “Personal Edition” limits the program size and the amount of time it can run, making it pretty much useless for anything but evaluation. The professional and enterprise licenses do not really make sense for anyone: you will have to buy separate licenses for every single platform at more than a thousand euros per license (with the enterprise version being 2-3 times more expensive). Of course you will have to buy a maintenance contract on a yearly basis… but it does not include technical support. It will have to be bought with “incident packs” costing thousands of euros; because yes, paying for a product and a maintenance contract does not mean they will fix bugs, and you will have to pay for each of them.

I do not have anything personal against commercial software, and I strongly support developers being paid for their work. But this kind of licensing makes Lispworks irrelevant to everyone but those already using their proprietary libraries.

Allegro

Allegro Common Lisp is the other well known proprietary implementation. Developed by Franz Inc., it is apparently used by multiple organizations including the U.S. Department of Defense.

Releases are uncommon, the last one being almost 6 years ago. But Allegro is a mature implementation packed with features not easily replicated such as AllegroCache, AllegroServe, libraries for multiple protocols and data formats, analysis tools, a concurrent garbage collector and even an OpenGL interface.

Allegro suffers from the same issue as Lispworks: the enterprise-style pricing system is incredibly frustrating. The website advertises a hefty $599 starting price (which at least includes technical support), but there is no mention of what it contains. Interested developers will have to contact Franz Inc. to get other prices. A quick Google search will reveal rumours of enterprise versions priced above 8000 dollars. No comment.

Conclusion

Researching Common Lisp implementations has been interesting. While it is clear that the language is far from dead, its situation is very fragile. Proprietary implementations are completely out of touch with the needs of most developers, leaving us with a single open source, actively maintained, high performance implementation: SBCL. Unless of course you are willing to deal with the JVM to use ABCL.

It might be interesting to investigate a possible solution to keep CCL somehow alive, with patches being merged and releases being produced. I sent a patch very recently, let us see what can be done!

Reading files faster in Common Lisp

Wed, 15 Feb 2023 18:00:00 +0000

While Common Lisp has functions to open, read and write files, none of them takes care of reading and returning the entire content. This is something that I do very regularly, so it made sense to add such a function to Tungsten. It turned out to be a bit more complicated than expected.

A simple but incorrect implementation

The simplest implementation relies on the FILE-LENGTH function which returns the length of a stream (which of course only makes sense for a file stream). The Hyperspec clearly states that “for a binary file, the length is measured in units of the element type of the stream”. Since we are only reading binary data, everything is fine.

Let us write the function:

(defun read-file (path)
  (declare (type (or pathname string) path))
  (with-open-file (file path :element-type 'core:octet)
    (let ((data (make-array (file-length file) :element-type 'core:octet)))
      (read-sequence data file)
      data)))

Note that CORE:OCTET is a Tungsten type for (UNSIGNED-BYTE 8).

The function works as expected, returning the content of the file as an octet vector. But it is not entirely correct.

This implementation only works for regular files. Various files on UNIX will report a length of zero but can still be read. Now you might protest that it would not make sense to call READ-FILE on a device such as /dev/urandom, and you would be right. But a valid example would be pseudo files such as those that are part of procfs. If you want to obtain memory stats about your process on Linux, you can simply read /proc/self/statm. But this is not a regular file and READ-FILE will return an empty octet vector.

Doing it right and slow

The right way to read a file is to read its content block by block until the read operation fails because it reached the end of the file.

Let us re-write READ-FILE:

(defun read-file (path)
  (declare (type (or pathname string) path))
  (let ((data (make-array 0 :element-type 'core:octet :adjustable t))
        (block-size 4096)
        (offset 0))
    (with-open-file (file path :element-type 'core:octet)
      (loop
        (let* ((capacity (array-total-size data))
               (nb-left (- capacity offset)))
          (when (< nb-left block-size)
            (let ((new-length (+ capacity (- block-size nb-left))))
              (setf data (adjust-array data new-length)))))
        (let ((end (read-sequence data file :start offset)))
          (when (= end offset)
            (return-from read-file (adjust-array data end)))
          (setf offset end))))))

This time we rely on an adjustable array; we iterate, making sure we have enough space in the array to read an entire block each time. When the array is too short, we use ADJUST-ARRAY to extend it, relying on its ability to reuse the underlying storage instead of systematically copying its content.

Finally, once READ-SEQUENCE stops returning data, we truncate the array to the right size and return it.

This function worked correctly and I started using it regularly. Recently I started working with a file larger than usual and realized that READ-FILE was way too slow. With an NVMe drive, I would expect to be able to read a 10+MB file almost instantaneously, but it took several seconds.

After inspecting the code to find what could be so slow, I started to wonder about ADJUST-ARRAY; while I thought SBCL would internally extend the underlying memory in large blocks to minimize allocations, behaving similarly to realloc() in C, it turned out not to be the case. While reading the code behind ADJUST-ARRAY, I learned that it precisely allocates the required size. As a result, this implementation of READ-FILE performs one memory allocation for each 4kB block. Not a problem for small files, slow for larger ones.

A final version, correct and fast

Since I understood what the problem was, fixing it was trivial. When there is not enough space to read a block, we extend the array by at least 50% of its current size. Of course this is a balancing act: for example doubling the size at each allocation would reduce the number of allocations even more, but would increase the total amount of memory allocated. The choice is up to you.

(defun read-file (path)
  (declare (type (or pathname string) path))
  (let ((data (make-array 0 :element-type 'core:octet :adjustable t))
        (block-size 4096)
        (offset 0))
    (with-open-file (file path :element-type 'core:octet)
      (loop
        (let* ((capacity (array-total-size data))
               (nb-left (- capacity offset)))
          (when (< nb-left block-size)
            (let ((new-length (max (+ capacity (- block-size nb-left))
                                   (floor (* capacity 3) 2))))
              (setf data (adjust-array data new-length)))))
        (let ((end (read-sequence data file :start offset)))
          (when (= end offset)
            (return-from read-file (adjust-array data end)))
          (setf offset end))))))

This last version reads a 250MB file in a quarter of a second, while the original version took almost two minutes. Much better!
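The effect of the growth policy can be estimated without touching the disk. The following sketch simulates the resizing logic of both versions, assuming READ-SEQUENCE always fills the available space until the end of the file is reached (which is what happens with regular files). COUNT-RESIZES, LINEAR-GROWTH and GEOMETRIC-GROWTH are hypothetical names introduced for this illustration:

```lisp
(defun linear-growth (capacity nb-left block-size)
  "Grow just enough to have room for one more block (slow version)."
  (+ capacity (- block-size nb-left)))

(defun geometric-growth (capacity nb-left block-size)
  "Grow by at least 50% of the current capacity (fast version)."
  (max (linear-growth capacity nb-left block-size)
       (floor (* capacity 3) 2)))

(defun count-resizes (file-size next-capacity &key (block-size 4096))
  "Simulate the READ-FILE loop on a file of FILE-SIZE octets and return
the number of times the array is extended. The final truncation is not
counted. NEXT-CAPACITY is a function computing the new array size."
  (let ((capacity 0)
        (offset 0)
        (resizes 0))
    (loop
      (let ((nb-left (- capacity offset)))
        (when (< nb-left block-size)
          (setf capacity (funcall next-capacity capacity nb-left block-size))
          (incf resizes)))
      ;; Assumption: the read fills the array or stops at the end of
      ;; the file, whichever comes first.
      (let ((end (min capacity file-size)))
        (when (= end offset)
          (return resizes))
        (setf offset end)))))

(count-resizes (* 10 1024 1024) #'linear-growth)
(count-resizes (* 10 1024 1024) #'geometric-growth)
```

For a 10 MiB file, the linear policy extends the array once per 4kB block, so more than 2500 times, while the geometric policy needs around twenty extensions. Since each extension may copy everything read so far, the difference grows quickly with the file size.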

Custom Common Lisp indentation in Emacs

Mon, 23 Jan 2023 18:00:00 +0000

While SLIME is most of the time able to indent Common Lisp correctly, it will sometimes trip on custom forms. Let us see how we can customize indentation.

In the process of writing my PostgreSQL client in Common Lisp, I wrote a READ-MESSAGE-CASE macro which reads a message from a stream and executes code depending on the type of the message:

(defmacro read-message-case ((message stream) &rest forms)
  `(let ((,message (read-message ,stream)))
     (case (car ,message)
       (:error-response
        (backend-error (cdr ,message)))
       (:notice-response
        nil)
       ,@forms
       (t
        (error 'unexpected-message :message ,message)))))

This macro is quite useful: all message loops can use it to automatically handle error responses, notices, and signal unexpected messages.
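To see what a message loop actually gets, the expansion can be inspected with MACROEXPAND-1. The sketch below repeats the macro definition so that it stands alone; READ-MESSAGE and BACKEND-ERROR are functions of the client which are never called here, since expanding a macro only manipulates the form:

```lisp
(defmacro read-message-case ((message stream) &rest forms)
  `(let ((,message (read-message ,stream)))
     (case (car ,message)
       (:error-response
        (backend-error (cdr ,message)))
       (:notice-response
        nil)
       ,@forms
       (t
        (error 'unexpected-message :message ,message)))))

;; The user clauses are spliced between the default clauses and the
;; final catch-all clause. The expansion is equivalent to:
;;
;; (let ((message (read-message stream)))
;;   (case (car message)
;;     (:error-response (backend-error (cdr message)))
;;     (:notice-response nil)
;;     (:authentication-ok (return))
;;     (t (error 'unexpected-message :message message))))
(macroexpand-1
 '(read-message-case (message stream)
    (:authentication-ok
     (return))))
```

Because user clauses come after the two default ones, a clause for :ERROR-RESPONSE or :NOTICE-RESPONSE passed to the macro would never be selected.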

But SLIME does not know how to indent READ-MESSAGE-CASE, so by default it will align all message forms on the first argument:

(read-message-case (message stream)
                   (:authentication-ok
                     (return))
                   (:authentication-cleartext-password
                     (unless password
                       (error 'missing-password))
                     (write-password-message password stream)))

While we want it aligned the same way as HANDLER-CASE:

(read-message-case (message stream)
  (:authentication-ok
    (return))
  (:authentication-cleartext-password
    (unless password
      (error 'missing-password))
    (write-password-message password stream)))

Good news, SLIME indentation is defined as a list of rules. Each rule associates an indentation specification (an S-expression describing how to indent the form) to a symbol and stores it as the common-lisp-indent-function property of the symbol.

You can obtain the indentation rule of a Common Lisp symbol easily. For example, executing (get 'defun 'common-lisp-indent-function) (e.g. in IELM or with eval-expression) yields (4 &lambda &body). This indicates that DEFUN forms are to be indented as follows:

The first argument of DEFUN (the function name) is indented by four spaces.
The second argument (the list of function arguments) is indented as a lambda list.
The rest of the arguments are indented based on the lisp-body-indent custom variable, which controls the indentation of the body of a lambda form (two spaces by default).

You can refer to the documentation of the common-lisp-indent-function Emacs function (defined in SLIME of course) for a complete description of the format.

We want READ-MESSAGE-CASE to be indented the same way as HANDLER-CASE, whose indentation specification is (4 &rest (&whole 2 &lambda &body)) (in short, an argument and a list of lambda lists). Fortunately there is a way to specify that a form must be indented the same way as another form, using (as <symbol>).

Let us first define a function to set the indentation specification of a symbol:

(defun g-common-lisp-indent (symbol indent)
  "Set the indentation of SYMBOL to INDENT."
  (put symbol 'common-lisp-indent-function indent))

Then use it for READ-MESSAGE-CASE:

(g-common-lisp-indent 'read-message-case '(as handler-case))

While it is in general best to avoid custom indentation, exceptions are sometimes necessary for readability. And SLIME makes it easy.

ANSI color rendering in SLIME

Mon, 16 Jan 2023 18:00:00 +0000

I was working on the terminal output for a Common Lisp logger, and I realized that SLIME does not interpret ANSI escape sequences.

This is not the end of the world, but having at least colors would be nice. Fortunately there is a library to do just that.

First let us install the package, here using use-package and straight.el.

(use-package slime-repl-ansi-color
  :straight t)

While in theory we are supposed to just add slime-repl-ansi-color to slime-contribs, it did not work for me, and I had to enable the minor mode manually.

If you already have a SLIME REPL hook, simply add (slime-repl-ansi-color-mode 1). If not, write an initialization function, and add it to the SLIME REPL initialization hook:

(defun g-init-slime-repl-mode ()
  (slime-repl-ansi-color-mode 1))
  
(add-hook 'slime-repl-mode-hook 'g-init-slime-repl-mode)

To test that it works as intended, fire up SLIME and print a simple message using ANSI escape sequences:

(let ((escape (code-char 27)))
  (format t "~C[1;33mHello world!~C[0m~%" escape escape))

While it is tempting to use the #\Esc character, it is not part of the Common Lisp standard; therefore we use CODE-CHAR to obtain it from its ASCII numeric value. We use two escape sequences, the first one to set the bold flag and foreground color, and the second one to reset display status.
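If you emit such sequences in several places, the construction can be wrapped in a small function. This is only a sketch: SGR-WRAP is a hypothetical helper, not something provided by SLIME or by the library above. It takes the numeric codes of the sequence as arguments:

```lisp
(defun sgr-wrap (text &rest codes)
  "Return TEXT preceded by an escape sequence built from the numeric
CODES (joined with semicolons) and followed by the reset sequence."
  (let ((escape (code-char 27)))
    ;; ~{~D~^;~} prints each code, separated by semicolons.
    (format nil "~C[~{~D~^;~}m~A~C[0m" escape codes text escape)))

;; Same bold yellow message as above.
(write-line (sgr-wrap "Hello world!" 1 33))
```

Building the string separately from printing it also makes it easy to disable colors when the output is not a terminal.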

If everything works well, you should see a nice bold yellow message:

Switching between implementations with SLIME

Thu, 12 Jan 2023 18:00:00 +0000

While I mostly use SBCL for Common Lisp development, I regularly switch to CCL or even ECL to run tests.

This is how I do it with SLIME.

Starting implementations

SLIME lets you configure multiple implementations using the slime-lisp-implementations setting. In my case:

(setq slime-lisp-implementations
   '((sbcl ("/usr/bin/sbcl" "--dynamic-space-size" "2048"))
     (ccl ("/usr/bin/ccl"))
     (ecl ("/usr/bin/ecl"))))

Doing so means that running M-x slime will execute the first implementation, i.e. SBCL. There are two ways to run other implementations.

First you can run C-u M-x slime which lets you type the path and arguments of the implementation to execute. This is a bit annoying because the prompt starts with the content of the inferior-lisp-program variable, i.e. "lisp" by default, meaning it has to be deleted manually each time. Therefore I set inferior-lisp-program to the empty string:

(setq inferior-lisp-program "")

Then you can run C-- M-x slime (or M-- M-x slime which is easier to type) to instruct SLIME to use interactive completion (via completing-read) to let you select the implementation among those configured in slime-lisp-implementations.

To make my life easier, I bind C-c C-s s to a function which always prompts for the implementation to start:

(defun g-slime-start ()
  (interactive)
  (let ((current-prefix-arg '-))
    (call-interactively 'slime)))

Using C-c C-s as prefix for all my global SLIME key bindings helps me remember them.

Switching between multiple implementations

Running the slime function several times will create multiple connections as expected. Commands executed in Common Lisp buffers are applied to the current connection, which is by default the most recent one.

There are two ways to change the current implementation:

Run M-x slime-next-connection.
Run M-x slime-list-connections, which opens a buffer listing connections, and lets you choose the current one with the d key.

I find both impractical: the first one does not let me choose the implementation, forcing me to run it potentially several times before getting the one I want. The second one opens a buffer but does not switch to it.

All I want is a prompt with completion. So I wrote one.

First we define a function to select a connection among existing ones:

(defun g-slime-select-connection (prompt)
  "Prompt with PROMPT for a SLIME connection and return it."
  (let* ((connections-data
          (mapcar (lambda (process)
                    (cons (slime-connection-name process) process))
                  slime-net-processes))
         (completion-extra-properties
          '(:annotation-function
            (lambda (string)
              (let* ((process (alist-get string minibuffer-completion-table
                                         nil nil #'string=))
                     (contact (process-contact process)))
                (if (consp contact)
                    (format "  %s:%s" (car contact) (cadr contact))
                  (format "  %S" contact))))))
         (connection-name (completing-read prompt connections-data)))
    (let ((connection (cl-find connection-name slime-net-processes
                               :key #'slime-connection-name
                               :test #'string=)))
      (or connection
          (error "Unknown SLIME connection %S" connection-name)))))

Then use it to select a connection as the current one:

(defun g-slime-switch-connection ()
  (interactive)
  (let ((connection (g-slime-select-connection "Switch to connection: ")))
    (slime-select-connection connection)
    (message "Using connection %s" (slime-connection-name connection))))

I bind this function to C-c C-s c.

In a perfect world, we could format nice columns in the prompt and highlight the current connection, but the completing-read interface is really limited, and I did not want to use an external package such as Helm.

Stopping implementations

Sometimes it is necessary to stop an implementation and kill all associated buffers. It is not something I use a lot; but when I need it, it is frustrating to have to switch to the REPL buffer, run slime-quit-lisp, then kill the REPL buffer manually.

Adding this feature is trivial with the g-slime-select-connection function defined earlier:

(defun g-slime-kill-connection ()
  (interactive)
  (let* ((connection (g-slime-select-connection "Kill connection: "))
         (repl-buffer (slime-repl-buffer nil connection)))
    (when repl-buffer
      (kill-buffer repl-buffer))
    (slime-quit-lisp-internal connection 'slime-quit-sentinel t)))

Finally I bind this function to C-c C-s k.

It is now much more comfortable to manage multiple implementations.

Improving Git diffs for Lisp

Sun, 08 Jan 2023 18:00:00 +0000

All my code is stored in various Git repositories. When Git formats a diff between two objects, it generates a list of hunks, or groups of changes.

Each hunk can be displayed with a title which is automatically extracted. Git ships with support for multiple languages, but Common Lisp and Emacs Lisp are not among them. Fortunately Git lets users configure their own extraction.

The first step is to identify the language using a pattern applied to the filename. Edit your Git attribute file at $HOME/.gitattributes and add entries for both Emacs Lisp and Common Lisp:

*.lisp diff=common-lisp
*.el diff=elisp

Then edit your Git configuration file at $HOME/.gitconfig and configure the path of the Git attribute file:

[core]
    attributesfile = ~/.gitattributes

Finally, set the regular expression used to match a top-level function name:

[diff "common-lisp"]
    xfuncname="^\\((def\\S+\\s+\\S+)"
    
[diff "elisp"]
    xfuncname="^\\((((def\\S+)|use-package)\\s+\\S+)"

For Lisp dialects, we do not just identify function names: it is convenient to identify hunks for all sorts of top-level definitions. We use a regular expression which captures the first symbol of the form and the name that follows.

Of course you can modify these expressions to identify more complex top-level forms. For example, for Emacs Lisp, I also want to identify use-package expressions.
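Before committing to a pattern, it can help to check what it would extract from a few sample lines. The following is only a sketch: hunk_title is a hypothetical helper, not a Git command, and it relies on sed instead of Git itself. It uses the fact that, when a pattern contains a group, Git takes the hunk title from the first one. The patterns are spelled with POSIX character classes because not every sed understands \S and \s.

```shell
# Portable spellings of the two patterns: [^[:space:]] replaces \S and
# [[:space:]] replaces \s.
common_lisp_pattern='^\((def[^[:space:]]+[[:space:]]+[^[:space:]]+)'
elisp_pattern='^\((((def[^[:space:]]+)|use-package)[[:space:]]+[^[:space:]]+)'

# Hypothetical helper: print the text matched by the first group of the
# pattern $1 in the line $2, or nothing if the line does not match.
hunk_title() {
    printf '%s\n' "$2" | sed -nE "s/$1.*/\1/p"
}

hunk_title "$common_lisp_pattern" '(defun count-lines (path)'  # defun count-lines
hunk_title "$elisp_pattern" '(use-package slime'               # use-package slime
hunk_title "$common_lisp_pattern" '  (defun nested ())'        # indented: no output
```

Since both patterns are anchored at the beginning of the line, only forms starting in the first column produce a title. Note also that the def prefix is broad: a top-level define-key form in an Emacs Lisp file would match too.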

You can see the result in all tools displaying Git diffs, for example in Magit with Common Lisp code:

Or for my Emacs configuration file:

Hunk titles, highlighted in blue, now contain the type and name of the top-level construction the changes are associated with.

A simple change, but one which really helps reading diffs.

Configuring SLIME cross-referencing

Wed, 28 Dec 2022 18:00:00 +0000

The SLIME Emacs package for Common Lisp supports cross-referencing: one can list all references pointing to a symbol, move through this list and jump to the source code of each reference.

Removing automatic reference jumps

While cross-referencing is very useful, the default configuration is frustrating: moving through the list in the Emacs buffer triggers the jump to the reference under the cursor. If you are interested in a reference in the middle of the list, you will have to move to it, opening multiple buffers you do not care about as a side effect. I finally took the time to fix it.

Key bindings for slime-xref-mode are stored in the slime-xref-mode-map keymap. After a quick look in slime.el, it is easy to remove bindings for slime-xref-prev-line and slime-xref-next-line:

(define-key slime-xref-mode-map (kbd "n") nil)
(define-key slime-xref-mode-map [remap next-line] nil)
(define-key slime-xref-mode-map (kbd "p") nil)
(define-key slime-xref-mode-map [remap previous-line] nil)

If you are using use-package, it is even simpler:

(use-package slime
  :bind
  (:map slime-xref-mode-map
        ("n" . nil)
        ([remap next-line] . nil)
        ("p" . nil)
        ([remap previous-line] . nil)))

Changing the way references are used

SLIME supports two ways to jump to a reference:

With return or space, it spawns a buffer containing the source file and closes the cross-referencing buffer.
With v, it spawns the source file buffer but keeps the cross-referencing buffer open and keeps it current.

This is not practical for me, so I made a change. The default action, triggered by return, now keeps the cross-referencing buffer open and switches to the source file in the same window. This way, I can switch back to the cross-referencing buffer with C-x b to select another reference without spawning buffers in other windows (I do not like having my windows hijacked by commands).

To do that, I need a new function:

(defun g-slime-show-xref ()
  "Display the source file of the cross-reference under the point
in the same window."
  (interactive)
  (let ((location (slime-xref-location-at-point)))
    (slime-goto-source-location location)
    (with-selected-window (display-buffer-same-window (current-buffer) nil)
      (goto-char (point))
      (g-recenter-window))))

Note the use of g-recenter-window, a custom function to move the current point at eye level. Feel free to use the builtin recenter function instead.

I then bind the function to return and remove other bindings:

(define-key slime-xref-mode-map (kbd "RET") 'g-slime-show-xref)
(define-key slime-xref-mode-map (kbd "SPC") nil)
(define-key slime-xref-mode-map (kbd "v") nil)

Much better now!

Fixing unquote-splicing behaviour with Paredit

Tue, 13 Dec 2022 18:00:00 +0000

Paredit is an Emacs package for structural editing. It is particularly useful in Lisp languages to manipulate expressions instead of just characters.

One of the numerous little features of Paredit is the automatic insertion of a space character before a delimiting pair. For example, if you are typing (length, typing ( will have Paredit automatically insert a space character before the opening parenthesis, to produce the expected (length ( content.

Paredit is smart enough to avoid doing so after quote, backquote or comma characters, but not after an unquote-splicing sequence (,@) which is quite annoying in languages such as Scheme or Common Lisp. As almost always in Emacs, this behaviour can be customized.

Paredit decides whether to add a space or not using the paredit-space-for-delimiter-p function, which ends up applying a list of predicates from paredit-space-for-delimiter-predicates.

Let us add our own. For more flexibility, we will start by defining a list of prefixes which are not to be followed by a space:

(defvar g-paredit-no-space-prefixes (list ",@"))

We then write our predicate which simply checks if we are right after one of these prefixes:

(defun g-paredit-space-for-delimiter (endp delimiter)
  "Return nil if the point is right after a prefix which must not be
followed by a space."
  (let ((point (point)))
    (or endp
        (seq-every-p
         (lambda (prefix)
           (let ((start (- point (length prefix))))
             ;; A prefix cannot precede the point if there is no room for it.
             (or (< start (point-min))
                 (not (string= (buffer-substring start point) prefix)))))
         g-paredit-no-space-prefixes))))

Finally we add a Paredit hook to append our predicate to the list:

(defun g-init-paredit-space-for-delimiter ()
  (add-to-list 'paredit-space-for-delimiter-predicates
               'g-paredit-space-for-delimiter))

(add-hook 'paredit-mode-hook 'g-init-paredit-space-for-delimiter)

Not only does it fix the problem for unquote-splicing, but it makes it easy to add new prefixes. For example I immediately added #p (used for pathnames in Common Lisp, e.g. #p"/usr/bin/") to the list.

SLIME compilation tips

Mon, 12 Dec 2022 18:00:00 +0000

I recently went back to Common Lisp to solve the daily problems of the Advent of Code. Of course it started with installing and configuring SLIME, the main major mode used for Common Lisp development in Emacs.

The most useful feature of SLIME is the ability to load sections of code into the Common Lisp implementation currently running. One can use C-c C-c to compile the current top-level form, and C-c C-k to compile and load the entire file, making incremental development incredibly convenient.

However I found the default configuration frustrating. Here are a few tips which made my life easier.

Removing the compilation error prompt

If the Common Lisp implementation fails to compile the file, SLIME will ask the user if they want to load the fasl file (i.e. the compiled form of the file) anyway.

I cannot find a reason why one would want to load the output of a file that failed to compile, and having to decline every time is quite annoying.

Disable the prompt by setting slime-load-failed-fasl to 'never:

(setq slime-load-failed-fasl 'never)

Removing the SLIME compilation buffer on success

When compilation fails, SLIME creates a new window containing the diagnostic reported by the Common Lisp implementation. I use display-buffer-alist to make sure the window is displayed on the right side of my three-column split, and fix my code in the middle column.

However if the next compilation succeeds, SLIME updates the buffer to indicate the absence of error, but keeps the window open even though it is not useful anymore, meaning that I have to switch to it and close it with q.

One can look at the slime-compilation-finished function to see that SLIME calls the function referenced by the slime-compilation-finished-hook variable right after the creation or update of the compilation buffer. The default value is slime-maybe-show-compilation-log which does not open a new window if there is no error, but does not close an existing one.

Let us write our own function and use it:

(defun g-slime-maybe-show-compilation-log (notes)
  (with-struct (slime-compilation-result. notes successp)
      slime-last-compilation-result
    (when successp
      (let ((name (slime-buffer-name :compilation)))
        (when (get-buffer name)
          (kill-buffer name))))
    (slime-maybe-show-compilation-log notes)))
    
(setq slime-compilation-finished-hook 'g-slime-maybe-show-compilation-log)

Nothing crazy here: we obtain the compilation status (in a very SLIME-specific way, since with-struct is not a standard Emacs Lisp macro) and kill the compilation buffer if there is one and compilation succeeded.

Making compilation less verbose

Common Lisp specifies two variables, *compile-verbose* and *load-verbose*, which control how much information is displayed during compilation and loading respectively.

My implementation of choice, SBCL, is quite chatty by default. So I always set both variables to nil in my $HOME/.sbclrc file.

However SLIME forces *compile-verbose*; this is done in SWANK, the Common Lisp part of SLIME. When compiling a file, SLIME instructs the running Common Lisp implementation to execute swank:compile-file-for-emacs, which binds *compile-verbose* to t around the call to a list of functions able to handle the file. The one we are interested in is swank::swank-compile-file*.

First, let us write some Common Lisp code to replace the function with a wrapper which sets *compile-verbose* to nil.

(let ((old-function #'swank::swank-compile-file*))
  (setf (fdefinition 'swank::swank-compile-file*)
        (lambda (pathname load-p &rest options &key policy &allow-other-keys)
          (declare (ignore policy))
          (let ((*compile-verbose* nil))
            (apply old-function pathname load-p options)))))

We save it to a file in the Emacs directory.

In Emacs, we use the slime-connected-hook hook to load the code into the Common Lisp implementation as soon as SLIME is connected to it:

(defun g-slime-patch-swank-compilation-function ()
  (let* ((path (expand-file-name "swank-patch-compilation-function.lisp"
                                 user-emacs-directory))
         (lisp-path (slime-to-lisp-filename path)))
    (slime-eval-async `(swank:load-file ,lisp-path))))
    
(add-hook 'slime-connected-hook 'g-slime-patch-swank-compilation-function)

Quite a hack, but it works.

Local CLHS access in Emacs

Wed, 01 Jan 2020 18:00:00 +0000

The CLHS, or Common Lisp HyperSpec, is one of the most important resources for any Common Lisp developer. It is derived from the official Common Lisp standard, and contains documentation for every aspect of the language.

While it is currently made available online by LispWorks, it can be useful to be able to access it locally, for example when you do not have any internet connection available.

For this purpose, LispWorks provides an archive which can be downloaded and browsed offline.

While the HyperSpec is valuable on its own, the SLIME Emacs mode provides various functions to make it even more useful.

I have found the following functions particularly useful:

slime-documentation-lookup, or C-c C-d h, to browse the documentation associated with a symbol.
common-lisp-hyperspec-format, or C-c C-d ~, to look up format control characters.
common-lisp-hyperspec-glossary-term, or C-c C-d g, to access terms in the glossary.

With the default configuration, Emacs will use the online HyperSpec. You can have it use a local copy by setting common-lisp-hyperspec-root to a file URI. For example, if you downloaded the content of the CLHS archive to ~/common-lisp/:

(setq common-lisp-hyperspec-root
  (concat "file://" (expand-file-name "~/common-lisp/HyperSpec/")))

And if you configure Emacs to use the EWW web browser, you can work with the CLHS without leaving your editor.

ASDF in Common Lisp scripts

Mon, 30 Dec 2019 18:00:00 +0000

The usual way to develop programs in Common Lisp is to use SLIME in Emacs, which starts an implementation and provides a REPL. When a program needs to be running in production, one can either execute it from source or compile it to an executable core, for example with sb-ext:save-lisp-and-die in SBCL.

While executable cores work well for conventional applications, they are less suitable for small scripts which should be easy to run without having to build anything.

Writing a basic script with SBCL is easy:

#!/bin/sh
#|
exec sbcl --script "$0" "$@"
|#

(format t "Hello world!~%")

Since UNIX shebangs cannot be used to run commands with more than one argument, it is impossible to call SBCL directly (it requires the --script argument, and #!/usr/bin/env sbcl --script contains two arguments). However it is possible to start as a simple shell script and just execute SBCL with the right arguments. And since we can include any shell commands, it is possible to support multiple Common Lisp implementations depending on the environment.
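As a rough sketch of that last point, the shell part of the script could choose the implementation from the environment. Everything here is an assumption apart from the SBCL entry: lisp_runner and LISP_IMPL are hypothetical names, and the ECL entry assumes that its --shell option skips the #!/bin/sh line the same way sbcl --script does, which should be checked against your ECL version.

```shell
# Map an implementation name to the command used to run a script with it.
# Only the SBCL entry comes from the article; the ECL entry is an assumption.
lisp_runner() {
    case "$1" in
        sbcl) echo "sbcl --script" ;;
        ecl)  echo "ecl --shell" ;;
        *)    echo "unsupported implementation: $1" >&2
              return 1 ;;
    esac
}

# LISP_IMPL is a hypothetical environment variable; SBCL is the default.
lisp_runner "${LISP_IMPL:-sbcl}"
```

In a real script, the function would live between the #| and |# lines, and the final line would be replaced by runner=$(lisp_runner "${LISP_IMPL:-sbcl}") || exit 1 followed by exec $runner "$0" "$@". Checking the exit status first matters: with an empty runner, exec would run the script itself again.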

This method works. But if your script has any dependency, configuring ASDF can be tricky. ASDF can pick up system directory paths from multiple places, and you do not want your program to depend on your development environment. If you run your script in a CI environment or a production system, you will not have access to your ASDF configuration and your systems.

Fortunately, ASDF makes it possible to manually configure the source registry at runtime using asdf:initialize-source-registry, giving you total control over the places which will be used to find systems.

For example, if your Common Lisp systems happen to be stored in a systems directory at the same level as your script, you can use the :here directive:

#!/bin/sh
#|
exec sbcl --script "$0" "$@"
|#

(require 'asdf)

(asdf:initialize-source-registry
 `(:source-registry :ignore-inherited-configuration
                    (:tree (:here "systems"))))

And if you store all your systems in a Git repository, you can use submodules to include a systems directory in every project, making it simple to manage the systems you need and their version. Additionally, anyone with an implementation installed, SBCL in this example, can now execute these scripts without having to install or configure anything. This is quite useful when you work with people who do not know Common Lisp.

Of course, you can use the same method when building executables: just create a script whose only job is to set up ASDF, load your program, and dump an executable core. This way, you can make sure you control exactly which set of systems is used. And it can easily be run in a CI environment.
