OCR from Common Lisp

Neat use of CL for OCR by Nick Faro. It leverages FFI and run-program nicely to get the job done.

I think run-program, or your implementation's equivalent, is amazingly handy for getting quick CL access to outside functionality.

I ran a busy website that used ImageMagick a lot, but I never bothered to use FFI. I called “convert” and friends via run-program instead, which had the advantage that no mistake of mine could crash my CL session the way incorrect use of the C API can.

run-program is not particularly fast, but for my application (and many others) it can be fast enough.
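As a rough illustration, here's a minimal sketch of calling convert from Lisp. It uses UIOP's portable uiop:run-program rather than an implementation-specific function. The helper names and the -resize geometry are assumptions for illustration, not the code from that website.

```lisp
;; UIOP ships with ASDF; load it before the reader sees UIOP: symbols.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))

;; Hypothetical helper: the argument list for ImageMagick's "convert"
;; to make a copy of INPUT that is WIDTH pixels wide.
(defun thumbnail-command (input output width)
  (list "convert" input "-resize" (format nil "~Dx" width) output))

;; Hypothetical helper: run convert as a separate process. Its stdout
;; is discarded and its stderr is returned as the second value.
(defun make-thumbnail (input output width)
  (uiop:run-program (thumbnail-command input output width)
                    :output nil
                    :error-output :string))
```

If convert exits with a nonzero status, uiop:run-program signals a uiop:subprocess-error, so a bad call shows up as an ordinary Lisp condition in the REPL rather than taking down the Lisp process.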

Non-symbols as keyword arguments

Sometimes I like to have a short function call to make a lookup table with arbitrary keys and values. Something like this:

(table "name" "Alice" "age" 42) => #<TABLE ...>

This is easy enough to define like so:

(defun table (&rest keys-and-values)
   ;;; loop over keys and values to construct a table
   ...)
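For concreteness, here's one way the elided loop might be filled in. This is a minimal sketch that assumes an EQUAL hash table stands in for the #<TABLE ...> object, whose actual definition isn't shown.

```lisp
;; Sketch only: an EQUAL hash table stands in for the original's
;; #<TABLE ...> object.
(defun table (&rest keys-and-values)
  ;; With a plain &REST list, a missing value is only noticed at run time.
  (when (oddp (length keys-and-values))
    (error "Odd number of arguments to TABLE: ~S" keys-and-values))
  (let ((hash (make-hash-table :test 'equal)))  ; EQUAL so string keys match
    ;; Walk the argument list two elements at a time.
    (loop for (key value) on keys-and-values by #'cddr
          do (setf (gethash key hash) value))
    hash))
```

With this definition, (gethash "name" (table "name" "Alice" "age" 42)) => "Alice", T.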

But there’s another option that works:

(defun table (&rest keys-and-values &key &allow-other-keys)
  ...)

This version has the advantage that a missing value (e.g. (table "name"), a key with no value) is caught at compile time!

I’ve been using this construct for a few years, but recently found out about section 3.5.1.5 of the HyperSpec, which says:

It is not permitted to supply a keyword argument to a function using a name that is not a symbol.

Yikes! That seems simple and straightforward: what I’m doing is not permitted. However! Every implementation I’ve tried (only three, admittedly) actually allows my non-symbol keywords without complaint!

I’m too old to think that “it works everywhere today” means “it will continue to work in SBCL tomorrow”, so I’m trying to figure out where I stand. 3.5.1.5 also says this:

If this situation occurs in a safe call, an error of type program-error must be signaled unless keyword argument checking is suppressed as described in Section 3.4.1.4.1 (Suppressing Keyword Argument Checking); and in an unsafe call the situation has undefined consequences.

So does that mean my code is:

  • a safe call with suppressed keyword argument checking?
  • …and is that good to use now and forever?
  • an unsafe call with undefined consequences?
  • something else?

I know the same effect could be achieved with a compiler macro, but I’d like to know if I can use this simpler option safely.
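For comparison, the compiler-macro route might look something like this minimal sketch. It assumes TABLE is defined elsewhere with a plain (&rest keys-and-values) lambda list. The compiler macro only inspects the literal argument count at each call site and returns the form unchanged, so it never alters what gets compiled.

```lisp
;; Sketch: a compile-time check for calls like (TABLE "name"), assuming
;; TABLE itself is an ordinary &REST function defined elsewhere.
(define-compiler-macro table (&whole form &rest keys-and-values)
  (when (oddp (length keys-and-values))
    ;; Signaled while the call site is being compiled.
    (warn "Odd number of arguments in ~S" form))
  ;; Returning FORM itself declines any expansion.
  form)
```

One caveat: implementations are permitted to ignore compiler macros, and a NOTINLINE declaration suppresses them, so this is a best-effort check rather than a guarantee.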

New SBCL 2.0.9 behavior breaks some stuff

The latest SBCL handles slot :initform and :type options in a new way. It’s mentioned in the release notes.

minor incompatible change: the compiler signals a warning at compile-time when an initform of T, NIL or 0 does not match a STANDARD-CLASS slot’s declared type.

Sounds pretty benign, but it breaks dozens of projects in Quicklisp. (To be fair, most of the failures are caused by a small number of core systems on which many other systems depend.)

Here’s an example of the new behavior:

(defclass foo ()
  ((name :type string :initform nil)))

With the above defclass form, SBCL 2.0.9 will signal a warning at compile time:

; processing (DEFCLASS FOO ...)

; file: foo.lisp
; in: DEFCLASS FOO
;     (NAME :TYPE STRING :INITFORM NIL)
; ==>
;   (SB-KERNEL:THE* (STRING :SOURCE-FORM NIL :USE-ANNOTATIONS T) NIL)
; 
; caught WARNING:
;   Constant NIL conflicts with its asserted type STRING.
;   See also:
;     The SBCL Manual, Node "Handling of Types"
; 
; compilation unit finished
;   caught 1 WARNING condition

This compile-time warning means “failure” as far as loading with ASDF is concerned.

If you have both :type and :initform in your slot definitions, and you want to be compatible with the latest SBCL, make sure the initform type matches the slot type. If you want to use NIL as the initform, one easy option is to set the type to (or null <actual type>).
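Applied to the example above, the fix is a one-line change. In this sketch, the second class, foo-with-string-default, is a hypothetical alternative for when the slot should always hold a string and an empty string is an acceptable default.

```lisp
;; NIL is now a valid value for the slot's declared type.
(defclass foo ()
  ((name :type (or null string) :initform nil)))

;; Alternative: keep the STRING type and pick an initform that matches it.
(defclass foo-with-string-default ()
  ((name :type string :initform "")))
```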

SBCL20 in Vienna

Last week I attended the SBCL 20th anniversary workshop, held in Vienna, Austria. Here are some rough impressions - I wish I had detailed enough notes to recreate the experience for those who did not attend, but alas, this is what you get! It’s incomplete and I’m sorry if I’ve left out someone unfairly - you have to go yourself next time so you don’t miss a thing.

Structure of the gathering

I didn’t go to SBCL10, so I didn’t know what to expect. There were few speakers announced, and I worried (to myself) that this was a sign of insufficient participation, but I couldn’t have been more wrong.

Philipp Marek played host at his employer, BRZ. We had a large room laid out with several tables, ample power, notepads and markers, treats, fresh food and drink, and a stage area with chairs set out for an audience.

The conference led off with a BRZ representative who explained BRZ’s purpose with an English-language video and encouraged qualified people to consider joining the company.

Christophe Rhodes acted as master of ceremonies. He put the conference in historical context and explained the format. While there were only a small number of planned talks, the remainder of the two-day conference was intended for brainstorming, reminiscing, hacking, and impromptu talks by anyone with an interesting idea, demo, or controversial proclamation. He challenged all of us to pick something to work on and present it by the end of the conference.

With everyone in the same room, it was nice to see people help each other in unexpected ways. People describing a problem would get interrupted by people who could help. And not “Have you tried installing My Favorite Linux?”-type unhelpful speculation - it was always valuable in one form or another. The breadth and depth of SBCL and Common Lisp knowledge meant that, for example, someone could explain why a certain decision was made in CMUCL 30 years ago, why SBCL went in a different direction 20 years ago, and how things could be improved today.

I chose to revisit a problem I first wrote about five years ago. It boils down to this: can a third party (Quicklisp, in this case) control the dynamic environment of each load-system operation when recursively loading a system? The SBCL connection is a little loose, but it relates to how SBCL’s compile-time analysis warnings should be shown prominently during local development, even when loading systems with ql:quickload.

A lot of people pitched in with ideas and suggestions. Mark Evanson in particular acted as a sounding board in a way that helped me get to the essence of the problem. Unfortunately, I wasn’t able to find a solution before the end of the conference.

Robert Smith and Lisp at Rigetti

Robert gave an introduction to the math behind quantum computing, and then described the success Rigetti has had with SBCL and Common Lisp for building its quantum computing simulator and compiler. CL allows exploring and adopting new ideas at a very quick pace. I liked that it wasn’t high-level “Lisp is great!” preaching to the choir, but gave some specific practical examples of how CL features made life easier.

Charlie Zhang and RISC-V

Charlie is a newcomer to SBCL development as of 2019. In the span of a few months, he added a new RISC-V compiler backend for fun. He talked about the challenges of understanding and extending SBCL, and some of his future plans.

He generated the backend's assembly code with the Lisp assembler, which gave him the full advantages of Lisp for code manipulation at the assembly level. It's the first and only backend to do so!

Funniest fact I overheard at dinner after the workshop: “I don’t even use Common Lisp for anything, I just hack on the compiler!”

Doug Katzman and SBCL as a Unix program

Doug Katzman talked about his recent work at Google aimed at making SBCL play well with Unix tools, specifically debugging and profiling tools that expect a binary to behave in a certain way and to provide debugging and stack information in a specific format. This makes it easier to get insight into how SBCL is spending its time when there are many, many instances running across many, many computers. And with so many instances running, the benefits of improving performance add up in significant ways.

This use case is far removed from my typical live, ongoing, interactive development session with SBCL. It was interesting to learn about the constraints and requirements of Lisp servers in that kind of environment.

Other talks

There were a dozen or more short presentations and talks about various topics. Luís talked about porting an application to SBCL from another Lisp and some of the good and bad parts along the way. Charlie talked about contification optimization for SBCL. There were quick talks about tuning the GC, SIMD extensions, missing features of SBCL, Coalton, CL support in SWIG, and several more.

Bits and pieces

  • There are more commits to SBCL now than ever before, despite there being fewer people contributing overall
  • It would be nice to have support for good CL interaction in popular new editors, but some of the existing components (e.g. LSP) are insufficient to support SLIME-like levels of interaction
  • Siscog has more than 60 people who work on Lisp programs - biggest collection in the world?

Closing wishlist items

Christophe solicited wishlist items to close out the conference. Here are a few that I jotted down:

  • Debug-oriented interpreter, e.g. with macroexpand-always behavior (the current interpreters do not especially improve the debug experience)
  • Profile-based compilation feedback (sort of like a tracing JIT, but not exactly?)
  • Class sealing
  • Fully configurable numeric tower
  • Arbitrary-precision floats
  • Concurrent GC - GC pauses cited by more than one person as a pain point

Overall

I had a great time at SBCL20 and it renewed my energy for working on Common Lisp projects. It’s great to see in person the people you are helping when you create or contribute to a community project.

Thanks to everyone who attended and spoke, to Philipp Marek for hosting, and to Christophe Rhodes for acting as MC and much more.

It’s too long to wait for SBCL30. How about an SBCL25, or even sooner?

See you there!

Exclusive file open

The Planet Lisp twitter fix involved tracking post status with a file. Although it’s not 100% bulletproof, there’s a trick with files that reduces races.

Here’s a naive approach:

  1. Assign a Planet Lisp post a unique pathname
  2. If that pathname exists, do nothing
  3. Otherwise, send a tweet for the post and save its details to the pathname

The problem is the window between steps 2 and 3 - another process could check the same pathname in that window, find nothing, and post the tweet too.
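For comparison, the naive steps might be written like this minimal sketch; claim-post-naively and the marker-file contents are hypothetical. The comment marks the window where two processes can both pass the existence check.

```lisp
;; Hypothetical sketch of the naive approach. Returns T if this call
;; wrote the marker file, NIL if the file already existed.
(defun claim-post-naively (pathname)
  (cond ((probe-file pathname)          ; step 2: already handled
         nil)
        (t
         ;; RACE: another process may run PROBE-FILE right now, also
         ;; see no file, and go on to send the same tweet.
         (with-open-file (stream pathname :direction :output
                                          :if-exists :supersede
                                          :if-does-not-exist :create)
           (write-line "tweeted" stream))
         t)))
```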

There’s another option:

  1. Assign a Planet Lisp post a unique pathname
  2. Attempt to open the file with O_EXCL|O_CREAT mode
  3. If the attempt succeeds, send a tweet for the post and save its details to the pathname

The key bit is O_EXCL|O_CREAT mode, a Unix option that means open will fail if the file exists, and succeed atomically otherwise. Specifically:

The check for the existence of the file and the creation of the file if it does not exist shall be atomic with respect to other threads executing open() naming the same filename in the same directory with O_EXCL and O_CREAT set.

In SBCL, you can get O_EXCL|O_CREAT semantics by specifying :direction :output and NIL, :error, or :new-version as the :if-exists argument to CL:OPEN. With NIL, you can check whether the returned value is NIL instead of a stream to know whether you have exclusively opened the file and can proceed. I used the NIL approach in the Planet Lisp twitter gateway.
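Here's a minimal sketch of the NIL approach; claim-post is a hypothetical name, not the gateway's actual code. WITH-OPEN-FILE binds the stream variable to NIL when OPEN returns NIL, so the body can test it directly.

```lisp
;; Sketch: try to create PATHNAME exclusively. With :IF-EXISTS NIL, OPEN
;; returns NIL instead of a stream when the file already exists. In SBCL
;; this maps onto O_EXCL|O_CREAT, so the existence check and the creation
;; are one atomic step; other implementations may not promise that.
(defun claim-post (pathname)
  "Return T if this call created PATHNAME, NIL if it already existed."
  (with-open-file (stream pathname :direction :output
                                   :if-exists nil
                                   :if-does-not-exist :create)
    (when stream
      (write-line "tweeted" stream)
      t)))
```

A caller would then do something like (when (claim-post path) (send-tweet post)), where send-tweet is a hypothetical stand-in for the actual posting code.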

Working with letsencrypt’s certbot for a Lisp webserver

Every 90 days my letsencrypt certificate expires and I renew it manually. I have to cut and paste data from the certbot script into REPL forms to serve the right data on the right path. It’s such a pain that sometimes I put it off until after the certificate expires, people email me about it, and I feel bad.

The manual process looks something like this (not my actual domain or challenges):

# certbot certonly --manual -d my.example.com

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator manual, Installer None
Cert is due for renewal, auto-renewing...
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for my.example.com

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NOTE: The IP of this machine will be publicly logged as having requested this
certificate. If you're running certbot in manual mode on a machine that is not
your server, please ensure you're okay with that.

Are you OK with your IP being logged?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: y

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Create a file containing just this data:

uaut.gj61B3f7oYQcWZSF4kxS4OFh8KQlsDVtPXrw60Tkj2JLW7RtZaVE0MIWwiEKRlxph7SaLwp1ETFjaGDUKN

And make it available on your web server at this URL:

http://my.example.com/.well-known/acme-challenge/SpZa7sf4QMEFoF7lKh7aT4EZjNWSVcur2jODPQkgExa

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Press Enter to Continue

There’s no way that letsencrypt’s certbot program will automatically get support for my custom hunchentoot “framework” and I don’t really want to look into adding it myself.

For a while I thought about writing some Expect functionality in SBCL - sb-ext:run-program already supports a :pty option so it wouldn’t be super-difficult. With a theoretical cl-expect you could spawn certbot with the right options, slurp out the verification secret and URL via the process’s standard output stream, and call whatever code you need to serve the secret at that location.

I realized today there’s a halfway solution that takes the worst of the cutting and pasting out of the loop. The Unix command script --flush starts an interactive session where all input and output is saved to a file, flushed after each write. My Lisp webserver can then read certbot program output from that file and configure the webserver automagically for me. I still have to manually start certbot and enter a few REPL commands, but it’s easier than before.

Here’s the core of the Lisp side of things for processing the script file:

(defun scrape-letsencrypt-interaction (file)
  (let (secret path)
    (with-open-file (stream file)
      (labels (...)
        (loop
          (skip-past "Create a file containing")
          (next-line)
          (setf secret (read-trimmed-line))
          (skip-past "make it available")
          (next-line)
          (setf path (substring-after ".well-known" (read-trimmed-line))))))))

This could perhaps be done just as well with cl-ppcre, but I didn’t want to pull it in as a dependency.

Here are the functions from labels:

((next-line ()
   (let ((line (read-line stream nil)))
     (when (null line)
       (unless (and secret path)
         (error "Didn't find a secret and path anywhere in ~S"
                file))
       (return-from scrape-letsencrypt-interaction
         (values secret path)))
     line))
 (skip-past (string)
   (loop
     (let ((line (next-line)))
       (when (search string line)
         (return)))))
 (read-trimmed-line ()
   (string-trim '(#\Return)
                (next-line)))
 (substring-after (string target)
   (let ((pos (search string target)))
     (unless pos
       (error "Could not find ~S in ~S" string target))
     (subseq target pos))))

The goal here is to only look at the last secret and URL in the script output, so the main loop keeps track of what it’s seen so far and next-line returns those values at EOF. The output also has ugly trailing ^M noise so read-trimmed-line takes care of that.

Here’s the whole thing all together:

(defun scrape-letsencrypt-interaction (file)
  (let (secret path)
    (with-open-file (stream file)
      (labels ((next-line ()
                 (let ((line (read-line stream nil)))
                   (when (null line)
                     (unless (and secret path)
                       (error "Didn't find a secret and path anywhere in ~S"
                              file))
                     (return-from scrape-letsencrypt-interaction
                       (values secret path)))
                   line))
               (skip-past (string)
                 (loop
                   (let ((line (next-line)))
                     (when (search string line)
                       (return)))))
               (read-trimmed-line ()
                 (string-trim '(#\Return)
                              (next-line)))
               (substring-after (string target)
                 (let ((pos (search string target)))
                   (unless pos
                     (error "Could not find ~S in ~S" string target))
                   (subseq target pos))))
        (loop
          (skip-past "Create a file containing")
          (next-line)
          (setf secret (read-trimmed-line))
          (skip-past "make it available")
          (next-line)
          (setf path (substring-after ".well-known" (read-trimmed-line))))))))

I don’t mind shaving only half a yak when it feels like useful progress!

Someday I’ll get around to a proper Expect-like thing…