1. 23

I’ve picked up my Tails project again — a little Forth-like language using a fast tail-calling threaded interpreter. It’s statically-typed, but I’ve run into the limits of its very simple type system and I want to fix that*.

I’d like to read more about type systems, type checkers, type inference … but everything I’ve seen on the topic is quite technical and leaves me lost in a fog of lambda calculus, category theory and advanced FP techniques.

Is there anything that explains these topics from more of an implementor’s perspective, preferably without assuming the reader is skilled in OCaml or Haskell, and preferably using [pseudo]code instead of math notation? Or do I need to suck it up and study texts like SICP and Learn You A Haskell first?

FWIW, I’m pretty familiar with parsers and bytecode interpreters, having implemented several over the years, but for dynamically typed languages. Tails is my first attempt at static typing.

* the problems I’m working on now have to do with type-checking calls to lambdas (“quotes”). I need to extend the type system to represent the signature of a lambda parameter within a function signature, and to do this “row polymorphism” thing that’s unique to concatenative languages.

    1. 9

      Do you have generics? If you don’t (Java 5), then I think “just go and write naive type-checker” works. Basically, to compute type of expression, you compute types of subexpressions, and then figure the type of the whole. The code shape here is similar to evaluation.

      If you have generics (Java 6), you probably need unification, and that’s where things get trickier, because your type inference is now a two-pass algorithm — a pass over AST gives you a set of equations, and then a separate algorithm solves the equations. I don’t know a good intro here, I would probably look into https://plzoo.andrej.com/language/miniml.html.

      1. 3

        Since your language has quotations (lightweight syntax for functions -> pretty much lambda) I think you do need more than a simple bottom-up checker.

        For example, if you can write the equivalent of items.map(x => x + 1), then you want to infer the type of the parameter x, based on the type of elements that are in items. One way to look at it is that x => x + 1 has a generic type: function from T to T.

        1. 3

          I need […] to do this “row polymorphism” thing that’s unique to concatenative languages.

          I feel like I’ve seen a talk about this, probably for Cat. Aha it’s mentioned on the README: https://github.com/cdiggins/cat-language#the-syntax-of-types). (EDIT: This was the talk I was thinking of, for a different language called Kitten: https://www.youtube.com/watch?v=_IgqJr8jG8M.)

          I think the important part is:

          1. to recognize that functions take the whole stack as their input and output, and
          2. to think of the stack as being made of nested pairs.

          For example a stack with an int on top, a string underneath that, and the rest unknown would have type pair<int, pair<string, S>>, where S is an unknown. The nesting lets you leave “the rest of the stack” unknown just like any other unknown type.

          dup has a generic type: function<pair<X,S>, pair<X,pair<X,S>>>. X is the unknown type of the element being duplicated, and S the unknown type of the entire rest of the stack. But even though S is unknown, it’s the same S in the input and output, which is how the type checker knows that dup preserves all the stack-elements besides the duplicated one.

      2. 1

        I guess in the context of stack-based languages, looking at WASM spec might be useful:

        https://webassembly.github.io/spec/core/valid/instructions.html

        In particular, I think call-to-lambda would roughly correspond to call_indirect?

        https://webassembly.github.io/spec/core/valid/instructions.html#xref-syntax-instructions-syntax-instr-control-mathsf-call-indirect-x-y

        and https://binji.github.io/posts/webassembly-type-checking/ seems a reasonable prose description of the spec.

    2. 6

      I’ve spent a considerable amount of time on static typing of concatenative languages. It’s very nontrivial to properly type polymorphic quotes. Standard Hindley Milner type inference isn’t expressive enough. See my comment in the third paragraph here: https://www.dawn-lang.org/posts/dawn-status-update-(october-2021)/

      If you’re really interested in going down the rabbit hole I can share some references and code with you. That project is deep on a back burner for me, since it never proved to be as easy to read or write as I hoped. Applicative syntax still appears to be superior, even for linear languages, especially if you extend them with implicit context parameter passing, i.e. form a hybrid of applicative and concatenative syntax.

      1. 3

        The insight I had when I wrote the first checker in Tails is that type/stack checking is like running the program itself, except

        • The stack items are sets of types instead of values.
        • Instead of running a word you just verify that its stack effect matches the current stack (both its depth and the types on it), then update the stack applying the effect.
        • On a ZBRANCH you follow both paths: taking the branch and skipping it. I use a recursive call to the checker.
        • When two flows of control meet up, their stacks have to match.

        If you already know the word’s stack effect, you initialize the stack with its inputs and verify that the final stack matches the outputs.

        If you don’t know its stack effect, then when something above tries to read off the bottom of the stack you extend the stack, adding inputs accordingly. And whatever’s left at the end is the output.

        Of course it’s getting more complicated now that I have quotes, whose type has its own nested stack effect.

        1. 3

          Yeah, sounds a bit like a C/C++ style type checker, which Zig extends with arbitrary type-level code execution. That approach definitely has some advantages, namely simplicity. But expressivity is limited, so languages taking that approach end up needing to use escape hatches sometimes, and end up being unsound.

          Verse appears to be taking the Zig idea a bit further, closer to full dependent typing, but using unary predicate functions as types, and then using Z3 or similar to find counter-examples.

          I had been focused exclusively on ML-style type systems, which are what Haskell and Rust use.

      2. 2

        I thought Dawn was really interesting, I hope you will feel inspired to continue work on it some time. :)

    3. 5

      For what it’s worth, type checking is very similar to implementing an interpreter for a dynamically typed language. The key is keeping a type environment, very similar to the state environment of an interpreter, where you add to this environment as you encounter interesting terms.

      For example, say you have this program fragment:

      let x = 5
      let y = x + 1
      

      When you encounter the first let statement, you’ll do something like:

      type_env.insert("x", Type::Int)
      

      Then when you encounter the next statement, you first have to traverse down the right side of the let expression and determine its type. When you get to the variable access of “x”, you can look it up in the type env:

      type_env.get("g")
      

      I’ve since comically switched to OCaml for my language implementation, but I had an earlier prototype that uses Rust if you want to take a look at some (super unideal) real implementation code: https://github.com/amw-zero/sligh_retired/blob/main/src/main.rs#L2343

      1. 2

        I (also – seems common) wrote a blog post a while back on this very topic. A very simple implementation of static types: http://sigusr2.net/metacircular-static-types.html

      2. 1

        type checking is very similar to implementing an interpreter for a dynamically typed language

        I had this same feeling the first time I got a working-ish type checker for my partially typed toy language.

        1. 3

          That’s exactly what it is. Type checking can be seen as abstract interpretation where you calculate types.

      3. 1

        That sounds basically like what I’m doing, only the implementation is entirely different because it’s a concatenative language; so the type environment is like the runtime stack, only it contains types instead of data values.

    4. 4

      Could you give us a sample of what you mean by “everything I’ve seen on the topic … [being] quite technical”?

      Have you - for example - looked at Types and Programming Langauges? I personally find it quite readable (and I have also been subject to many incomprehensible math/cs textbooks in the past), but I don’t know whether our definitions of “readable” align. I’m at least with you on getting lost in category theory and advanced FP techniques, but I do feel some familiarity with the lambda calculus.

      1. 2

        In my experience, this sort of ask (“less-technical introduction to type checking”) usually means “I want an alternative to Types and Programming Languages”. Just like you, I also found it readable, but somehow it just doesn’t seem to work for quite a lot of people.

        As I wrote elsewhere, I had good results recommending Essentials of Programming Languages as an alternative.

      2. 2

        It is a readable book but it doesn’t cover the material you’d want to write a production type checker IMO. It’s similar to how most people do not read about regular languages CFGs in the dragon book and then go implement a parser.

        I read the first 10 chapters and it was worthwhile to remind me of the math. But even then I think it could have done a better job relating the math and the algorithm, ie generation vs recognition. That part seemed to be hand waved away or left as an exercise for the reader. The implementations could be very inefficient IIRC, ie some with quadratic algorithms? I don’t think they focused enough on the algorithms

        The later chapters seemed to diverge even further from common languages, ie solving difficult problems with type inference when I’d want something simpler

        I would be VERY interested if someone can show me a type checker they wrote after reading TAPL

    5. 3

      This does have OCaml and math notation, but I think the explanations are way more accessible than other resources: An accessible introduction to type theory and implementing a type-checker.

    6. 3

      My go-to recommendation for this has been Essentials of Programming Languages. It is a programming language textbook. Its implementation language is Scheme. It includes a full, straightforward (“textbook”) implementation of type checking and type inference for a simple imperative language and extend it to a simple object-oriented language. It does unification and also supports subtyping, so it’s not trivial.

    7. 2

      Articles by Luca Cardelli. [1] has Pascal code implementing a simple type system for a ML-like language. He also wrote about polymorphism. Very approachable. The chapter on Types in the PLAI book [2] if you can read LISP.

    8. 2

      I wrote something on this a while back: https://reasoning.page/blog/hindley-milner-rust

    9. 2

      I have had the same struggle and continue to have a difficult time when I want to research a new aspect of type systems. Most of the material on the subject uses inscrutable, complex math notation and it’s hard to find anything that builds from first principles with simple code examples. My focus has been on type inference. I found this (unfortunately incomplete) blog series called Type inference for beginners to be very helpful in understanding the basics. I wouldn’t buy the book just for this, but Software Design for Flexibility touches upon type inference in the pattern matching chapter. That book is kind of a successor to SICP, and though I highly recommend SICP as the best CS book ever made, you won’t find much to help you with type systems in it. Making the connection that Hindley-Milner type inference is an application of the more general concept of pattern matching helped me get my head on straight. I feel like I understand the basic algorithm pretty well now, and I’ve been able to extend the system to handle polymorphism but I don’t have a good resource to recommend for that part as I just pieced stuff together from crumbs of various papers. If I can produce a working version of my project (a Scheme-like to GLSL shader compiler) I’d like to write a blog post called “Type inference for lispers” or something.

    10. 2

      Peter Sestoft’s Programming Language Concepts[0] would be a great first book before TAPL. It’s got implementations of a micro SML (called Micro-SML) and a micro C (also called Micro-C). The code is in F#. Has a succinct intro to F# as an appendix. Strongly recommend it.

      [0] - https://www.itu.dk/people/sestoft/plc/

    11. 1

      One specific point I think is worth considering up front if you choose to not invest in learning much of the theory is to at least get a rough understanding of soundness and completeness and when/why you should care. Despite some purists assertions, there are some valid scenarios for violating either or both… but you should do so with the awareness that you are making that choice.

      I’ve seen several people try to “hack their way” to a type system from scratch and starting without one of those properties and then trying to add it later can be a big hill to climb.

    12. 1

      Maybe The Little Typer?

      https://thelittletyper.com/

      1. 2

        This book is about dependent typed systems, not a good fit here I think.

    13. 1

      A crazy idea: you seem to be familiar with C++ and it has generic lambdas, which must be quite thoroughly specified in the standard but without any math notation (though it could be specified in terms of another mechanism, most likely template type deduction, so there might be a few levels to this rabbit hole). Just a thought.