This is a fun snapshot of the state of the art in the early 1980s, when it was still possible for a few people to build a competitive computer almost from scratch on a lab bench. (It’s amusingly coy about SKIM using a BBC Micro as its front-end processor.)
SKIM became obsolete almost immediately, of course, not just because general-purpose computers were getting much faster very quickly, but also, and maybe more so, because compilation techniques for non-strict languages were improving rapidly.
Stoye talks about directors, which are generalized combinators, and briefly mentions supercombinators. Supercombinators go together with lambda lifting, which re-arranges the program to turn free variables into arguments and make closure construction more explicit. The lifted lambda expressions are compiled as supercombinators, specialized for this particular program. Dunno how many supercombinators could fit in SKIM’s microcode program memory!
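As a toy illustration of what lambda lifting does (the names addAll and liftedAdd are invented for the example, not taken from Stoye or SKIM), something like this is the general shape:

```haskell
-- Before lifting: the inner lambda has the free variable x, so at runtime
-- it needs a closure capturing x.
addAll :: Int -> [Int] -> [Int]
addAll x ys = map (\y -> x + y) ys

-- After lifting: the free variable has become an extra argument, so the
-- helper is a closed "supercombinator" that can be compiled on its own
-- and partially applied to x where the lambda used to be.
liftedAdd :: Int -> Int -> Int
liftedAdd x y = x + y

addAllLifted :: Int -> [Int] -> [Int]
addAllLifted x ys = map (liftedAdd x) ys
```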
Then come improvements to graph reduction, such as the G machine and later the spineless tagless G machine.
Much of this and more is covered in The Implementation of Functional Programming Languages edited by Simon Peyton Jones and published just a couple of years after Stoye’s thesis.
The other big change that happened in functional programming was the introduction of monads in about 1990. Monads immediately obsoleted the lazy-list-style IO described by Stoye, because it was frankly a pain in the arse: it was difficult to avoid deadlocks caused by evaluating a response before producing its prompt. As well as changing the surface programming model, monads changed the way IO was compiled. It’s described very nicely by Wadler and Peyton Jones in Imperative Functional Programming.
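For anyone who hasn’t met the old style, here is a rough sketch of response/request (“dialogue”) IO and the deadlock hazard; the constructor names are illustrative rather than the exact historical Haskell 1.x ones.

```haskell
data Request  = GetChar | PutChar Char
data Response = Gotten Char | Done

-- Echo one character, dialogue style: emit the GetChar request first,
-- then inspect its response to decide on the next request.
echo :: [Response] -> [Request]
echo resps = GetChar : case resps of
                         Gotten c : _ -> [PutChar c]
                         _            -> []

-- The deadlocking version: it forces the first response before producing
-- the GetChar request that would generate it, so it demands input that
-- never arrives.
echoBad :: [Response] -> [Request]
echoBad (Gotten c : _) = GetChar : [PutChar c]
echoBad _              = []

-- A pure driver that ties the knot between requests and responses,
-- feeding input characters from a string and collecting the output.
run :: ([Response] -> [Request]) -> String -> String
run prog input = collect reqs input
  where
    reqs  = prog resps
    resps = answer reqs input
    answer (GetChar    : rs) (c:cs) = Gotten c : answer rs cs
    answer (PutChar _  : rs) cs     = Done     : answer rs cs
    answer _                 _      = []
    collect (GetChar   : rs) (_:cs) = collect rs cs
    collect (PutChar c : rs) cs     = c : collect rs cs
    collect _                _      = []

main :: IO ()
main = putStrLn (run echo "hi")   -- prints "h"; swap in echoBad and it deadlocks
```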
When I was playing around with SKI reduction, I used a simple monadic IO scheme that explicitly passed around a token representing the state of the world. My compiler was untyped so I had to be careful not to duplicate the world by accident, but that wasn’t too hard. The IO primitives were really easy to implement, just like a normal getchar or putchar but taking and returning the world as an extra value. Lazy list style IO seems to need a lot more help from the runtime than that!
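A rough sketch of the shape of that scheme, in Haskell rather than my untyped combinator code; all the names are invented, and the world is modelled concretely as remaining input plus accumulated output so the example stays pure and runnable, whereas in the real thing the token was opaque and the primitives performed the IO when reduced.

```haskell
-- Hypothetical names throughout; this is just the shape of the scheme.
data World = World { remainingInput :: String, outputSoFar :: String }

-- An "IO action" takes the world and returns a result plus a new world.
type IOW a = World -> (a, World)

getcharW :: IOW Char
getcharW (World (c:cs) out) = (c, World cs out)
getcharW (World []     out) = ('\0', World [] out)   -- crude end-of-input

putcharW :: Char -> IOW ()
putcharW c (World inp out) = ((), World inp (out ++ [c]))

-- The monadic plumbing just threads the world through, exactly once.
bindW :: IOW a -> (a -> IOW b) -> IOW b
bindW m k = \w -> let (x, w') = m w in k x w'

returnW :: a -> IOW a
returnW x = \w -> (x, w)

-- Echo one character.
echoW :: IOW ()
echoW = getcharW `bindW` putcharW

main :: IO ()
main = putStrLn (outputSoFar (snd (echoW (World "hi" ""))))   -- prints "h"
```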
How do the G machines relate to GRIP? I thought G was just a shortening, but it seems Jones was very productive!
Tangential, but since modern CPUs present themselves in terms of C’s assumptions, how much processing power are the resulting mismatch penalties costing us (compared to a freer ISA offering optimizations that C can’t exploit)? (I read, e.g., about Conal Elliott “bypassing the C layer” to achieve C speeds: http://conal.net/ but anything further is bottlenecked by ISA design.)
Dunno what GRIP is … searchengineering … ah, seems like it was a late 1980s “graph reduction in parallel” multiprocessor 68020 machine with special-purpose bells and whistles. There are a few papers about it listed in this Glasgow Parallel Haskell bibliography.
The G machine (and its successors) is a design for a runtime system for lazy functional programming languages on normal computers. It deals with things like heap layout, thunks, closures, GC, etc. usw. I suppose the G stood for graph and/or Glasgow.
Note that SLPJ’s surname is Peyton Jones, double-barrelled but unhyphenated.
I should learn more about that Cartesian closed category stuff. I looked at some of Conal Elliott’s slides and the lambda elimination looks a lot like compiling to combinators — but I was disappointed that he didn’t compare and contrast the two, which might have helped me get a grip more easily.
Sticking my neck out, I think the lazy execution model is fundamentally hard to run fast, because it’s based on a graph of pointers with lots of aliasing and mutation. But if you weaken laziness to some more general notion of non-strict evaluation, and use purity to enable heroic compiler transformations (like Conal Elliott’s), then fun things can happen.
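To make the “aliasing and mutation” point concrete, here is a minimal sketch of thunk update using IORef; it’s an analogy for what a runtime’s update machinery does, not how GHC actually represents heap objects.

```haskell
import Data.IORef

-- A heap node is either an unevaluated thunk or an already-computed value.
data Node a = Thunk (IO a) | Value a

newtype NodeRef a = NodeRef (IORef (Node a))

delay :: IO a -> IO (NodeRef a)
delay m = NodeRef <$> newIORef (Thunk m)

-- Forcing a node evaluates the thunk at most once and overwrites the cell
-- with the result: that overwrite is the mutation, and every pointer that
-- aliases this node sees the update.
force :: NodeRef a -> IO a
force (NodeRef ref) = do
  node <- readIORef ref
  case node of
    Value v -> return v
    Thunk m -> do
      v <- m
      writeIORef ref (Value v)
      return v

main :: IO ()
main = do
  p <- delay (putStrLn "evaluating once" >> return (6 * 7 :: Int))
  a <- force p
  b <- force p        -- shares the result; no second "evaluating once"
  print (a + b)       -- 84
```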
I should probably make a page somewhere about the category/combinator correspondence. Categories are like working in the BI basis; more usefully, categories with finite products are like the BCI basis. For any simply-typeable expression in SK, BCKW, or other complete bases, there is a corresponding expression in a Cartesian-closed category.
The main difference is in data flow. A combinator doesn’t say how data flows through it, but rather how code is composed. By contrast, a category’s arrow has precisely the type information required to explain what kinds of data it accepts as input and emits as output, at the cost of no longer having intensional structure due to isomorphism invariance.
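For what it’s worth, here is a small sketch of the SK-to-CCC direction, using Haskell’s ordinary functions as the Cartesian-closed category; the combinator names (exl, exr, fork, apply, curry') are the usual categorical vocabulary rather than anything from a specific paper, and the point is just that K and S come out as closed arrows with no variables anywhere.

```haskell
-- Finite products
exl :: (a, b) -> a
exl = fst

exr :: (a, b) -> b
exr = snd

fork :: (x -> a) -> (x -> b) -> (x -> (a, b))
fork f g x = (f x, g x)

-- Exponentials
apply :: (a -> b, a) -> b
apply (f, x) = f x

curry' :: ((x, a) -> b) -> (x -> (a -> b))
curry' = curry

-- K x y = x: discard the second argument.
kC :: a -> b -> a
kC = curry' exl

-- S x y z = x z (y z): duplicate the "environment" z and apply.
sC :: (a -> b -> c) -> (a -> b) -> a -> c
sC = curry' (curry' (apply . fork (apply . fork (exl . exl) exr)
                                  (apply . fork (exr . exl) exr)))

main :: IO ()
main = print (sC kC kC (42 :: Int))   -- S K K is the identity, so prints 42
```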
Note that most of the Cartesian-closed ground was explored in the 80s already and is reflected in the history of the CAM. The popularity of Haskell is directly connected to the fact that Lazy ML had a much cheaper list-reversal operation in bytecode than CAML, which in turn is directly connected to the evaluation-order requirements imposed by category theory. Since any such effort to add Turing-completeness breaks, at best, either products for eager languages or sums for lazy languages, people chose the cheaper breakage. This is also why my Cammy language is pure and total; I’d rather give up Turing-completeness than proper sums or products.
A few years later, the MUSHROOM project at Manchester did custom hardware for Smalltalk. FPGAs then were just powerful enough to build interesting prototypes (as I recall, MUSHROOM used seven discrete FPGAs). It’s depressing that all of this coincided with mainstream computers agreeing on the least interesting abstract machines, though completely understandable: when performance was doubling every year or so, you could build the VM you wanted in software and it would be fast enough for everything in a couple of hardware generations.
There was also a roughly contemporaneous novel architecture called ALICE (Applicative Language Imperial College Executor or something similar) which ran a dataflow language possibly called HOPE.
The details are probably buried in a library archive somewhere: I had a demo floppy from them but have heard nothing of it since.
There was a predecessor of ML and Miranda from Edinburgh called Hope which is notable because it was one of the earliest languages with algebraic data types, and function definition using equational pattern matching. Functional rather than dataflow.
Yes, I’ve come across the Edinburgh one before when doing a once-in-a-decade-or-so search for the one I remember :-)