Recursive types for free!
Philip Wadler
University of Glasgow
July 1990 [some typos fixed Aug 2008, Oct 2014]
DRAFT DRAFT DRAFT DRAFT DRAFT
Recursive types pervade programming: lists, trees, and streams being
three of the most common examples. Recursive types come in two
principle flavours, least fixpoint or greatest fixpoint. For example,
the least fixpoint
Lfix X. 1 + A*X
yields lists with elements of type A, the least fixpoint
Lfix X. A + X*X
yields binary trees with leaves of type A, and the greatest fixpoint
Gfix X. A*X
yields streams with elements of type A.
Adding recursive types can alter the nature of a type system. The
polymorphic lambda calculus has the pleasant propery of being strongly
normalising: all reduction sequences terminate in a normal form. But
augmenting this calculus with the type
Lfix X. 1 + (X -> X)
has the unpleasant consequence of introducing terms with no normal
form. Fortunately, strong normalisation can be preserved by a mild
restriction: don't allow the recursive type variable to appear in a
negative position. The example violates this constraint, because the
recursive type variable X appears to the left of the function arrow.
Thus, it is safe to extend the polymorphic lambda calculus by adding
least fixpoint types with type variables in positive position. Indeed,
no extension is required: such types already exist in the language! If
F X represents a type containing X in positive position only, then least
fixpoints may be defined in terms of universal quantification:
Lfix X. F X = All X. (F X -> X) -> X.
This introduces a new type, T = Lfix X. F X, satisfying the isomorphism
T ~ F T. Note that it is an isomorphism, not an equality: the type
comes equipped with functions in : T -> F T and out : F T -> T. This
formula can be found, for instance, in [Freyd 89] and [Wraith 89]. It
is not as widely known as it should be -- I know of several computer
scientists who have re-invented this particular wheel. The excellent
textbook by Girard, Lafont, and Taylor [GLT 89] gives several special
cases, but not this general form.
More interestingly, polymorphic lambda calculus also contains greatest
fixpoints. These may be defined in terms of existential quantification
Gfix X. F X = Exists X. (X -> F X) * X,
again subject to the restriction that X appears only positively in F X.
This is a little surprising: greatest fixpoints allow infinite objects,
such as streams, yet the strong normalisation property is preserved.
Surprising, but not new: it was known previously that greatest
fixpoints and strong normalisation could co-exist: both [Hagino 87] and
[Mendler 87] describe type systems that include Lfix and Gfix and have
strong normalisation. But the encoding provides a simple proof of this
fact. Hagino refers to the coding of least fixpoints, but says that he
doesn't know of a coding for greatest fixpoints; Mendler doesn't refer
to either coding, but gives a lengthy proof that adding least and
greatest fixpoints to polymorphic lambda calculus preserves strong
normalisation.
(As it turns out, not new either: I have since discovered that Wraith
also describes this encoding [Wraith 89], although there is a small
technical error: he writes
Gfix X. F X = Exists X. X -> (X -> F X),
which is incorrect.)
LEAST FIXPOINTS AS WEAK INITIAL ALGEBRAS
Let's now look at the fixpoint result in a little more detail.
This will require a mild dose of category theory. Don't panic:
all terms will be explained as we go along.
By a functor F in polymorphic lambda calculus, we will mean an operation
taking types into types, and terms into terms, such that if t : U -> V
then F t : F U -> F V, and preserving identities and composition:
F id = id and F (f . g) = F f . F g. Every type U containing a type
variable X in positive positions only corresponds to a functor F X = U,
which takes the type T into the the type F T = U[T/X].
An object in a category is weakly initial if there is a map from it to
every other object, and initial if this map is unique. In categorical
terms, the least fixpoint of F corresponds to an initial F-algebra. An
F-algebra is a pair (X,k) consisting of an object X and an arrow
k : F X -> X. These form a category, where a morphism between (X,k) and
(X',k') is given by an arrow h : X -> X' such that the diagram
k
F X ----------> X
| |
| |
(1) F h | | h
| |
| |
F X' ---------> X'
k'
commutes.
Assume Lfix is given by the equation:
T = Lfix X. F X = All X. (F X -> X) -> X.
As a convenient abbreviation, we will write T for Lfix X. F X.
Then we can define two functions:
fold : All X. (F X -> X) -> T -> X
fold = \X. \k: F X -> X. \t:T. t X k
in : F T -> T
in = \s: F T. \X. \k: F X -> X. k (F (fold X k) s).
It follows immediately that the diagram
in
F T ----------> T
| |
| |
(2) F (fold X k) | | fold X k
| |
| |
F X ----------> X
k
commutes. In other words, (T,in) is an F-algebra from which
there is a map, called (fold X k), to every other F-algebra;
that is, (T,in) is a weakly initial F-algebra.
Let's consider what this means in a particular case. Take
F X = 1+X.
The values of type 1+X have the forms (inl ()) and (inr x), where x:X.
If f : X -> Y, then F f : 1+X -> 1+Y is given by
F f (inl ()) = inl ()
F f (inr x) = inr (f x).
Now, define Nat to be the least fixpoint of F:
Nat = Lfix X. 1+X.
This corresponds to the natural numbers: (in (inl ())) represents zero,
and (in (inr n)) represents the successor of n. Let k : 1+X -> X.
Then diagram (2) states that
fold X k (inl ()) = k (inl ())
fold X k (inr n) = k (inr (fold X k n)).
If we take z = k (inl ()), and s x = k (inr x)), then we
can rewrite this in the familiar form:
fold X k 0 = z
fold X k (n+1) = s (fold X k n).
That is, the value of (fold X k) is given for zero (namely, z), and is
found for (n+1) by recursively applying (fold X k) to n, and then
applying a given function (namely, s) to this result.
As a second example, take
F X = 1 + A*X.
The values of type 1+A*X have the forms (inl ()) and (inr (a,x)),
where a:A and x:X. If f : X -> Y, then F f : 1+A*X -> 1+A*Y is
given by
F f (inl ()) = inl ()
F f (inr (a,x)) = inr (a, f x).
Now, define (List A) to be the least fixpoint of F:
List A = Lfix X. 1+A*X.
This corresponds to lists with elements of type A: (in (inl ())
represents the empty list, also written nil, and (in (inr (a,as))
represents the list constructed with head a and tail as, also written
(cons a as). Let k : 1+A*X -> X. Then diagram (2) states that
fold X k (inl ()) = k (inl ())
fold X k (inr (a,as)) = k (inr (a, fold X k as)).
If we take n = k (inl ()) and c a x = k (inr (a,x)), then
we can rewrite this in the familiar form
fold X k nil = n
fold X k (cons a as) = c a (fold (X k as)).
That is, the value of (fold X k) is given for nil (namely, n), and is
found for (cons a as) by recursively applying (fold X k) to as, and
then using a given function (namely, c) to combine a with the
result.
LEAST FIXPOINTS AS INITIAL ALGEBRAS
So far, we have considered only weak initial algebras. We now give
necessary and sufficient conditions for this to be a true initial
algebra. We first consider the problem for an arbitrary definitions of
T, fold, in, and then specialise to the particular definitions we have
given.
In order for (T,in) to be initial, the map (fold X k) must be
unique, i.e., the only map that makes diagram (2) commute.
Let h be an arbitrary map from (X,k) to (X',k'); then
combining (1) and (2) we have
in
F T ----------> T
| |
| |
F (fold X k) | | fold X k
| |
| k |
F X ----------> X
| |
| |
F h | | h
| |
| |
F X' ---------> X' ,
k'
yielding a map (h . fold X k) from (T,in) to (X',k'). But (fold X' k')
should be the only such map! Thus, initiality entails that
k fold X k
F X ----------> X T -----------> X
| | | |
| | | |
(3) F h | | h implies id | | h
| | | |
| | | |
F X' ---------> X' T -----------> X' .
k' fold X' k'
Further, since (fold T in) is a map T -> T, and id is also such
a map, initiality also implies that
(4) fold T in = id.
Conversely, (3) and (4) imply than (T,in) is initial.
Choosing an appropriate instance of (3) gives
in fold T in
F T ----------> T T -----------> T
| | | |
| | | |
F h | | h implies id | | h
| | | |
| | | |
F X ----------> X T -----------> X .
k fold X k
The left-hand square states that h is a map from (T,in) into (X,k);
and the right-hand square, combined with (4), states that h must
be (fold X k). That is, (fold X k) is the only map from (T,in)
into (X,k), as required.
If (T,in) is an initial F-algebra, then "in" is an isomorphism.
By functoriality, from in : F T -> T we have (F in) : F (F T) -> F T,
hence (F T, F in) is an F-algebra. The unique map from (F,in) into this
algebra is given by
out : T -> F T
out = fold (F T) (F in).
To see that "in" and "out" are inverses, stare at the following diagram:
in
F T ----------> T
| |
| |
F out | | out
| |
| F in |
F (F T) --------> F T
| |
| |
F in | | in
| |
| |
F T ----------> T .
in
The top square is an instance of (2), and the bottom square commutes
trivially. Hence (in . out) is a map from (T,in) to (T,in); but id
is also such a map, so by uniqueness we have id = in . out.
Now from the upper square we have
out . in = F in . F out = F (in . out) = F id = id,
completing the proof. It is precisely because "in" is an isomorphism
that we are justified in calling T a fixpoint of F, since we have
F T ~ T; it is because T is initial that we are justified in calling
it a least fixpoint.
The argument in the preceding two paragraphs works in any category. In our
given category, polymorphic lambda calculus, with the given definitions
of "fold" and "in", we can go further. Take (2) as the left-hand
square of (3); then the right-hand square becomes,
fold T in
T -----------> T
| |
| |
id | | fold X k
| |
| |
T -----------> X ,
fold X k
or, in symbols,
fold X k (fold T in t) = fold X k t.
Reducing both applications of (fold X k) yields
(fold T in t) X k = t X k
(this is where we use the definition of "fold"). By the eta rules,
it follows that
fold T in t = t
(this is where we use properties of polymorphic lambda calculus).
Thus, for the given definitions, (3) implies (4), and hence (T,in) is
an initial F-algebra exactly when (3) holds.
Law (3) does not follow from the reduction laws of polymorphic lambda
calculus, and indeed there are models that do not satisfy it. But this
law is satisfied in many models, including all those having the
property of PARAMETRICITY (see [Reynolds 83], [Freyd 89], [Wadler 89],
[Freyd 90], [AMSW 90]). In particular, in the jargon of [Wadler 89],
the "Theorem for Free" derived from the type of "fold" is just this
law.
(A technical point: The "Theorems for Free" result really deals with
relations, not functions. In the diagram (3) each arrow denotes a relation
rather than a function, namely, the relation induced by the function.
Except the arrow labeled "id" actually corresponds to the relation
id' defined by the relation
id' = (All r. (F r -> r) -> r).
Here r->s relates f:X->Y to f':X'->Y' if whenever r relates x to x'
then s relates (f x) to (f' x'). And if G is an operation on
relations and types such that if r:X<->X' then (G r):(G X)<->(G X'),
then (All r. G r) relates g : (All X. G X) to g' : (All X'. G X')
(these are both the same type), if whenever r is a relation from X to
X' then (g X) (G r) (g' X). It is not the case in all models that id'
is the identity relation! This is the purpose of the identity lemma
in [Reynolds 83] -- in any model satisfying this lemma, id' will be
the identity. A parametric model is just one that satisfies the
identity lemma. Hence, although the "Theorems for Free" result
applies in any model, it is only in a parametric model that (3)
and (4) must hold.)
ITERATORS AND RECURSORS
The "fold" operation is what has sometimes been called an iterator.
The value of (fold X k x) is computed by applying (fold X k)
recursively to each substructure of x and applying k to the result.
More formally, we can refer to the substructure of x by taking
x = in y, for some y : F T, and we can apply (fold X k) to each
substructure of x by taking (F (fold X k) y). Then the sentence above
can be expressed in symbols:
fold X k (in y) = k (F (fold X k)).
This fundamental property is just equivalent to saying that
"fold" is a map of F-algebras.
The recursor is a slight variant of the iterator. It is defined by
rec : All X. (F (X*T) -> X) -> T -> (X*T)
rec = \X. \g: F (X*T) -> X. fold (X*T)