Category Theory



The yearly Advent of Code is always a source of interesting coding challenges. You can often solve them the easy way, or spend days trying to solve them “the right way.” I personally prefer the latter. This year I decided to do some yak shaving with a puzzle that involved looking for patterns in a grid. The pattern was the string XMAS, and it could start at any location and go in any direction whatsoever.

My immediate impulse was to elevate the grid to a comonad. The idea is that a comonad describes a data structure in which every location is a center of some neighborhood, and it lets you apply an algorithm to all neighborhoods in one fell swoop. Common examples of comonads are infinite streams and infinite grids.

Why would anyone use an infinite grid to solve a problem on a finite grid? Imagine you’re walking through a neighborhood. At every step you may hit the boundary of the grid, so a function that retrieves the current state must be allowed to fail. You may implement it as returning a Maybe value. So why not pre-fill an infinite grid with Maybe values, padding it with Nothing outside the bounds? This might sound crazy, but in a lazy language it makes perfect sense to trade code for data.
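
Here’s a minimal sketch of that idea (the function name is mine, not from the actual solution), assuming the board is given as a list of rows:

-- Look up a cell on a finite board; Nothing when out of bounds.
cellAt :: [[a]] -> (Int, Int) -> Maybe a
cellAt rows (y, x)
  | y < 0 || y >= length rows = Nothing
  | x < 0 || x >= length row  = Nothing
  | otherwise                 = Just (row !! x)
  where row = rows !! y

An infinite grid can then be pre-filled, once and for all, with the values of cellAt at every pair of integer coordinates.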

I won’t bore you with the details–they are available in my GitHub repository. Instead, I will discuss a similar program, one that I worked out some time ago but wasn’t satisfied with: the famous Conway’s Game of Life. This one actually uses an infinite grid, and I did implement it previously using a comonad. But this time I was more ambitious: I wanted to generate this two-dimensional comonad by composing a pair of one-dimensional ones.

The idea is simple. Each row of the grid is an infinite bidirectional stream. Since it has a specific “current position,” we’ll call it a cursor. Such a cursor can be easily made into a comonad. You can extract the current value; and you can duplicate a cursor by creating a cursor of cursors, each shifted by the appropriate offset (increasing in one direction, decreasing in the other).

A two-dimensional grid can then be implemented as a cursor of cursors–the inner one extending horizontally, and the outer one vertically.

It should be a piece of cake to define a comonad instance for it: extract should be the composition (extract . extract), and duplicate the composition (duplicate . fmap duplicate), right? It typechecks, so it must be right. But, just in case, like every good Haskell programmer, I decided to check the comonad laws. There are three of them:

extract . duplicate = id
fmap extract . duplicate = id
duplicate . duplicate = fmap duplicate . duplicate

And they failed! I must have done something illegal, but what?

In cases like this, it’s best to turn to basics–which means category theory. Compared to Haskell, category theory is much less verbose. A comonad is a functor W equipped with two natural transformations:

\varepsilon \colon W \to \text{Id}

\delta \colon W \to W \circ W

In Haskell, we write the components of these transformations as:

extract :: w a -> a
duplicate :: w a -> w (w a)
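
These are the two methods of the Comonad class from Control.Comonad (the comonad package), which also defines extend–we’ll use it later to run the Game of Life:

class Functor w => Comonad w where
  extract   :: w a -> a
  duplicate :: w a -> w (w a)
  duplicate = extend id
  extend    :: (w a -> b) -> w a -> w b
  extend f  = fmap f . duplicate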

The comonad laws can be written in this notation, using \cdot for vertical composition and \circ for whiskering. Here are the two counit laws:

(\varepsilon \circ W) \cdot \delta = \text{id}_W = (W \circ \varepsilon) \cdot \delta

and one associativity law:

(\delta \circ W) \cdot \delta = (W \circ \delta) \cdot \delta

These are the same laws we’ve seen above, but the categorical notation makes them look more symmetric.

So the problem is: Given a comonad W, is the composition W \circ W also a comonad? Can we implement the two natural transformations for it?

\varepsilon_c \colon W \circ W \to \text{Id}

\delta_c \colon W \circ W \to W \circ W \circ W \circ W

The straightforward implementation would be:

W \circ W \xrightarrow{\varepsilon \circ W} W \xrightarrow{\varepsilon} \text{Id}

corresponding to (extract . extract), and:

W \circ W \xrightarrow{W \circ \delta} W \circ W \circ W \xrightarrow{\delta \circ W \circ W} W \circ W \circ W \circ W

corresponding to (duplicate . fmap duplicate).

To see why this doesn’t work, let’s ask a more general question: When is a composition of two comonads, say W_2 \circ W_1, again a comonad? We can easily define a counit:

W_2 \circ W_1 \xrightarrow{\varepsilon_2 \circ W_1} W_1 \xrightarrow{\varepsilon_1} \text{Id}

The comultiplication, though, is tricky:

W_2 \circ W_1 \xrightarrow{W_2 \circ \delta_1} W_2 \circ W_1 \circ W_1 \xrightarrow{\delta_2 \circ W_1 \circ W_1} W_2 \circ W_2 \circ W_1 \circ W_1

Do you see the problem? The result is W_2^2 \circ W_1^2 but it should be (W_2 \circ W_1)^2. To make it a comonad, we have to be able to push W_2 through W_1 in the middle. We need W_2 to distribute over W_1 through a natural transformation:

\lambda \colon W_2 \circ W_1 \to W_1 \circ W_2

But isn’t that only relevant when we compose two different comonads–surely any functor distributes over itself! And there’s the rub: not every comonad distributes over itself, because the distributive mapping must preserve the comonad laws. In particular, to restore the counit law we need the following conditions to hold:

(W_1 \circ \varepsilon_2) \cdot \lambda = \varepsilon_2 \circ W_1 \qquad (\varepsilon_1 \circ W_2) \cdot \lambda = W_2 \circ \varepsilon_1

and for the comultiplication law, we require:

(\delta_1 \circ W_2) \cdot \lambda = (W_1 \circ \lambda) \cdot (\lambda \circ W_1) \cdot (W_2 \circ \delta_1) \qquad (W_1 \circ \delta_2) \cdot \lambda = (\lambda \circ W_2) \cdot (W_2 \circ \lambda) \cdot (\delta_2 \circ W_1)

Even if the two comonads are the same, the counit condition is still non-trivial: taking \lambda to be the identity would require the two whiskerings of \varepsilon to be equal, \varepsilon \circ W = W \circ \varepsilon, and in general they are not. All we can get from the original comonad laws is that they are equal when applied to the result of comultiplication:

(\varepsilon \circ W) \cdot \delta = (W \circ \varepsilon) \cdot \delta.

Equipped with the distributive mapping \lambda we can complete our definition of comultiplication for a composition of two comonads:

W_2 \circ W_1 \xrightarrow{W_2 \circ \delta_1} W_2 \circ W_1^2 \xrightarrow{\delta_2 \circ W_1^2} W_2^2 \circ W_1^2 \xrightarrow{W_2 \circ \lambda \circ W_1} (W_2 \circ W_1)^2

Going back to our Haskell code, we need to impose the distributivity condition on our comonad. There is a type class for it defined in Data.Distributive:

class Functor w => Distributive w where
  distribute :: Functor f => f (w a) -> w (f a)

Thus the general formula for composing two comonads is:

instance (Comonad w2, Comonad w1, Distributive w1) =>
         Comonad (Compose w2 w1) where
  extract = extract . extract . getCompose
  duplicate = fmap Compose . Compose
            . fmap distribute
            . duplicate . fmap duplicate
            . getCompose

In particular, it works for composing a comonad with itself, as long as the comonad distributes over itself.

Equipped with these new tools, let’s go back to implementing a two-dimensional infinite grid. We start with an infinite stream:

data Stream a = (:>) { headS :: a
                     , tailS :: Stream a}
  deriving Functor

infixr 5 :>

What does it mean for a stream to be distributive? It means that we can transpose a “matrix” whose rows are streams. The functor f is used to organize these rows. It could, for instance, be a list functor, in which case you’d have a list of (infinite) streams.

  [   1 :>   2 :>   3 .. 
  ,  10 :>  20 :>  30 ..
  , 100 :> 200 :> 300 .. 
  ]

Transposing a list of streams means creating a stream of lists. The first row is a list of heads of all the streams, the second row is a list of second elements of all the streams, and so on.

  [1, 10, 100] :>
  [2, 20, 200] :>
  [3, 30, 300] :>
  ..

Because streams are infinite, we end up with an infinite stream of lists. For a general functor, we use a recursive formula:

instance Distributive Stream where
    distribute :: Functor f => f (Stream a) -> Stream (f a)
    distribute stms = (headS <$> stms) :> distribute (tailS <$> stms)

(Notice that, if we wanted to transpose a list of lists, this procedure would fail. Interestingly, the list functor is not distributive. We really need either a fixed size or infinity in the picture.)
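
By contrast, the reader functor is distributive–this mirrors the instance in Data.Distributive. To transpose a functorful of functions, apply all of them to the same argument:

instance Distributive ((->) e) where
    distribute :: Functor f => f (e -> a) -> e -> f a
    distribute fs e = fmap ($ e) fs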

We can build a cursor from two streams, one going backward to infinity, and one going forward to infinity. The head of the forward stream will serve as our “current position.”

data Cursor a = Cur { bwStm :: Stream a
                    , fwStm :: Stream a }
  deriving Functor

Because streams are distributive, so are cursors. We just flip them about the diagonal:

instance Distributive Cursor where
    distribute :: Functor f => f (Cursor a) -> Cursor (f a)
    distribute fCur = Cur (distribute (bwStm <$> fCur))
                          (distribute (fwStm <$> fCur))

A cursor is also a comonad:

instance Comonad Cursor where
  extract (Cur _ (a :> _)) = a
  duplicate bi = Cur (iterateS moveBwd (moveBwd bi)) 
                     (iterateS moveFwd bi)

duplicate creates a cursor of cursors that are progressively shifted backward and forward. The forward shift is implemented as:

moveFwd :: Cursor a -> Cursor a
moveFwd (Cur bw (a :> as)) = Cur (a :> bw) as

and similarly for the backward shift.
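
The remaining helpers can be defined, for instance, like this:

-- Stream analog of iterate: the seed, followed by repeated applications of f.
iterateS :: (a -> a) -> a -> Stream a
iterateS f a = a :> iterateS f (f a)

-- Backward shift: pop the head of the backward stream and
-- push the current element onto the forward stream.
moveBwd :: Cursor a -> Cursor a
moveBwd (Cur (a :> as) fw) = Cur as (a :> fw)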

Finally, the grid is defined as a cursor of cursors:

type Grid a = Compose Cursor Cursor a

And because Cursor is a distributive comonad, Grid is automatically a lawful comonad. We can now use the comonadic extend to advance the state of the whole grid:

generations :: Grid Cell -> [Grid Cell]
generations = iterate $ extend nextGen

using a local function:

nextGen :: Grid Cell -> Cell
nextGen grid
  | cnt == 3 = Full
  | cnt == 2 = extract grid
  | otherwise = Empty
  where
      cnt = countNeighbors grid
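
The function countNeighbors is part of the full implementation; here’s a sketch of how it might look (the helpers shiftH and shiftV are my names; I’m assuming data Cell = Empty | Full deriving Eq, with the outer cursor running vertically):

-- Shift every row horizontally, or the whole column of rows vertically.
shiftH, shiftV :: (Cursor a -> Cursor a) -> Grid a -> Grid a
shiftH f = Compose . fmap f . getCompose
shiftV f = Compose . f . getCompose

-- Count the Full cells among the eight neighbors.
countNeighbors :: Grid Cell -> Int
countNeighbors grid = length
  [ () | (i, h) <- zip [0 ..] hShifts
       , (j, v) <- zip [0 ..] vShifts
       , (i, j) /= (1 :: Int, 1 :: Int)  -- skip the center cell
       , extract (v (h grid)) == Full ]
  where
    hShifts = [shiftH moveBwd, id, shiftH moveFwd]
    vShifts = [shiftV moveBwd, id, shiftV moveFwd]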

You can find the full implementation of the Game of Life and the solution of the Advent of Code puzzle, both using comonad composition, on my GitHub.


Previously: Covering Sieves.

We’ve seen an intuitive description of presheaves as virtual objects. We can use the same trick to visualize natural transformations.

A natural transformation can be drawn as a virtual arrow \alpha between two virtual objects corresponding to two presheaves S and P. Indeed, for every s_a \in S a, seen as an arrow a \to S, we get an arrow a \to P simply by composition \alpha \circ s_a. Notice that we are thus defining the composition with \alpha, because we are outside of the original category. A component \alpha_a of a natural transformation thus maps arrows to arrows.


This composition must be associative and, indeed, associativity is guaranteed by the naturality condition. For any arrow f \colon a \to b, consider a zigzag path from a to P given by \alpha \circ s_b \circ f. The two ways of associating this composition give us \alpha_a \circ S f = P f \circ \alpha_b.


Let’s now recap our previous definitions: A cover of u is a bunch of arrows converging on u satisfying certain conditions. These conditions are defined in terms of a coverage. For every object u we define a whole family of covers, and then combine them into one big collection that we call the coverage.

A sheaf is a presheaf that is compatible with a coverage. It means that for every cover \{u_i\}, if we pick a compatible family x_i \in P u_i that agrees on all overlaps, then it uniquely determines the element (virtual arrow) x \in P u.


A covering sieve of u is a presheaf that extends a cover \{u_i\}. It assigns a singleton set to each u_i and all its open subsets (that is, objects that have arrows pointing to u_i); and an empty set otherwise. In particular, the sieve includes all the overlaps, like u_i \cap u_j, even if they are not present in the original cover.

The key observation here is that a sieve can serve as a blueprint for, or a skeleton of, a compatible family \{ x_i \}. Indeed, S_u maps all objects either to singletons or to empty sets. In terms of virtual arrows, there is at most one arrow going to S_u from any object. This is why a natural transformation from S_u to any presheaf P produces a family of arrows x_i \in P u_i. It picks a single arrow from each of the hom-sets u_i \to P.


The sieve includes all intersections, and all diagrams involving those intersections necessarily commute. They commute because the category we’re working with is thin, and so is the category extended by adding the virtual object S_u. Thus a family generated by a natural transformation \alpha \in Nat (S_u, P) is automatically a compatible family. Therefore, if P is a sheaf, it determines a unique element x \in P u.

This lets us define a sheaf in terms of sieves, rather than coverages.

A presheaf P is a sheaf if and only if, for every covering sieve S_u of every u, there is a one-to-one correspondence between the set of natural transformations Nat (S_u, P) and the set P u.

In terms of virtual arrows, this means that there is a one-to-one correspondence between arrows \alpha \colon S_u \to P and x \colon u \to P.



Previously: Sheaves as Virtual Objects.

In order to define a sheaf, we have to start with a coverage. A coverage defines, for every object u, a family of covers that satisfy the sub-coverage conditions. Granted, we can express a coverage using objects and arrows, but it would be much nicer if we could use the language of functors and natural transformations.

Let’s start with the idea that, categorically, a cover of u is a bunch of arrows converging on u. Each arrow p_i \colon u_i \to u is a member of the hom-set \mathcal C (u_i, u). Now consider the fact that \mathcal C (-, u) is a presheaf, \mathcal C^{op} \to \mathbf{Set}, and ask the question: Is a cover a “subfunctor” of \mathcal C (-, u)?

A subfunctor of a presheaf P is defined as a functor S such that, for each object v, S v is a subset of P v and, for each arrow f \colon v \to w, the function S f \colon S w \to S v is a restriction of P f.


In general, a cover does not correspond to a subfunctor of the hom-functor. Let’s see why, and how we can fix it.

Let’s try to define S, such that S u_i is non-empty for any object u_i that’s in the cover of u, and empty otherwise. As a presheaf, we could represent it as a virtual object with arrows coming from all \{ u_i \}‘s.


Now consider an object v that is not in the cover, but has an arrow f \colon v \to u_k connecting it to some element u_k of the cover. Functoriality requires the (virtual) composition s_k \circ f to exist.

Thus v must be included in the cover–if we want S to be a functor.

In particular, if we are looking at a category of open sets with inclusions, this condition means that all (open) sub-sets of the covering sets must also be included in the cover. Such a “downward closed” family of sets is called a sieve.

Imagine sets in the cover as holes in a sieve. Smaller sets that can “pass through” these holes must also be parts of the sieve.

If you start with a cover, you can always extend it to a covering sieve by adding more arrows. It’s as if you started with a few black holes, and everything that could fall into them, would fall.

We have previously defined sheaves in terms of coverings. In the next installment we’ll see that they can equally well be defined using covering sieves.

Next: Sieves and Sheaves.


Previously: Coverages and Sites.

The definition of a sheaf is rather complex and involves several layers of abstraction. To help us navigate this maze we can use some useful intuitions. One such intuition is to view objects in our category as some kind of sets (in particular, open sets, when we talk about topology), and arrows as set inclusions. An arrow from v to u means that v is a subset of u.

A cover of u is a family of arrows \{ p_i \colon u_i \to u \}. A coverage assigns a collection of covers to every object, satisfying the sub-coverage conditions described in the previous post. A category with coverage is called a site.

The next layer of abstraction deals with presheaves, which are set-valued contravariant functors. Interestingly, there is a way to interpret a presheaf as an extension of the original category. I learned this trick from Paolo Perrone.

We may represent a presheaf P using virtual hom-sets. First we add one virtual object, let’s call it \bullet, to our category. The set P u is then interpreted as the set of arrows from u to \bullet.


Moreover, we can represent the action of P on arrows as simple composition. Take an arrow f \colon v \to u. The presheaf lifts it to a function between sets: P f \colon P u \to P v (contravariance means that the arrow is reversed). For any h \in P u we can define the composition h \circ f to be (P f) h.


Incidentally, if the functor P is representable, it means that we can replace the virtual object \bullet with an actual object in our category.

Notice that, even though the category of open sets with inclusions is a poset (hom-sets are either singletons or empty, and all diagrams automatically commute), the added virtual hom-sets usually contain lots of arrows. In topology these hom-sets are supposed to represent sets of continuous functions over open sets.

We can interpret the virtual object \bullet as representing an imaginary open set that “includes” all the objects u for which P u is non-empty, but we have to imagine that it’s possible to include an object in more than one way, to account for multiple arrows. In fact, in what follows we won’t be assuming that the underlying category is a poset, so virtual hom-sets are nothing special.

To express the idea of intersections of open sets, we use commuting diagrams. For every pair of objects u_i and u_j that are in the cover of u, an object v is in their intersection if the following diagram commutes:


Note that in a poset all diagrams commute, but here we’re generalizing this condition to an arbitrary category. We could say that v is in the intersection of u_i and u_j seen as covers of u.

Equipped with this new insight, we can now express the sheaf condition. We assume that there is a coverage defined in our category. We are adding one more virtual object \bullet for the presheaf P, with bunches of virtual arrows pointing to it.

For every cover \{ p_i \colon u_i \to u \} we try to select a family of virtual arrows, s_i \colon u_i \to \bullet. It’s as if the objects u_i, besides covering u, also covered the virtual object \bullet.

We call the family \{s_i \} a matching family, if this new covering respects the existing intersections. If v is in the intersection of u_i and u_j (as covers of u, see the previous diagram), then we want the following diagram to also commute:
In other words, the \{u_i\}‘s intersect as covers of \bullet.

A presheaf P is a sheaf if, for every covering family p_i and every matching family s_i there exists a unique s \colon u \to \bullet that factorizes those s_i‘s:
Translating it back to the language of topology: There is a unique global function s defined over u whose restrictions are s_i‘s.

The advantage of this approach is that it’s easy to imagine the sheafification of an arbitrary presheaf by freely adding virtual arrows (the s‘s and their compositions with p_i‘s in the above diagram) to all intersection diagrams.

Next: Covering Sieves.


Previously: Presheaves and Topology.

In all branches of science we sooner or later encounter the global vs. local duality. Topology is no different.

In topology we have the global definition of continuity: counter-images of all open sets are open. But we perceive a discontinuity as a local jump. How are the two pictures related, and can we express this topologically, that is without talking about sizes and distances?

All we have at our disposal are open sets, so exactly what properties of open sets are the most relevant? They do form a (thin) category with inclusions as arrows, but so does any set of subsets. As it turns out open sets can be stitched together to create coverings. Such coverings let us zoom in on finer and finer details, thus creating the bridge between the global and the local picture.

Open sets are plump–they can easily fill the bulk of space. They are also skinless, so they can’t touch each other without some overlap. That makes them perfect for constructing covers.

Covering, unlike tiling, requires overlapping. To create a leak-free roof, you need your tiles to overlap. The idea is that, if we were defining functions over a tiling, it would be possible for them to make sudden jumps at tile boundaries. Open coverings overlap, so such functions have to flow continuously.


An open cover of a set u is a family of open sets \{u_i\} such that u is their union:

u = \bigcup_{i \in I} u_i

Here I is a set used for indexing the family.

If we have a continuous function f defined over u, then all its restrictions f|_{u_i} are also continuous (this follows from the condition that an intersection of open sets is open). Thus going from global to local is easy.

The converse is more interesting. Suppose that we have a family of functions f_i, one per each open set u_i, and we want to reconstruct the global function f defined over the set u covered by u_i‘s. This is only possible if the individual functions agree on overlaps.

Take two functions: f_i defined over u_i, and f_j defined over u_j. If the two sets overlap, each of the functions can be restricted to the overlap u_i \cap u_j. We want these restrictions to be equal:

f_i|_{u_i \cap u_j} = f_j|_{u_i \cap u_j}


If all individual continuous functions agree on the overlaps then they uniquely determine a global continuous function f defined over the whole set u. You can stitch or collate functions that are defined locally.

In the language of category theory we talk about functions in bulk. We define a functor–a presheaf P–that maps all open sets to sets of continuous functions. In this language, to an open cover \{u_i\} corresponds a family of functions \{f_i\} that are members of the individual sets P u_i. Every such selection forms a giant I-indexed tuple, that is an element of the cartesian product:

\{f_i | i \in I\} \in \prod_{i} P u_i

Similarly, we can gather functions that are defined over the intersections of sets into a product:

\prod_{i j} P (u_i \cap u_j)

(Notice that every empty intersection corresponds to a single trivial function that we call absurd in Haskell.)

Set inclusions generate function restrictions. In particular, for every intersection u_i \cap u_j we have a pair of restrictions:

f_i \mapsto f_i|_{u_i \cap u_j}

f_j \mapsto f_j|_{u_i \cap u_j}

These restrictions can be seen as functions between sets:

P u_i \to P (u_i \cap u_j)

P u_j \to P (u_i \cap u_j)

If all such restrictions are pairwise equal, we call \{f_i\} a matching family. The sheaf condition demands that every such matching family determine a unique element f \in P u such that f_i = f|_{u_i} for all i.

These pairs of restrictions define two mappings between our big products:

p, q : \prod_i P u_i \rightrightarrows \prod_{i j} P (u_i \cap u_j)

Think of each function as acting on a tuple \{f_k\} and producing a matrix indexed by elements of I:

(p\; \{f_k\})_{i j} = f_i|_{u_i \cap u_j}

(q\; \{f_k\})_{i j} = f_j|_{u_i \cap u_j}

Our matching condition can be expressed in the language of category theory by saying that the following diagram is an equalizer of p and q (the two parallel arrows):

P u \xrightarrow{e} \prod_i P u_i \rightrightarrows \prod_{i j} P (u_i \cap u_j)

Here e is defined as mapping a function f \in P u to the tuple of its restrictions \{ f|_{u_i} \}. These restrictions are then required to match when further restricted by p and q to all possible intersections.
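
For instance, for a two-element cover \{u_1, u_2\}, e maps f to the pair (f|_{u_1}, f|_{u_2}), and the equalizer condition for a tuple (f_1, f_2) reduces to the single equation f_1|_{u_1 \cap u_2} = f_2|_{u_1 \cap u_2}.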

A presheaf P is called a sheaf if, for every open covering \{u_i\}, a matching family \{f_i\} uniquely determines an element of P u, as expressed by the equalizer above. This element corresponds to the function f that is the result of stitching of the individual functions.

Notice that, even though we tried to use the categorical language as much as possible, we still had to rely on the language of sets to define coverings. To abstract away from set theory and traditional topology, we need to talk about sites.

Next: Coverages and Sites.


Previously: Topology as a Dietary Choice.

Category theory lets us change the focus from individual objects to relationships between them. Since topology is defined using open sets, we’d start by concentrating on relations between sets.

One such obvious relation is inclusion. It imposes a categorical structure on the subsets of a given set X. We draw arrows between two sets whenever one is a subset of the other. These arrows satisfy the axioms of a category: there is an identity arrow for every object (every set is its own subset) and arrows compose (inclusion is transitive). Not every pair of objects is connected by an arrow–some sets are disjoint, others overlap only partially. We may include the whole space as the terminal object (with arrows coming from every subset) and the empty set \emptyset as the initial object (with arrows going to every set). As categories go, this is a thin category, because there is at most one arrow between any two objects.


Every topological space gives thus rise to a thin category that abstracts the structure of its open sets. But the real reason for defining a topology is to be able to talk about continuous functions. These are functions between topological spaces such that the inverse image of every open set is open. Here, again, category theory tells us not to think about the details of how these functions are defined, but rather what we can do with them. And not just one function at a time, but the whole bunch at once.

So let’s talk about sets of functions. We have our topological space X, and to each open subset u we will assign a set of continuous function on it. These could be functions to real or complex numbers, or whatever–we don’t care. All we care about is that they form a set.

Since open sets in X form a (thin) category, we are talking about assigning to each object (open set) u its own set (of continuous functions) P u. Notice however that these sets of functions are not independent of each other. If one open set is a subset of another, it inherits all the functions defined over the larger set. These are the same functions, the only difference being that their arguments are restricted to a smaller subset. For instance, given two sets v \subseteq u and a function f \colon u \to \mathbb R, there is a function f|_{v} \colon v \to \mathbb R such that f|_{v} = f on v.


Let’s restate these statements in the categorical language. We already have a category X of open sets with inclusion. The sets of functions on these open sets are objects in the category \mathbf{Set}. We have defined a mapping P between these two categories that assigns sets of functions to open sets.

Notice that we are dealing with two different categories whose objects are sets. One has inclusions as arrows, the other has functions as arrows. (To confuse matters even more, the objects in the second category represent sets of functions.)

To define a functor between categories, we also need a mapping of arrows to accompany the mapping of objects. An arrow v \to u means that v \subseteq u. Corresponding to it, we have a function P u \to P v that assigns to each f \in P u its restriction f|_{v} \in P v.


Together, these mappings define a functor P \colon X^{op} \to \mathbf{Set}. The “op” notation means that the directions of arrows are reversed: the functor is “contravariant.”

A functor must preserve the structure of a category, that is identity and composition. In our case this follows from the fact that an identity u \subseteq u maps to a trivial do-nothing restriction, and that restrictions compose: (f|_v)|_w = f|_w for w \subseteq v \subseteq u.

There is a special name for contravariant functors from any category \mathcal C to \mathbf{Set}. They are called presheaves, exactly because they were first introduced in the context of topology as precursors of “sheaves.” Consequently, the simpler functors \mathcal C \to \mathbf{Set} had to be confusingly called co-presheaves.

Presheaves on \mathcal C form their own category, often denoted by \hat{\mathcal C}, with natural transformations as arrows.

Next: Sheaves and Topology.


I will now provide the categorical foundation of the Haskell implementation from the previous post. A PDF version that contains both parts is also available.

The Para Construction

There’s been a lot of interest in categorical foundations of deep learning. The basic idea is that of a parametric category, in which morphisms are parameterized by objects from a monoidal category \mathcal P:

Here, p is an object of \mathcal P.

When two such morphisms are composed, the result is parameterized by the tensor product of the parameters.


An identity morphism is parameterized by the monoidal unit I.

If the monoidal category \mathcal P is not strict, the parametric composition and identity laws are not strict either. They are satisfied up to associators and unitors of \mathcal P. A category with lax composition and identity laws is called a bicategory. The 2-cells in a parametric bicategory are called reparameterizations.

Of particular interest are parameterized bicategories that are built on top of actegories. An actegory \mathcal C is a category in which we define an action of a monoidal category \mathcal P:

\bullet \colon \mathcal P \times \mathcal C \to \mathcal C

satisfying some obvious coherency conditions (unit and composition):

I \bullet c \cong c

p \bullet (q \bullet c) \cong (p \otimes q) \bullet c

There are two basic constructions of a parametric category on top of an actegory, called \mathbf{Para} and \mathbf{coPara}. The first defines a parametric morphism from a to b as f_p \colon p \bullet a \to b, and the second as g_p \colon a \to p \bullet b.

Parametric Optics

The \mathbf{Para} construction can be extended to optics, where we’re dealing with pairs of objects from the underlying category (or categories, in the case of mixed optics). The parameterized optic is defined as the following coend:

O \langle a, da \rangle \langle p, dp \rangle \langle s, ds \rangle = \int^{m} \mathcal C (p \bullet s, m \bullet a) \times \mathcal C (m \bullet da, dp \bullet ds)

where the residues m are objects of some monoidal category \mathcal M, and the parameters \langle p, dp \rangle come from another monoidal category \mathcal P.

In Haskell, this is exactly the existential lens:

data ExLens a da p dp s ds = 
  forall m . ExLens ((p, s)  -> (m, a))  
                    ((m, da) -> (dp, ds))

There is, however, a more general bicategory of pre-optics, which underlies existential optics. In it, both the parameters and the residues are treated symmetrically.

The PreLens Bicategory

Pre-optics break the feedback loop in which the residues from the forward pass are fed back to the backward pass. We get the following formula:

\begin{aligned}O & \langle a, da \rangle \langle m, dm \rangle \langle p, dp \rangle \langle s, ds \rangle = \\  &\mathcal C (p \bullet s, m \bullet a) \times \mathcal C (dm \bullet da, dp \bullet ds)  \end{aligned}

We interpret this as a hom-set from a pair of objects \langle s, ds \rangle in \mathcal C^{op} \times \mathcal C to the pair of objects \langle a, da \rangle also in \mathcal C^{op} \times \mathcal C, parameterized by a pair \langle m, dm \rangle in \mathcal M \times \mathcal M^{op} and a pair \langle p, dp \rangle from \mathcal P^{op} \times \mathcal P.
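
In Haskell, with both monoidal actions specialized to pairs as before, this could be transcribed as:

data PreLens a da m dm p dp s ds =
  PreLens ((p, s)  -> (m, a))
          ((dm, da) -> (dp, ds))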

To simplify notation, I’ll use the bold \mathbf C for the category \mathcal C^{op} \times \mathcal C, and bold letters for pairs of objects and (twisted) pairs of morphisms. For instance, \bold f \colon \bold a \to \bold b is a member of the hom-set \mathbf C (\bold a, \bold b) represented by a pair \langle f \colon a' \to a, g \colon b \to b' \rangle.

Similarly, I’ll use the notation \bold m \bullet \bold a to denote the monoidal action of \mathcal M^{op} \times \mathcal M on \mathcal C^{op} \times \mathcal C:

\langle m, dm \rangle \bullet \langle a, da \rangle = \langle m \bullet a, dm \bullet da \rangle

and the analogous action of \mathcal P^{op} \times \mathcal P.

In this notation, the pre-optic can be simply written as:

O\; \bold a\, \bold m\, \bold p\, \bold s = \mathbf C (\bold m \bullet \bold a, \bold p \bullet \bold s)

and an individual morphism as a triple:

(\bold m, \bold p, \bold f \colon \bold m \bullet \bold a \to \bold p \bullet \bold b)

Pre-optics form hom-sets in the \mathbf{PreLens} bicategory. The composition is a mapping:

\mathbf C (\bold m \bullet \bold b, \bold p \bullet \bold c) \times \mathbf C (\bold n \bullet \bold a, \bold q \bullet \bold b) \to \mathbf C ((\bold m \otimes \bold n) \bullet \bold a, (\bold q \otimes \bold p) \bullet \bold c)

Indeed, since both monoidal actions are functorial, we can lift the first morphism by (\bold q \bullet -) and the second by (\bold m \bullet -):

\mathbf C (\bold m \bullet \bold b, \bold p \bullet \bold c) \times \mathbf C (\bold n \bullet \bold a, \bold q \bullet \bold b) \xrightarrow{(\bold q \bullet) \times (\bold m \bullet)}

\mathbf C (\bold q \bullet \bold m \bullet \bold b, \bold q \bullet \bold p \bullet \bold c) \times \mathbf C (\bold m \bullet \bold n \bullet \bold a,\bold m \bullet \bold q \bullet \bold b)

We can compose these hom-sets in \mathbf C, as long as the two monoidal actions commute, that is, if we have:

\bold q \bullet \bold m \bullet \bold b \to \bold m \bullet \bold q \bullet \bold b

for all \bold q, \bold m, and \bold b.

The identity morphism is a triple:

(\bold 1, \bold 1, \bold{id} )

parameterized by the unit objects in the monoidal categories \mathbf M and \mathbf P. Associativity and identity laws are satisfied modulo the associators and the unitors.

If the underlying category \mathcal C is monoidal, the \mathbf{PreLens} bicategory is also monoidal, with the obvious point-wise parallel composition of pre-optics.

Triple Tambara Modules

A triple Tambara module is a functor:

T \colon \mathbf M^{op} \times \mathbf P \times \mathbf C \to \mathbf{Set}

equipped with two families of natural transformations:

\alpha \colon T \, \bold m \, \bold p \, \bold a \to T \, (\bold n \otimes \bold m) \, \bold p \, (\bold n \bullet \bold a)

\beta \colon T \, \bold m \, \bold p \, (\bold r \bullet \bold a) \to T \, \bold m \, (\bold p \otimes \bold r) \, \bold a

and some coherence conditions. For instance, the two paths from T \, \bold m \, \bold p\, (\bold r \bullet \bold a) to T \, (\bold n \otimes \bold m)\, (\bold p \otimes \bold r) \, (\bold n \bullet \bold a) must give the same result.

One can also define natural transformations between such functors that preserve the two structures, and define a bicategory of triple Tambara modules \mathbf{TriTamb}.

As a special case, if we choose the category \mathcal P to be the trivial one-object monoidal category, we get a version of (double-) Tambara modules. If we then take the coend, P \langle a, b \rangle = \int^m T \langle m, m\rangle \langle a, b \rangle, we get regular Tambara modules.

Pre-optics themselves are an example of a triple Tambara representation. Indeed, for any fixed \bold a, we can define a mapping \alpha from the triple:

(\bold m, \bold p, \bold f \colon \bold m \bullet \bold a \to \bold p \bullet \bold b)

to the triple:

(\bold n \otimes \bold m, \bold p, \bold f' \colon (\bold n \otimes \bold m) \bullet \bold a \to \bold p \bullet (\bold n \bullet \bold b))

by lifting \bold f by (\bold n \bullet -) and rearranging the actions using their commutativity.

Similarly, for \beta, we map:

(\bold m, \bold p, \bold f \colon \bold m \bullet \bold a \to \bold p \bullet (\bold r \bullet \bold b))

to:

(\bold m , (\bold p \otimes \bold r), \bold f' \colon \bold m \bullet \bold a \to (\bold p \otimes \bold r) \bullet \bold b)

Tambara Representation

The main result is that morphisms in \mathbf{PreLens} can be expressed using triple Tambara modules. An optic:

(\bold m, \bold p, \bold f \colon \bold m \bullet \bold a \to \bold p \bullet \bold b)

is equivalent to a triple end:

\int_{\bold r \colon \mathbf P} \int_{\bold n \colon \mathbf M} \int_{T \colon \mathbf{TriTamb}} \mathbf{Set} \big(T \, \bold n \, \bold r \, \bold a, T \, (\bold m \otimes \bold n) \, (\bold r \otimes \bold p) \, \bold b \big)

Indeed, since pre-optics are themselves triple Tambara modules, we can apply the polymorphic mapping of Tambara modules to the identity optic (\bold 1, \bold 1, \bold{id} ) and get an arbitrary pre-optic.

Conversely, given an optic:

(\bold m, \bold p, \bold f \colon \bold m \bullet \bold a \to \bold p \bullet \bold b)

we can construct the polymorphic mapping of triple Tambara modules:

\begin{aligned} & T \, \bold n \, \bold r \, \bold a \xrightarrow{\alpha} T \, (\bold m \otimes \bold n) \, \bold r \, (\bold m \bullet \bold a) \xrightarrow{T \, \bold f} T \, (\bold m \otimes \bold n) \, \bold r \, (\bold p \bullet \bold b) \xrightarrow{\beta} \\ & T \, (\bold m \otimes \bold n) \, (\bold r \otimes \bold p) \, \bold b  \end{aligned}

Bibliography

  1. Brendan Fong, Michael Johnson, Lenses and Learners.
  2. Brendan Fong, David Spivak, Rémy Tuyéras, Backprop as Functor: A compositional perspective on supervised learning, 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS) 2019, pp. 1-13, 2019.
  3. G.S.H. Cruttwell, Bruno Gavranović, Neil Ghani, Paul Wilson, Fabio Zanasi, Categorical Foundations of Gradient-Based Learning.
  4. Bruno Gavranović, Compositional Deep Learning.
  5. Bruno Gavranović, Fundamental Components of Deep Learning, PhD Thesis, 2024.

There is an exercise in Saunders Mac Lane’s “Categories for the Working Mathematician” that was a lesson in humility for me. Despite several hints provided by Mac Lane, all my attempts to solve it failed. Finally my Internet search led me to a diagram that looked promising, and it allowed me to crack the problem.

Why do I think this is interesting? Because it shows the kind of pattern matching and shape shifting that is characteristic of categorical proofs. The key is the use of visual representations and the ability to progressively hide the details under the terseness of notation until the big picture emerges.

The Exercise

This is, slightly paraphrased, exercise 1.1 in chapter VII, Monoids:

Prove that the pentagon identity and the triangle identity imply:

\lambda_{a \otimes b} = (\lambda_a \otimes b) \circ \alpha_{1 a b}

First, let me explain what it all means. We are working in a monoidal category, that is a category with a tensor product. Given two objects a and b, we can construct their product a \otimes b. Similarly, given two arrows f \colon a \to b and g \colon a' \to b', we can construct an arrow

f \otimes g \colon a \otimes b \to a' \otimes b'

In other words, \otimes is a (bi-)functor.

There is a special object 1 that serves as a unit with respect to the tensor product. But, since in category theory we shy away from equalities on objects, the unit laws are not equalities, but rather natural isomorphisms, whose components are:

\lambda_a \colon 1 \otimes a \to a

\rho_a \colon a \otimes 1 \to a

These transformations are called, respectively, the left and right unitors. We’ll come back to naturality later, when we have to use it in anger.

We want the tensor product to be associative, again using a natural isomorphism called the associator:

\alpha_{a b c} \colon a \otimes (b \otimes c) \to (a \otimes b) \otimes c

The components of natural transformations are just regular arrows, so we can tensor them. In particular, we can tensor a left unitor \lambda_a with an identity natural transformation \text{id} to get:

\lambda_a \otimes \text{id}_b \colon (1 \otimes a) \otimes b \to a \otimes b

Since tensoring with identity is a common operation, it has a name “whiskering,” and is abbreviated to \lambda_a \otimes b. Any time you see a natural transformation tensored with an object, it’s a shorthand for whiskering.

You are now well equipped to understand the diagram from the exercise.
The goal is to prove that it commutes, that is:

\lambda_{a \otimes b} = (\lambda_a \otimes b) \circ \alpha_{1 a b}

From now on, in the spirit of terseness, I will be mostly omitting the tensor sign, so the above will be written as:

\lambda_{a b} = \lambda_a b \circ \alpha_{1 a b}

Since most of the proof revolves around different ways of parenthesizing multiple tensor products, I will use a simple, self-explanatory graphical language, where parenthesized products are represented by binary trees. Using trees will help us better recognize shapes and patterns.

Associativity and Unit Laws

The associator flips a switch, and the unitors absorb the unit, which is represented by a green blob.

In tree notation, our goal is to show that the following diagram commutes:

We also assume that the associator and the unitors satisfy some laws: the pentagon identity and the triangle identity. I will introduce them as needed.

The Pentagon

The first hint that Mac Lane gives us is to start with the pentagon identity, which normally involves four arbitrary objects, and replace the first two objects with the unit. The result is this commuting diagram, written here as an equation:

\alpha_{(1 \otimes 1) a b} \circ \alpha_{1 1 (a \otimes b)} = (\alpha_{1 1 a} \otimes b) \circ \alpha_{1 (1 \otimes a) b} \circ (1 \otimes \alpha_{1 a b})

It shows that the two ways of flipping the parentheses from 1 (1 (a b)) to ((1 1) a) b are equivalent. As a reminder, the notation \alpha_{1 1 a} b means: hold the rightmost b while applying \alpha to the inner tree. This is an example of whiskering.

The Right Unitor

The second hint is a bit confusing. Mac Lane asks us to add \rho in two places. But all the trees have the unit objects in the two leftmost positions, so surely he must have meant \lambda. I searched for some kind of online errata, but found none. However, if you look closely, there are two potential sites where the right unitor could be applied, notwithstanding the fact that it has another unit to its left. So that must be it!

In both cases, we use the component \rho_1 \colon 1 \otimes 1 \to 1. In the first case, we hold the product (a \otimes b) unchanged. In the second case we whisker \rho_1 with a and then whisker the result with b.

Triangle Identity

The next hint tells us to use the triangle identity. Written as an equation, it says:

(\rho_a \otimes b) \circ \alpha_{a 1 b} = a \otimes \lambda_b

And here it is in tree notation:

We interpret this as: if you have the unit in the middle, you can associate it to the right or to the left and then use the appropriate unitor. The result in both cases is the same.

It’s not immediately obvious where and how to apply this pattern. We will definitely have to do some squinting.

In the first occurrence of \rho in our pentagon, we have \rho_1 \otimes (a \otimes b). To apply the triangle identity, we have to do two substitutions in it. We have to use 1 as the left object and (a \otimes b) as the right object.

In the second instance, we perform a different trick: we hold the rightmost b in place and apply the triangle identity to the inner triple (1, 1, a).

Naturality

Keep in mind our goal:

\lambda_{a b} = \lambda_a b \circ \alpha_{1 a b}

You can almost see it emerging in the upper left corner of the pentagon. In fact the three trees there are what we want, except that they are all left-multiplied by the unit. All we need is to connect the dots using commuting diagrams.

Focus on the two middle trees: they differ only by associativity, so we can connect them using \alpha_{1 a b}:

But how do we know that the quadrilateral we have just completed commutes? Here, Mac Lane offers another hint: use suitable naturalities.

In general, naturality means that the following square commutes:

\alpha_b \circ F f = G f \circ \alpha_a

Here, we have a natural transformation \alpha between two functors F and G; and the arrow f \colon a \to b is lifted by each in turn.

Now compare this with the quadrilateral we have in our diagram:
If you stare at these two long enough, you’ll discover that you can indeed identify two functors, both parameterized by a pair of objects a and b:

F_{a b} x = x (a b)

G_{a b} x = (x a) b


The natural transformation in question is the associator \alpha_{x a b}. We are using its naturality in the first argument, keeping the two others constant. The arrow we are lifting is \rho_1 \colon 1 \otimes 1 \to 1. The first functor lifts it to \rho_1 (a b), and the second one to (\rho_1 a) b.

Thus we have successfully shrunk our commuting pentagon.

The Left Unitor

We are now ready to carve out another quadrilateral using the twice-whiskered left unitor 1 (\lambda_a b).

Again, we use naturality, this time in the middle argument of \alpha.
The two functors are:

F_b x = 1 (x b)

G_b x = (1 x) b

and the arrow we’re lifting is \lambda_a.

The Shrinking Triangle

We have successfully shrunk the pentagon down to a triangle. What remains now, to reach our goal, is to shrink this triangle. We can do this by applying \lambda three times:

More Naturality

The final step is to connect the three vertices to form our target triangle.

This time we use the naturality of \lambda to show that the three quadrilaterals commute. (I recommend this as an exercise.)

Since we started with a commuting pentagon, and all the triangles and quadrilaterals that we used to shrink it commute, and all the arrows are reversible, the inner triangle must commute as well. This completes the proof.

Conclusion

I don’t think it’s possible to do category theory without drawing pictures. Sure, Mac Lane’s pentagon diagram can be written as an algebraic equation:

\alpha_{(a \otimes b) c d} \circ \alpha_{a b (c \otimes d)} =   (\alpha_{a b c} \otimes d) \circ \alpha_{a (b \otimes c) d} \circ (a \otimes \alpha_{b c d})

In programming we would call this point-free encoding and consider it an aberration. Unfortunately, this is exactly the language of proof assistants like Lean, Agda, or Coq. No wonder it takes forever for mathematicians to formalize their theories. We really need proof assistants that work with diagrams.

Incidentally, the tools that mathematicians use today to publish diagrams are extremely primitive. Some of the simpler diagrams in this blog post were done using a LaTeX package called tikz-cd, but to draw the more complex ones I had to switch to an iPad drawing tool called ProCreate, which is way more user friendly. (I also used it to make the drawing below.)


This post is based on the talk I gave at Functional Conf 2022. There is a video recording of this talk.

Disclaimers

Data types may contain secret information. Some of it can be extracted, some is hidden forever. We’re going to get to the bottom of this conspiracy.

No data types were harmed while extracting their secrets.

No coercion was used to make them talk.

We’re talking, of course, about unsafeCoerce, which should never be used.

Implementation hiding

The implementation of a function, even if it’s available for inspection by a programmer, is hidden from the program itself.

What is this function, with the suggestive name double, hiding inside?

 x | double x
---+----------
 2 |  4
 3 |  6
-1 | -2

Best guess: It’s hiding 2. It’s probably implemented as:

double x = 2 * x

How would we go about extracting this hidden value? We can just call it with the unit of multiplication:

double 1
> 2

Is it possible that it’s implemented differently (assuming that we’ve already checked it for all values of the argument)? Of course! Maybe it’s adding one, multiplying by two, and then subtracting two. But whatever the actual implementation is, it must be equivalent to multiplication by two. We say that the implementation is isomorphic to multiplying by two.

Functors

A functor is a data type that hides things of type a. Being a functor means that it’s possible to modify its contents using a function. That is, if we’re given a function a->b and a functorful of a‘s, we can create a functorful of b‘s. In Haskell we define the Functor class as a type constructor equipped with the method fmap:

class Functor f where
  fmap :: (a -> b) -> f a -> f b

A standard example of a functor is a list of a‘s. The implementation of fmap applies a function g to all its elements:

instance Functor [] where
  fmap g [] = []
  fmap g (a : as) = (g a) : fmap g as

Saying that something is a functor doesn’t guarantee that it actually “contains” values of type a. But most data structures that are functors will have some means of getting at their contents. When they do, you can verify that they change their contents after applying fmap. But there are some sneaky functors.

For instance Maybe a tells us: Maybe I have an a, maybe I don’t. But if I have it, fmap will change it to a b.

instance Functor Maybe where
  fmap g Nothing = Nothing
  fmap g (Just a) = Just (g a)

A function that produces values of type a is also a functor. A function e->a tells us: I’ll produce a value of type a if you ask nicely (that is call me with a value of type e). Given a producer of a‘s, you can change it to a producer of b‘s by post-composing it with a function g :: a -> b:

instance Functor ((->) e) where
  fmap g f = g . f

Then there is the trickiest of them all, the IO functor. IO a tells us: Trust me, I have an a, but there’s no way I could tell you what it is. (Unless, that is, you peek at the screen or open the file to which the output is redirected.)

Continuations

A continuation is telling us: Don’t call us, we’ll call you. Instead of providing the value of type a directly, it asks you to give it a handler, a function that consumes an a and returns the result of the type of your choice:

type Cont a = forall r. (a -> r) -> r

You’d suspect that a continuation either hides a value of type a or has the means to produce it on demand. You can actually extract this value by calling the continuation with an identity function:

runCont :: Cont a -> a
runCont k = k id

In fact Cont a is for all intents and purposes equivalent to a–it’s isomorphic to it. Indeed, given a value of type a you can produce a continuation as a closure:

mkCont :: a -> Cont a
mkCont a = \k -> k a

The two functions, runCont and mkCont, are inverses of each other, thus establishing the isomorphism Cont a ~ a.

The Yoneda Lemma

Here’s a variation on the theme of continuations. Just like a continuation, this function takes a handler of a‘s, but instead of producing an x, it produces a whole functorful of x‘s:

type Yo f a = forall x. (a -> x) -> f x

Just like a continuation was secretly hiding a value of the type a, this data type is hiding a whole functorful of a‘s. We can easily retrieve this functorful by using the identity function as the handler:

runYo :: Functor f => Yo f a -> f a
runYo g = g id

Conversely, given a functorful of a‘s we can reconstruct Yo f a by defining a closure that fmap‘s the handler over it:

mkYo :: Functor f => f a -> Yo f a
mkYo fa = \g -> fmap g fa

Again, the two functions, runYo and mkYo, are inverses of each other, thus establishing a very important isomorphism called the Yoneda lemma:

Yo f a ~ f a
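
For instance (an example of mine, not from the talk), here’s a Yo [] Int that secretly hides the list [1, 2, 3]:

yo :: Yo [] Int
yo = \g -> [g 1, g 2, g 3]

> runYo yo
> [1,2,3]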

Both continuations and the Yoneda lemma are defined as polymorphic functions. The forall x in their definition means that they use the same formula for all types (this is called parametric polymorphism). A function that works for any type cannot make any assumptions about the properties of that type. All it can do is to look at how this type is packaged: Is it passed inside a list, a function, or something else. In other words, it can use the information about the form in which the polymorphic argument is passed.

Existential Types

One cannot speak of existential types without mentioning Jean-Paul Sartre.
An existential data type says: There exists a type, but I’m not telling you what it is. Actually, the type was known at the time of construction, but then all its traces were erased. This is only possible if the data constructor is itself polymorphic. It accepts any type and then immediately forgets what it was.

Here’s an extreme example: an existential black hole. Whatever falls into it (through the constructor BH) can never escape.

data BlackHole = forall a. BH a

Even a photon can’t escape a black hole:

bh :: BlackHole
bh = BH "Photon"

We are familiar with data types whose constructors can be undone–for instance using pattern matching. In type theory we define types by providing introduction and elimination rules. These rules tell us how to construct and how to deconstruct types.

But existential types erase the type of the argument that was passed to the (polymorphic) constructor so they cannot be deconstructed. However, not all is lost. In physics, we have Hawking radiation escaping a black hole. In programming, even if we can’t peek at the existential type, we can extract some information about the structure surrounding it.

Here’s an example: We know we have a list, but of what?

data SomeList = forall a. SomeL [a]

It turns out that to undo a polymorphic constructor we can use a polymorphic function. We have at our disposal functions that act on lists of arbitrary type, for instance length:

length :: forall a. [a] -> Int

The use of a polymorphic function to “undo” a polymorphic constructor doesn’t expose the existential type:

len :: SomeList -> Int
len (SomeL as) = length as

Indeed, this works:

someL :: SomeList
someL = SomeL [1..10]
> len someL
> 10

Extracting the tail of a list is also a polymorphic function. We can use it on SomeList without exposing the type a:

trim :: SomeList -> SomeList
trim (SomeL []) = SomeL []
trim (SomeL (a: as)) = SomeL as

Here, the tail of the (non-empty) list is immediately stashed inside SomeList, thus hiding the type a.

But this will not compile, because it would expose a:

bad :: SomeList -> a
bad (SomeL as) = head as

Producer/Consumer

Existential types are often defined using producer/consumer pairs. The producer is able to produce values of the hidden type, and the consumer can consume them. The role of the client of the existential type is to activate the producer (e.g., by providing some input) and to pass the result (without looking at it) directly to the consumer.

Here’s a simple example. The producer is just a value of the hidden type a, and the consumer is a function consuming this type:

data Hide b = forall a. Hide a (a -> b)

All the client can do is to match the consumer with the producer:

unHide :: Hide b -> b
unHide (Hide a f) = f a

This is how you can use this existential type. Here, Int is the visible type, and Char is hidden:

secret :: Hide Int
secret = Hide 'a' (ord)

The function ord is the consumer that turns the character into its ASCII code:

> unHide secret
> 97

Co-Yoneda Lemma

There is a duality between polymorphic types and existential types. It’s rooted in the duality between universal quantifiers (for all, \forall) and existential quantifiers (there exists, \exists).

The Yoneda lemma is a statement about polymorphic functions. Its dual, the co-Yoneda lemma, is a statement about existential types. Consider the following type that combines the producer of x‘s (a functorful of x‘s) with the consumer (a function that transforms x‘s to a‘s):

data CoYo f a = forall x. CoYo (f x) (x -> a)

What does this data type secretly encode? The only thing the client of CoYo can do is to apply the consumer to the producer. Since the producer has the form of a functor, the application proceeds through fmap:

unCoYo :: Functor f => CoYo f a -> f a
unCoYo (CoYo fx g) = fmap g fx

The result is a functorful of a‘s. Conversely, given a functorful of a‘s, we can form a CoYo by matching it with the identity function:

mkCoYo :: Functor f => f a -> CoYo f a
mkCoYo fa = CoYo fa id

The two functions, unCoYo and mkCoYo, are inverses of each other, witnessing the isomorphism

CoYo f a ~ f a

In other words, CoYo f a is secretly hiding a functorful of a‘s.

Contravariant Consumers

The informal terms producer and consumer can be given more rigorous meaning. A producer is a data type that behaves like a functor. A functor is equipped with fmap, which lets you turn a producer of a‘s to a producer of b‘s using a function a->b.

Conversely, to turn a consumer of a‘s to a consumer of b‘s you need a function that goes in the opposite direction, b->a. This idea is encoded in the definition of a contravariant functor:

class Contravariant f where
  contramap :: (b -> a) -> f a -> f b
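
The standard example of a contravariant functor is a predicate–a consumer of a‘s that produces a Bool. (The newtype below mirrors Predicate from Data.Functor.Contravariant; the usage example is mine.)

newtype Predicate a = Predicate { getPredicate :: a -> Bool }

instance Contravariant Predicate where
  contramap g (Predicate p) = Predicate (p . g)

isPositive :: Predicate Int
isPositive = Predicate (> 0)

-- A consumer of Ints contramapped to a consumer of Strings:
nonEmpty :: Predicate String
nonEmpty = contramap length isPositive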

There is also a contravariant version of the co-Yoneda lemma, which reverses the roles of a producer and a consumer:

data CoYo' f a = forall x. CoYo' (f x) (a -> x)

Here, f is a contravariant functor, so f x is a consumer of x‘s. It is matched with the producer of x‘s, a function a->x.

As before, we can establish an isomorphism

CoYo' f a ~ f a

by defining a pair of functions:

unCoYo' :: Contravariant f => CoYo' f a -> f a
unCoYo' (CoYo' fx g) = contramap g fx
mkCoYo' :: Contravariant f => f a -> CoYo' f a
mkCoYo' fa = CoYo' fa id

Existential Lens

A lens abstracts a device for focusing on a part of a larger data structure. In functional programming we deal with immutable data, so in order to modify something, we have to decompose the larger structure into the focus (the part we’re modifying) and the residue (the rest). We can then recreate a modified data structure by combining the new focus with the old residue. The important observation is that we don’t care what the exact type of the residue is. This description translates directly into the following definition:

data Lens' s a =
  forall c. Lens' (s -> (c, a)) ((c, a) -> s)

Here, s is the type of the larger data structure, a is the type of the focus, and the existentially hidden c is the type of the residue. A lens is constructed from a pair of functions, the first decomposing s and the second recomposing it.

Given a lens, we can construct two functions that don’t expose the type of the residue. The first is called get. It extracts the focus:

toGet :: Lens' s a -> (s -> a)
toGet (Lens' frm to) = snd . frm

The second, called set, replaces the focus with a new value:

toSet :: Lens' s a -> (s -> a -> s)
toSet (Lens' frm to) = \s a -> to (fst (frm s), a)
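
Here is a concrete lens (my example, not from the original): it focuses on the first component of a pair, with the second component playing the role of the residue:

fstLens :: Lens' (a, b) a
fstLens = Lens' (\(x, y) -> (y, x)) (\(y, x) -> (x, y))

> toGet fstLens (1, "hi")
1
> toSet fstLens (1, "hi") 5
(5,"hi")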

Notice that access to the residue is not possible. The following will not compile:

bad :: Lens' s a -> (s -> c)
bad (Lens' frm to) = fst . frm

But how do we know that a pair of a getter and a setter is exactly what’s hidden in the existential definition of a lens? To show this we have to use the co-Yoneda lemma. First, we have to identify the producer and the consumer of c in our existential definition. To do that, notice that a function returning a pair (c, a) is equivalent to a pair of functions, one returning c and another returning a. We can thus rewrite the definition of a lens as a triple of functions:

data Lens' s a = 
  forall c. Lens' (s -> c) (s -> a) ((c, a) -> s)

The first function is the producer of c‘s, so the rest will define a consumer. Recall the contravariant version of the co-Yoneda lemma:

data CoYo' f s = forall c. CoYo' (f c) (s -> c)

We can define the contravariant functor that is the consumer of c‘s and use it in our definition of a lens. This functor is parameterized by two additional types s and a:

data F s a c = F (s -> a) ((c, a) -> s)

This lets us rewrite the lens using the co-Yoneda representation, with f replaced by (partially applied) F s a:

type Lens' s a = CoYo' (F s a) s

We can now use the isomorphism CoYo' f s ~ f s. Plugging in the definition of F, we get:

Lens' s a ~ CoYo' (F s a) s
CoYo' (F s a) s ~ F s a s
F s a s ~ (s -> a, (s, a) -> s)

We recognize the two functions as the getter and the setter. Thus the existential representation of the lens is indeed isomorphic to the getter/setter pair.
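
The other direction of the isomorphism can also be witnessed in code (a sketch of mine): given a getter and a setter, we can always rebuild an existential lens by taking the whole of s as the residue:

mkLens :: (s -> a) -> (s -> a -> s) -> Lens' s a
mkLens get set = Lens' (\s -> (s, get s)) (\(s, a) -> set s a)

This choice of residue is redundant (it still contains the old focus), but that's fine: the residue is existentially hidden, so nobody can tell.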

Type-Changing Lens

The simple lens we’ve seen so far lets us replace the focus with a new value of the same type. But in general the new focus could be of a different type. In that case the type of the whole thing will change as well. A type-changing lens thus has the same decomposition function, but a different recomposition function:

data Lens s t a b =
  forall c. Lens (s -> (c, a)) ((c, b) -> t)

As before, this lens is isomorphic to a get/set pair, where get extracts an a:

toGet :: Lens s t a b -> (s -> a)
toGet (Lens frm to) = snd . frm

and set replaces the focus with a new value of type b to produce a t:

toSet :: Lens s t a b -> (s -> b -> t)
toSet (Lens frm to) = \s b -> to (fst (frm s), b)
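
For instance (my example), the pair lens becomes type-changing; replacing the first component may change its type:

fstL :: Lens (a, x) (b, x) a b
fstL = Lens (\(a, x) -> (x, a)) (\(x, b) -> (b, x))

> toSet fstL (1 :: Int, "rest") 'c'
('c',"rest")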

Other Optics

The advantage of the existential representation of lenses is that it easily generalizes to other optics. The idea is that a lens decomposes a data structure into a pair of types (c, a); and a pair is a product type, symbolically c \times a:

data Lens s t a b =
  forall c. Lens (s -> (c, a))
                 ((c, b) -> t)

A prism does the same for the sum data type. A sum c + a is written as Either c a in Haskell. We have:

data Prism s t a b =
  forall c. Prism (s -> Either c a)
                  (Either c b -> t)
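
A simple instance (my example, not from the text) is the prism that focuses on the contents of a Maybe; the residue is the unit type, standing for Nothing:

_Just :: Prism (Maybe a) (Maybe b) a b
_Just = Prism decompose recompose
  where
    decompose (Just a)  = Right a
    decompose Nothing   = Left ()
    recompose (Right b) = Just b
    recompose (Left ()) = Nothing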

We can also combine sum and product in what is called an affine type c_1 + c_2 \times a. The resulting optic has two possible residues, c1 and c2:

data Affine s t a b =
  forall c1 c2. Affine (s -> Either c1 (c2, a))
                       (Either c1 (c2, b) -> t)
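
For instance (my sketch), we can focus on the second component of a pair stored inside a Maybe; the first residue records the absence of the focus, the second is carried alongside it:

atSnd :: Affine (Maybe (x, a)) (Maybe (x, b)) a b
atSnd = Affine decompose recompose
  where
    decompose Nothing        = Left ()
    decompose (Just (x, a))  = Right (x, a)
    recompose (Left ())      = Nothing
    recompose (Right (x, b)) = Just (x, b)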

The list of optics goes on and on.

Profunctors

A producer can be combined with a consumer in a single data structure called a profunctor. A profunctor is parameterized by two types; that is, p a b is a consumer of a‘s and a producer of b‘s. We can turn a consumer of a‘s and a producer of b‘s into a consumer of s‘s and a producer of t‘s using a pair of functions, the first of which goes in the opposite direction:

class Profunctor p where
  dimap :: (s -> a) -> (b -> t) -> p a b -> p s t

The standard example of a profunctor is the function type p a b = a -> b. Indeed, we can define dimap for it by precomposing it with one function and postcomposing it with another:

instance Profunctor (->) where
  dimap f g pab = g . pab . f

Profunctor Optics

We’ve seen functions that were polymorphic in types. But polymorphism is not restricted to types. Here’s a definition of a function that is polymorphic in profunctors:

type Iso s t a b = forall p. Profunctor p =>
  p a b -> p s t

This function says: Give me any producer of b‘s that consumes a‘s and I’ll turn it into a producer of t‘s that consumes s‘s. Since it doesn’t know anything else about its argument, the only thing this function can do is to apply dimap to it. But dimap requires a pair of functions, so this profunctor-polymorphic function must be hiding such a pair:

s -> a
b -> t

Indeed, given such a pair, we can reconstruct its implementation:

mkIso :: (s -> a) -> (b -> t) -> Iso s t a b
mkIso g h = \p -> dimap g h p
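
To see the encoding at work (my addition, not part of the original text), instantiate p to the function arrow, using the Profunctor (->) instance defined earlier. An Iso applied to a plain function a -> b then yields a function s -> t:

over :: Iso s t a b -> (a -> b) -> (s -> t)
over iso f = iso f

newtype Age = Age Int deriving Show

ageIso :: Iso Age Age Int Int
ageIso = mkIso (\(Age n) -> n) Age

> over ageIso (+ 1) (Age 41)
Age 42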

All other optics have their corresponding implementation as profunctor-polymorphic functions. The main advantage of these representations is that they can be composed using simple function composition.

Main Takeaways

  • Producers and consumers correspond to covariant and contravariant functors
  • Existential types are dual to polymorphic types
  • Existential optics combine producers with consumers in one package
  • In such optics, producers decompose, and consumers recompose data
  • Functions can be polymorphic with respect to types, functors, or profunctors

A PDF version of this post is available on github.

Abstract

The co-presheaf optic is a new kind of optic that generalizes the polynomial lens. Its distinguishing feature is that it’s not based on the action of a monoidal category. Instead, the action is parameterized by functors between different co-presheaves. The composition of these actions corresponds to composition of functors rather than the more traditional tensor product. These functors and their composition have a representation in terms of profunctors.

Motivation

A lot of optics can be defined using the existential, or coend, representation:

\mathcal{O}\langle a, b\rangle \langle s, t \rangle = \int^{m \colon \mathcal M} \mathcal C (s, m \bullet a) \times \mathcal D ( m \bullet b, t)

Here \mathcal M is a monoidal category with an action on objects of two categories \mathcal C and \mathcal D (I’ll use the same notation for both actions). The actions are composed using the tensor product in \mathcal M:

n \bullet (m \bullet a) = (n \otimes m) \bullet a

The idea of this optic is that we have a pair of morphisms, one decomposing the source s into the action of some m on a, and the other recomposing the target t from the action of the same m on b. In most applications we pick \mathcal D to be the same category as \mathcal C.

Recently, there has been renewed interest in polynomial functors. Morphisms between polynomial functors form a new kind of optic that doesn’t neatly fit this mold. They do, however, admit an existential representation of the form:

\int^{c_{k i}} \prod_{k \in K} \mathbf{Set} \left(s_k,  \sum_{n \in N} a_n \times c_{n k} \right) \times \prod_{i \in K}  \mathbf{Set} \left(\sum_{m \in N} b_m \times c_{m i}, t_i \right)

Here the sets s_k and t_i can be treated as fibers over the set K, while the sets a_n and b_m are fibers over a different set N.

Alternatively, we can treat these fibrations as functors from discrete categories to \mathbf{Set}, that is co-presheaves. For instance a_n is the result of a co-presheaf a acting on an object n of a discrete category \mathcal N. The products over K can be interpreted as ends that define natural transformations between co-presheaves. The interesting part is that the matrices c_{n k} are fibrated over two different sets. I have previously interpreted them as profunctors:

c \colon \mathcal N^{op} \times \mathcal K \to \mathbf{Set}

In this post I will elaborate on this interpretation.

Co-presheaves

A co-presheaf category [\mathcal C, \mathbf{Set}] behaves, in many respects, like a vector space. For instance, it has a “basis” consisting of representable functors \mathcal C (r, -), in the sense that any co-presheaf is a colimit of representables. Moreover, colimit-preserving functors between co-presheaf categories are very similar to linear transformations between vector spaces. Of particular interest are functors that are left adjoint to some other functors, since left adjoints preserve colimits.

The polynomial lens formula has a form suggestive of a vector-space interpretation. We have one vector space with vectors \vec{s} and \vec{t} and another with \vec{a} and \vec{b}. Rectangular matrices c_{n k} can be seen as components of a linear transformation between these two vector spaces. We can, for instance, write:

\sum_{n \in N} a_n \times c_{n k} = (c^T a)_k

where c^T is the transposed matrix. Transposition here serves as an analog of adjunction.

We can now re-cast the polynomial lens formula in terms of co-presheaves. We no longer interpret \mathcal N and \mathcal K as discrete categories. We have:

a, b \colon [\mathcal N, \mathbf{Set}]

s, t \colon [\mathcal K, \mathbf{Set}]

In this interpretation c is a functor between categories of co-presheaves:

c \colon [\mathcal N, \mathbf{Set}] \to [\mathcal K, \mathbf{Set}]

We’ll write the action of this functor on a co-presheaf a as c \bullet a.

We assume that this functor has a right adjoint c^{\dagger} and therefore preserves colimits. The adjunction reads:

[\mathcal K, \mathbf{Set}] (c \bullet a, t) \cong [\mathcal N, \mathbf{Set}] (a, c^{\dagger} \bullet t)

where:

c^{\dagger} \colon [\mathcal K, \mathbf{Set}] \to [\mathcal N, \mathbf{Set}]

We can now generalize the polynomial optic formula to:

\mathcal{O}\langle a, b\rangle \langle s, t \rangle = \int^{c} [\mathcal K, \mathbf{Set}] \left(s,  c \bullet a \right) \times [\mathcal K, \mathbf{Set}] \left(c \bullet b, t \right)

The coend is taken over all functors that have a right adjoint. Fortunately, there is a better representation for such functors. It turns out that colimit-preserving functors:

c \colon [\mathcal N, \mathbf{Set}] \to [\mathcal K, \mathbf{Set}]

are equivalent to profunctors (see the Appendix for the proof). Such a profunctor:

p \colon \mathcal N^{op} \times \mathcal K \to \mathbf{Set}

is given by the formula:

p \langle n, k \rangle = c ( \mathcal N(n, -)) k

where \mathcal N(n, -) is a representable co-presheaf.

The action of c can be expressed as a coend:

(c \bullet a) k = \int^{n} a(n) \times p \langle n, k \rangle
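
As a sanity check (my aside, not in the original), when \mathcal N and \mathcal K are discrete the coend reduces to a plain sum, and we recover the matrix formula from the motivation:

(c \bullet a) k \cong \sum_{n \in N} a_n \times p \langle n, k \rangle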

The co-presheaf optic is then a coend over all profunctors p \colon \mathcal N^{op} \times \mathcal K \to \mathbf{Set}:

\int^{p} [\mathcal K, \mathbf{Set}] \left(s,  \int^{n} a(n) \times p \langle n, - \rangle \right) \times [\mathcal K, \mathbf{Set}] \left(\int^{n'} b(n') \times p \langle n', - \rangle, t \right)

Composition

We have defined the action c \bullet a as the action of a functor on a co-presheaf. Given two composable functors:

c \colon  [\mathcal N, \mathbf{Set}] \to [\mathcal K, \mathbf{Set}]

and:

c' \colon  [\mathcal K, \mathbf{Set}] \to [\mathcal M, \mathbf{Set}]

we automatically get the associativity law:

c' \bullet (c \bullet a) = (c' \circ c) \bullet a

The composition of functors between co-presheaves translates directly to profunctor composition. Indeed, the profunctor p' \diamond p corresponding to c' \circ c is given by:

(p' \diamond p) \langle n, m \rangle = (c' \circ c) ( \mathcal N(n, -)) m

and can be evaluated to:

c' ( c ( \mathcal N(n, -))) m \cong \int^{k} c ( \mathcal N(n, -)) k \times p' \langle k, m \rangle

\cong \int^{k} p \langle n, k \rangle \times p' \langle k, m \rangle

which is the standard definition of profunctor composition.
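
In the discrete case (my aside again), this coend likewise reduces to a sum, and profunctor composition becomes matrix multiplication, in line with the earlier vector-space analogy:

(p' \diamond p) \langle n, m \rangle \cong \sum_{k \in K} p \langle n, k \rangle \times p' \langle k, m \rangle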

Consider two composable co-presheaf optics, \mathcal{O}\langle a, b\rangle \langle s, t \rangle and \mathcal{O}\langle a', b' \rangle \langle a, b \rangle. The first one tells us that there exists a c and a pair of natural transformations:

l_c (s,  a ) = [\mathcal K, \mathbf{Set}] \left(s,  c \bullet a \right)

r_c (b, t) = [\mathcal K, \mathbf{Set}] \left(c \bullet b, t \right)

The second tells us that there exists a c' and a pair:

l'_{c'} (a,  a' ) = [\mathcal N, \mathbf{Set}] \left(a,  c' \bullet a' \right)

r'_{c'} (b', b) = [\mathcal N, \mathbf{Set}] \left(c' \bullet b', b \right)

The composition of the two should be an optic of the type \mathcal{O}\langle a', b'\rangle \langle s, t \rangle. Indeed, we can construct such an optic using the composition c' \circ c and a pair of natural transformations:

s \xrightarrow{l_c (s,  a )} c \bullet a \xrightarrow{c \,\circ \, l'_{c'} (a,  a')} c \bullet (c' \bullet a') \xrightarrow{assoc} (c \circ c') \bullet a'

(c \circ c') \bullet b' \xrightarrow{assoc^{-1}} c \bullet (c' \bullet b') \xrightarrow{c \, \circ \, r'_{c'} (b', b)} c \bullet b \xrightarrow{r_c (b, t)}  t

Generalizations

By duality, there is a corresponding optic based on presheaves. Also, (co-) presheaves can be naturally generalized to enriched categories, where the correspondence between left adjoint functors and enriched profunctors holds as well.

Appendix

I will show that a functor between two co-presheaf categories that has a right adjoint and therefore preserves colimits:

c \colon [\mathcal N, \mathbf{Set}] \to [\mathcal K, \mathbf{Set}]

is equivalent to a profunctor:

p \colon \mathcal N^{op} \times \mathcal K \to \mathbf{Set}

The profunctor is given by:

p \langle n, k \rangle = c ( \mathcal N(n, -)) k

and the functor c can be recovered using the formula:

c (a) k = \int^{n'} a (n') \times p \langle n', k \rangle

where:

a \colon [\mathcal N, \mathbf{Set}]

I’ll show that these formulas are inverses of each other. First, inserting the formula for c into the definition of p should give us p back:

\int^{n'} \mathcal N(n, -) (n') \times p\langle n', k \rangle \cong  p \langle n, k \rangle

which follows from the co-Yoneda lemma.

Second, inserting the formula for p into the definition of c should give us c back:

\int^{n'} a (n') \times c(\mathcal N(n', -)) k  \cong c (a) k

Since c preserves all colimits, and any co-presheaf is a colimit of representables, it’s enough that we prove this identity for a representable:

a (n) = \mathcal N (r, n)

We have to show that:

\int^{n'}  \mathcal N (r, n')  \times  c(\mathcal N(n', -)) k \cong  c ( \mathcal N (r, -) ) k

and this follows from the co-Yoneda lemma.
