Category Theory ∩ Machine Learning (github.com/bgavran)
118 points by bgavran on March 7, 2023 | 69 comments



I have recently written a paper on understanding machine learning via the lens of Hopf algebra https://arxiv.org/abs/2302.01834.

Hopf algebras (which are really just tensors with recurrence relations built in) subsume convnets, transformers and diffusion models, and they also provide a theoretically better autodiff that operates within single layers as opposed to across entire graphs.
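
For those who haven't met the structure: a Hopf algebra is an algebra and a coalgebra glued together compatibly, with an antipode. A minimal illustrative Rust signature (emphatically not the framework's actual API; a pair is only a caricature of the tensor product, which really lands in sums of simple tensors):

  // Illustrative sketch only; H is the carrier type (Self).
  trait HopfAlgebra: Sized {
      fn unit() -> Self;                  // eta: k -> H
      fn mul(self, other: Self) -> Self;  // m: H (x) H -> H
      fn counit(&self) -> f64;            // eps: H -> k
      fn comul(&self) -> (Self, Self);    // Delta: H -> H (x) H
      fn antipode(self) -> Self;          // S: H -> H
  }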

Furthermore, there is a correspondence between Hopf algebras and cyclic linear logic, and Hopf algebras are related to zonotopes, polyhedra that have been used in verified numerical computation. I'm strongly convinced the linear logic connection can provide proofs over zonotopes, which paves the way towards interpretable AI and will be central for XAI.

I know this sounds too good to be true, but Persi Diaconis has also written a paper that shows how useful Hopf algebras are in the context of Markov chains: https://arxiv.org/abs/1206.3620

I'm working on a next-gen Hopf-algebra-based machine learning framework.

Join my discord if you want to discuss this further: https://discord.cofunctional.ai

====

My account is currently rate-limited, so I will use this comment to respond to comments below.

red_trumped: What about Hopf algebras do I not understand?

gaze: Haha, it's been a while since I have commented about QC. What do I not understand about it? And what comment are you referring to?


Your paper didn't pass my smell test at all, tbh. For example, the formulas you write for "product" and "coproduct" in section 3 are literally identical (as "=" is symmetric). In section 4.2 you write "the product is the standard tensor product" with a formula that doesn't at all involve the map m: A \otimes A \to A. The formula you write is the induced product on A \otimes A, assuming that you already have a product on A. The formula for "coproduct" is just one example[1] of a coproduct; not every coproduct has to look that way.
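
For reference, the induced product referred to here is the standard one: given a product on A, one sets (a \otimes b)(a' \otimes b') = (aa') \otimes (bb'). That formula multiplies elements of A \otimes A; it says nothing about a structure map m: A \otimes A \to A.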

[1] https://en.wikipedia.org/wiki/Coalgebra#Examples


Same here. Sloppily written with very little content. If the author can't take the time to proofread his own paper why should anyone else waste their time?


Ok, read the Persi Diaconis one: https://arxiv.org/abs/1206.3620

You are right, it could be more polished, but it also covers quite a lot, and it was surprisingly hard to fit it all in.

The main response I have been getting though is "I won't believe this until I see the implementation" so I have been concentrating on that.


adamnemecek has posted too many comments and is in a cooldown phase, but he's asked me to post this comment: "It's the programmer's equal sign. I think the surrounding text provides a decent explanation of what the deal is.

You are right, there's a missing sentence fragment, "standard tensor product that satisfies the property...".

Read the Diaconis paper."

--- This isn't a sock puppet and I hope this isn't against site rules. I just wanted to try to help facilitate good discussion. I think red_trumped brought up some interesting criticisms, and I felt Adam had a legitimate interest in responding ASAP.


> It's the programmer's equal sign.

That doesn’t seem to make any sense.


I'm saying there's a difference between an equal sign that defines something and an equal sign that declares a relation.


Sure. In mathematics this is why we often use ":=" for definitions, or we indicate in the surrounding text that the next equation is a definition. That would be helpful.

But even then, you cannot define "C" as "C \otimes C", because the right hand side only makes sense if "C" is already defined. And in math you cannot define something twice. As soon as you defined something, it stays the same in the given context.
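
To make the distinction concrete with the thread's own objects: writing \Delta(c) := c \otimes c defines a coproduct on basis elements (the group-like example from the coalgebra page linked above), whereas \Delta(gh) = \Delta(g)\Delta(h) asserts a relation that a given \Delta may or may not satisfy. The first introduces the symbol; the second constrains it.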


It's a type signature, not a formula. Cs are not values.

Think of it as a signature in say Rust

fn coproduct(c: C) -> (C, C)

Also, you were talking about my knowledge of Hopf algebras; this isn't about knowledge of Hopf algebras but quibbling over things that are pretty clear from context.


I don't know what prerequisite knowledge is implied for that paper, but for someone who had to dabble in theoretical econometrics and ML papers during my graduate studies, it is absolutely not clear what is going on. In my experience, a paper written in such a fashion, where you don't even define what objects you are working with, wouldn't even pass as an acceptable undergraduate paper.

For example, if someone works with a probability space (Ω, F, P), they would state so very clearly in their paper even though it is quite obvious from the notation that it is supposed to be a probability space.

Similarly, if someone writes "Given a symmetric monoidal category C with tensor product ⊗...", it is understandable what is going on, or at least it is understandable what I should look up. But in that paper I have no idea how I am supposed to interpret "C", "⊗" and "=" in such a way that the formulas make sense.

> It's a type signature, not a formula. Cs are not values.

I am not sure what distinction you are trying to draw, but surely any sequence of symbols that adheres to a given formal grammar is a (well-formed) formula; and surely if by value you mean a mathematical object, types and type signatures are values.


> In my experience, a paper written in such a fashion, where you don't even define what objects you are working with, wouldn't even pass as an acceptable undergraduate paper.

Don't read it then; it's apparently too stressful for you. You are right, I assume some knowledge, but I don't know if this discourse is productive; you are too hung up on things that don't matter.

C is general. Coproduct is a general construct.

Read the Diaconis paper first; you might get what I'm getting at.


I suggest you re-read the Diaconis paper; you might get what I'm getting at. It looks like a proper math paper with proper exposition.


It seems fine to me, although maybe it's backwards. Add a prime to the RHS, perhaps.


could you advertise your research a bit less often, please? i see your post like literally almost every other day here


Or at least explain it in a more accessible way. Every time Adam posts about the paper, it gets confused comments and no engagement with the content, because it's pretty deep graduate-level pure math, which is occasionally seen but rare on HN.


As a maths PhD student who has seen Hopf algebras before (though I'm no expert, and the context was different), I'm not convinced Adam understands things about Hopf algebras.


He doesn’t understand quantum computing either. I wish he would stop writing authoritatively about things he doesn’t understand.


I was kind of curious what all the controversy is about, but looking back at that user's comments, it doesn't seem like he has commented even once in the last six months about QC, unless I missed it(?)


At some point I said that analog QC will be the future rather than what's considered QC today. I still stand by those comments.


What about QC do I not understand? It's been a while since I have made any comments about QC so I'm glad that those comments made an impression.


I don’t think they made the kind of impression you wanted


Ok, go ahead: what do I not understand? Can you at least tell me which comments you are talking about?


Go on, what do I not understand about Hopf algebras?


The discord channel is popping off. Come join https://discord.cofunctional.ai and I'll explain it.


Interesting papers.

https://arxiv.org/abs/2302.01834 appears to have a typo in section 4.5

S(hg) = S(g)S(g)

looks like it should be S(hg) = S(h)S(g) or S(hg) = S(g)S(h). (In general the antipode is an algebra anti-homomorphism, so S(hg) = S(g)S(h); the two versions agree when the algebra is commutative.)


Right, thanks.


I just read your "Coinductive guide to inductive transformer heads" paper.

My mind is blown.

Is the Hopf-algebra-based ML framework you are working on somewhere on your GitHub? I took a glance, but you have 1500 repositories and it wasn't among the first few of them.


It's in very early stages, and it's not there yet, no. Join the discord https://discord.cofunctional.ai or follow my twitter https://twitter.com/adamnemecek1 if you want to track progress. It might take some time.


It is tempting to believe that category theory will shed new light on and simplify machine learning, just as it did in algebraic geometry, algebraic topology and other areas of mathematics. This is wishful thinking. Folks who care about doing something useful should stay away from this content.


I'd suggest providing some justification for your declaration if you want anyone to listen to you.


That's a supremely anti-intellectual take.


> Category Theory has been finding increasing applications in machine learning

What's the most compelling application so far?


Application in the sense they are using it is probably different from the sense you are using it. Although it's still probably a fair question regardless.


Any application where Category Theory is making it substantially easier to express the software or reason about it is fair game, from my perspective.



Is this simply a consequence of exponential growth in CS publications driven by machine learning or is there something really going on here?


OP here.

The exponential growth in CS publications is much faster. This repository is simply a testament that CT is slowly ramping up.

It's meant to show what kind of expressive power and breadth current CT models have, which to my knowledge isn't well known outside of our niche community.


It's also meant to suggest where things are going (the kind of chart I have in mind is this one: https://twitter.com/bgavran3/status/1422206118688956420), though I understand this is something that deserves a much more substantial proof.


> The exponential growth in CS publications is much faster.

So ... yes?


The field needs better foundations. CT is pretty good.


Why? How?

The OP GitHub site doesn't promote any material that introduces the concepts at all. The "survey" paper at the top is nigh impenetrable. I'm sure the category theorists are having fun modelling machine learning, but the repo doesn't show how machine learning benefits from category theory.


Category Theory (just like all mathematical models of programming, or subsets thereof) provides building blocks for reasoning about what we build. Past applications of such mathematical models include:

- programming languages with semantics that are better adapted to specific problems (e.g. Rust's ownership);

- better compilers (see e.g. Haskell's supercompiler, which puts to shame `constexpr`-style features);

- better static analyzers (e.g. better type systems, abstract interpretation, model checkers).

In the case of Machine Learning, it might some day help us create Machine Learning that we can understand and trust better. Or it might fail. Or it might help us invent something entirely different, in 30 years.


Intuition from category theory has certainly been useful for some programming language features (e.g. algebraic effects, modal types or lenses), but I don't think it has helped with the things you cite. Rust's ownership model was created with little formal, academic understanding (or do you mean more foundational stuff like linear logic?). And I don't think that Haskell does supercompilation today? It does optimize a whole lot, but that is due more to laziness and other optimizations. But neither these optimizations nor supercompilation have much to do with category theory, as far as I am aware.


Yeah, my phrasing is probably unclear. I meant to give examples of applications of formal semantics (huddling everything together). I should probably sleep instead of commenting on HN :)

But it is my understanding that Rust's ownership traces its roots back to both linear logic (well, affine logic) and region-based resource management, both of which have formal semantics.

I haven't followed what got merged into GHC, but I remember seeing demos (and a paper) of a Haskell supercompiler during a conference many years ago, so it is something.

In my sleep-deprived brain, supercompilation is directly related to algebraic effects (although probably not in Haskell itself), which are themselves related to category theory. I could be wrong.


why have those approaches never picked up outside of a few academic projects?..


These days, Microsoft requires model-checking proofs (via its Static Driver Verifier) before accepting new device drivers. That's their secret weapon that finally got (mostly) rid of the BSOD.

I suspect that Apple is also using model-checking at various layers, but I have no proof :)


and how does it work? Do they write drivers in some verifiable subset of C?


That's how I understand it. I haven't checked the details.


There's a well-articulated essay by Christopher Olah that captures the undercurrent, and perhaps even the motivation, of many of these papers: https://colah.github.io/posts/2015-09-NN-Types-FP/

You're completely right that the repo doesn't promote any introductory material. CT is notoriously difficult to get into, and this repository wasn't meant to be a pedagogical one, but rather a list of all the relevant papers.

Though I'll see about remedying this. I have a different repository that curates a list of (what I consider to be, as a CS major) relatively approachable CT introductory materials https://github.com/bgavran/Category_Theory_Resources and it might be a good idea to add a pointer to it.


Olah's blog entry is about types and algebra. We've had types and algebra for a long time, distinct from Category Theory, which is a further abstraction over algebraic structures.

A lot of computer programmers who never studied Abstract Algebra or Type Theory think that all of it is Category Theory, because Haskell is their first exposure to it all, and Category Theory is a big meme in Haskell, thanks to Monad.


The residual connection serves as a feedback/trace, a la the trace in traced monoidal categories. That is one insight I have gleaned from CT.
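
As a one-line sketch of that reading (purely illustrative; f stands in for the learned block):

  // The identity path feeds the input around f, the same wiring as a
  // feedback/trace operator connecting an output back to an input.
  fn residual(f: impl Fn(f32) -> f32, x: f32) -> f32 {
      x + f(x)
  }

  fn main() {
      let block = |x: f32| 0.5 * x; // stand-in for a learned layer
      assert_eq!(residual(block, 2.0), 3.0);
  }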


You beat me to this. Indeed, of all categories the monoidal ones are the most potent. Look how nicely they fit in crypto ledgers: https://www.cl.cam.ac.uk/events/syco/3/slides/Nester.pdf

\s


No. It won't make a significant (if any at all) difference to effectiveness. Rewriting Pytorch in Haskell won't magically get you AGI.


No one was suggesting rewriting anything in Haskell, afaict.


It's not about rewriting things in Haskell but about using CT to reason about architectures.


You wouldn't expect improved foundations to increase effectiveness in the short term.

And AGI is totally irrelevant here.


I'm not experienced/well-read in either ML or CT, but a while ago I remember hearing Tai-Danae Bradley equate "knowing a word by the company it keeps" to the Yoneda lemma, and I always thought that was kind of interesting (although I guess I'm not qualified enough to know whether that statement is useful or vacuous).


> knowing a word by the company it keeps

I'm still shocked no one has developed language learning software along these lines. I had a prototype in the works for Thai years ago but never got the time to get it off the ground. Using statistical models trained on a web corpus for a language learning app seems like a no-brainer.

Think of it as navigating a word as a point in a graph connected to every example context it appears in, with associated words being clickable into similar context bundles. Then make it show the differences between host and target language given a translation, so you can see in which contexts the translation fails and succeeds.
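
A toy version of that structure, assuming nothing beyond a whitespace-tokenized corpus (all names here are hypothetical):

  use std::collections::HashMap;

  // Map each word to the context windows it occurs in, so clicking a
  // word can surface the company it keeps.
  fn context_index(corpus: &str, window: usize) -> HashMap<String, Vec<String>> {
      let tokens: Vec<&str> = corpus.split_whitespace().collect();
      let mut index: HashMap<String, Vec<String>> = HashMap::new();
      for (i, &word) in tokens.iter().enumerate() {
          let lo = i.saturating_sub(window);
          let hi = (i + window + 1).min(tokens.len());
          index.entry(word.to_string()).or_default().push(tokens[lo..hi].join(" "));
      }
      index
  }

  fn main() {
      let idx = context_index("the cat sat on the mat", 1);
      println!("{:?}", idx.get("the")); // contexts in which "the" occurs
  }

Diffing two such indices, host language against target language through a translation table, would surface exactly the contexts where a translation fails or succeeds.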


category theory is 'natively 2-dimensional' math, i.e. category theory explains everything in terms of graphs, where a graph is made from two different sorts of 'entities', nodes and edges, i.e. categories and morphisms

this being math, I wonder to what extent category theory can be re-expressed in terms of sets.

perhaps a better question is whether category theory can be re-expressed in (or founded on) functions?

lastly, I wonder: if category theory can be expressed in terms of functions (i think maybe it can, without sets?), why shouldn't it be expressible in terms of sets? for some reason I don't think just sets are sufficient; one may have to define functions (which is possible in terms of sets) before 'expressing' categories starting from set theory.


Set theory is fine; see the Stacks Project (https://stacks.math.columbia.edu/browse), which develops a ton of modern Category Theory on ZFC (Zermelo-Fraenkel Set Theory with the Axiom of Choice) alone.

Alternative foundations of mathematics (Set Theory, Category Theory, Type Theory, and all their variations) can all mutually interpret one another by just postulating sufficiently large universes. You don't pick or advocate for one based on its ability to encode mathematics, but rather on its ability to express your intentions and ideas.

Really, it's no different from programming language preference in my book.


In computer science we typically count from zero, not from one. So even though a category has two different sorts of entities, objects and morphisms, since we count from zero, ordinary categories are one-dimensional: objects are zero-dimensional and morphisms are one-dimensional.

Since you are concerned with sets and functions, the following analogy is helpful to build your intuition for the subject.

Dimension 0: sets

Dimension 1: functions

Dimension 2: commutative squares

Dimension 3: commutative cubes

The majority of categories, expressed in terms of structured sets and functions, never touch the second dimension. That is the domain of 2-categories, which, although they have three types of elements, are nonetheless considered two-dimensional because we count from zero. That is also where commutative squares come into play, because, as you can imagine, squares are quite obviously two-dimensional.

Functions are 1-dimensional lines or arrows from one place to another. Sets are more analogous to points than anything else, and so naive set theory is zero-dimensional. But I think you have the wrong question. You should ask the opposite question: what if functions can be re-expressed in terms of, or founded on, higher-dimensional category theory?
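
The jump to dimension 2 can be made concrete: a commutative square is just two composite paths that agree. A toy check in Rust, with arbitrarily chosen functions:

  // Square with corners A -> B -> D and A -> C -> D; it commutes
  // when the two composites agree: g(f(x)) == k(h(x)).
  fn f(x: i32) -> i32 { x + 1 } // A -> B
  fn g(x: i32) -> i32 { 2 * x } // B -> D
  fn h(x: i32) -> i32 { 2 * x } // A -> C
  fn k(x: i32) -> i32 { x + 2 } // C -> D

  fn main() {
      // g(f(x)) = 2(x + 1) = 2x + 2 = k(h(x)), so the square commutes.
      for x in -5..=5 {
          assert_eq!(g(f(x)), k(h(x)));
      }
  }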


> for some reason I don't think just sets are sufficient

The reason you're looking for is that the category of sets is not a set.


This is routinely dealt with through Grothendieck universes. Those are a fancy name for what is pretty much an inaccessible level of the cumulative hierarchy; indeed, ZFC + "every set belongs to a Grothendieck universe" is equiconsistent with ZFC + "there is a proper class of inaccessible cardinals". This is not a strong assumption over pure ZFC, compared to what set theorists interested in large cardinals work with.


While the category of sets technically cannot be expressed as a ZFC set, the ideas behind set theory are enough. You could also add an axiom[0] to ZFC to make the category of sets a set.

[0]: https://en.wikipedia.org/wiki/Grothendieck_universe


> i.e. categories and morphisms

Objects, not categories.


Set theory is going the way of the dodo. There are modified versions of set theory, but afaik a lot of the more exciting work is happening around type theory and proof assistants these days.


Set theory is not going anywhere. It may be pushed to the sidelines by new developments in category theory but that is not the same as going extinct like the dodo.


> I wonder to what extent category theory can be re-expressed in terms of sets...

Yoneda: every locally small category embeds, via the Yoneda embedding, into a category of set-valued functors.


This also requires universes/inaccessible cardinals to even be stated for categories that are not locally small. But as I mentioned in another comment, assuming enough universes exist is not a big deal for set theorists.



