This is a port of Andrej Karpathy’s llm.c project (the CPU version). I toyed around with it for a couple of hours the day he released the initial version, but only continued with it on a train / plane trip last weekend (<2024-04-20 Sat>). The port starts from commit a22c22b, which means the tokenizer is not included at the moment. I might add it one of these days.
Performance is ~comparable to the C version.
Note: the port was done in a bit of a hurry, so who knows what bugs lurk compared to the original! :)
- We have a very shallow abstraction over the raw C pointer-buffer interface in the form of an `MView[T]` type, which is just a `ptr UncheckedArray[T]` in Nim plus a few goodies, notably a `{}` accessor that performs pointer arithmetic for more ‘natural’ access into another buffer (see the first sketch after this list).
- We use Nim’s compile-time (CT) features to automatically assign the correct buffer views to the `*Tensor` fields, based on `fieldPairs`, their order in the `object`, and the `params/act_sizes` input (see the second sketch after this list).
- Generally less pointer handling.
- We use destructors for the `GPT2` and `DataLoader` objects so that we don’t have to free memory manually (and copying these objects is disallowed); the pattern is sketched after this list.
- Instead of relying on OpenMP’s `collapse` primitive to fuse multiple nested loops, we use a custom Nim CT-based loop-fusion macro, see ./fuse_loops.nim. The issue is that because Nim converts `for` loops into `while` statements, nested loops don’t play nice with OpenMP. :) So I wrote a macro that wraps around `for` loops and performs the fusion manually. Note that it only works for `for i in 0 ..< X` style loops, i.e. a lower index of 0 and using `..<`; the underlying idea is sketched at the end of this list.
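For illustration, here is a minimal sketch of the `MView[T]` idea; the real type in the repo has more goodies, and the `toMView` constructor and the `[]` accessors below are my own illustrative additions.

```nim
# Minimal sketch of an MView[T]: a thin wrapper over ptr UncheckedArray[T].
type
  MView*[T] = object
    data: ptr UncheckedArray[T]

proc toMView*[T](p: pointer): MView[T] =
  ## Illustrative constructor: wrap a raw pointer in a typed view.
  MView[T](data: cast[ptr UncheckedArray[T]](p))

proc `[]`*[T](v: MView[T], i: int): T {.inline.} = v.data[i]
proc `[]=`*[T](v: MView[T], i: int, val: T) {.inline.} = v.data[i] = val

proc `{}`*[T](v: MView[T], offset: int): MView[T] {.inline.} =
  ## Pointer arithmetic: a new view starting `offset` elements into
  ## the same underlying buffer.
  toMView[T](addr v.data[offset])
```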
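The `fieldPairs` trick then looks roughly like this (reusing the `MView` sketch above); `ParameterTensors` and `assignViews` are made-up names for the sketch, and the real code is driven by the `params/act_sizes` arrays:

```nim
type
  ParameterTensors = object
    wte, wpe, lnw: MView[float32]  # declaration order determines offsets

proc assignViews(t: var ParameterTensors, buf: MView[float32],
                 sizes: openArray[int]) =
  ## Walk the fields in declaration order via `fieldPairs`, giving each
  ## tensor a view that starts where the previous one ends.
  var offset = 0
  var idx = 0
  for name, field in fieldPairs(t):
    field = buf{offset}
    offset += sizes[idx]
    inc idx
```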
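The destructor / no-copy pattern, sketched for a stand-in object (the real `GPT2` owns several buffers):

```nim
type
  GPT2Sketch = object
    paramsMemory: pointer  # stand-in for the real object's buffers

proc `=destroy`(g: var GPT2Sketch) =
  ## Runs automatically when the object goes out of scope; no manual free.
  if g.paramsMemory != nil:
    deallocShared(g.paramsMemory)

proc `=copy`(dst: var GPT2Sketch, src: GPT2Sketch) {.error:
  "copying is disallowed; pass by reference instead".}
```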
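Conceptually, the fusion boils down to this: nested `0 ..< N` loops are flattened into one loop over the product of the bounds, which Nim’s OpenMP `||` iterator can then parallelize, with the original indices recovered via `div`/`mod`. The macro’s actual interface lives in ./fuse_loops.nim; this is just the underlying idea:

```nim
const B = 4
const T = 8

proc doWork(b, t: int) = discard  # stand-in for the real loop body

# One flat loop is a valid OpenMP target; `||` only parallelizes when
# OpenMP is enabled at compile time (see the flags below).
for i in 0 || (B * T - 1):
  let b = i div T
  let t = i mod T
  doWork(b, t)
```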
We have to compile with `--exceptions:quirky`, because otherwise the Nim-inserted error checks break the OpenMP compilation. We could disable the checks locally in the code instead, but for this project the global switch is fine. See also: nim-lang/Nim#23311
The important compilation arguments (fast-math and OpenMP related) are defined in a local nim.cfg and at the top of the train_gpt2.nim file.
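For reference, a sketch of the kind of flags this implies; the repo’s actual nim.cfg is authoritative, so treat these as assumptions:

```
--passC:"-fopenmp"
--passL:"-fopenmp"
--passC:"-ffast-math"
--exceptions:quirky
```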
Then just compile with:
nim c -d:danger -d:openmp -d:lto --passC:"-march=native" train_gpt2.nim
To get started, follow the CPU instructions from the original repo: https://github.com/karpathy/llm.c?tab=readme-ov-file#quick-start-cpu
As with my Nim port of his llama2.c, I had time to kill on a trip! And doing such ‘dumb’ ports is kind of meditative… lol