"A from-scratch implementation of a modern Large Language Model (LLaMA architecture). Features full backpropagation calculus in pure NumPy, followed by a GPU-accelerated PyTorch training loop."
-
Updated
Jun 7, 2026 - Python
"A from-scratch implementation of a modern Large Language Model (LLaMA architecture). Features full backpropagation calculus in pure NumPy, followed by a GPU-accelerated PyTorch training loop."
A from-scratch PyTorch LLM implementing Sparse Mixture-of-Experts (MoE) with Top-2 gating. Integrates modern Llama-3 components (RMSNorm, SwiGLU, RoPE, GQA) and a custom-coded Byte-Level BPE tokenizer. Pre-trained on a curated corpus of existential & dark philosophical literature.
Add a description, image, and links to the rope-embeddings topic page so that developers can more easily learn about it.
To associate your repository with the rope-embeddings topic, visit your repo's landing page and select "manage topics."