Tiny LLM from Scratch

Build a small language model directly on your laptop. Collect data, train a tokenizer, hand-code a transformer, and run a 10M-parameter model end-to-end in under four hours — then quantize and serve it with llama.cpp.
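To give a feel for what "10M parameters" means, here is a rough sketch of the usual parameter arithmetic for a GPT-2-style decoder. The layer count, width, vocabulary size, and context length below are illustrative assumptions, not the configuration used in the book, and the helper name `gpt_param_count` is hypothetical. Biases and normalization weights are ignored because they contribute very little.

```python
def gpt_param_count(n_layer, d_model, vocab_size, block_size, tie_embeddings=True):
    """Approximate parameter count of a GPT-2-style decoder.

    Ignores biases and LayerNorm weights. All arguments are illustrative
    assumptions, not values taken from the book.
    """
    attn = 4 * d_model * d_model      # Q, K, V and output projections
    mlp = 8 * d_model * d_model       # two linear layers with a 4x hidden width
    blocks = n_layer * (attn + mlp)   # 12 * d_model^2 per transformer block
    tok_emb = vocab_size * d_model    # token embedding table
    pos_emb = block_size * d_model    # learned positional embeddings
    # With tied embeddings the output head reuses the token embedding table.
    lm_head = 0 if tie_embeddings else vocab_size * d_model
    return blocks + tok_emb + pos_emb + lm_head


if __name__ == "__main__":
    # Hypothetical small configuration that lands near 10M parameters.
    n = gpt_param_count(n_layer=5, d_model=384, vocab_size=4096, block_size=256)
    print(f"{n / 1e6:.2f}M parameters")
```

Under these assumptions most of the budget sits in the transformer blocks (12 × d_model² per layer), so depth and width matter far more than vocabulary size at this scale.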

What this book covers / doesn't cover

Covered

nanoGPT-style transformer · BPE · TinyStories/Cosmopedia · AdamW · mixed precision · perplexity · GGUF · llama.cpp

Mentioned only

RoPE · RMSNorm · SwiGLU · GQA · KV cache · LoRA

Out of scope

MoE · RLHF · DPO/GRPO · multi-node · FSDP · 70B+ scale

Prerequisites

Python · intro PyTorch · matrix-multiply intuition · Colab or M1+ Mac

Where to go