tinygrad — The Hackable Compiler Framework¶
Parent: Module 2 — Deep Learning Frameworks
A minimal DL framework (~10K lines) that exposes the entire compiler pipeline in readable Python. The ideal codebase for understanding what happens between
loss.backward()and the GPU kernel that actually runs.
Why tinygrad Is Uniquely Valuable¶
- It's the inference engine inside openpilot (Phase 5E)
- It exposes the IR, scheduler, and code generation that Phase 4C teaches you to build
- You can add a custom backend (Phase 4C §7) targeting your own accelerator
- It runs on CUDA, OpenCL, Metal, LLVM, and custom targets
Key Concepts¶
| Concept | What it teaches | Stack connection |
|---|---|---|
| Lazy evaluation | Nothing runs until .realize() |
L2: compiler decides when to execute |
| 3 operation types | Elementwise, Reduce, Movement (25 primitives) | L5: what the PE array must support |
| ShapeTracker | Zero-copy reshapes and transposes | L2: memory layout optimization |
| UOp IR | Intermediate representation before codegen | L2: same concept as MLIR/TVM IR |
| BEAM search | Explores fusion choices to minimize runtime | L2: auto-tuning |
| Backends | Same IR generates CUDA, OpenCL, LLVM code | L2: multi-target compilation |
Projects¶
- Trace a matmul:
Tensor→ lazy buffer → scheduled ops → generated CUDA kernel. Document every step. - DEBUG=4: Run a small model, read the generated kernels. Count kernel launches.
- BEAM=3 vs BEAM=0: Compare kernel count and latency on the same model.
- Logging backend: Add a minimal hook that prints each kernel launch.
Deep Dive¶
The full tinygrad learning path (11 parts, 7 projects) is in Phase 5E — Autonomous Vehicles / tinygrad.