tinygrad — The Hackable Compiler Framework

Parent: Module 2 — Deep Learning Frameworks

tinygrad is a minimal deep learning framework (~10K lines) that exposes the entire compiler pipeline in readable Python. It is an ideal codebase for understanding what happens between `loss.backward()` and the GPU kernel that actually runs.

Why tinygrad Is Uniquely Valuable

It's the inference engine inside openpilot (Phase 5E)
It exposes the IR, scheduler, and code generation that Phase 4C teaches you to build
You can add a custom backend (Phase 4C §7) targeting your own accelerator
It runs on CUDA, OpenCL, Metal, LLVM, and custom targets

Key Concepts

Concept	What it teaches	Stack connection
Lazy evaluation	Nothing runs until `.realize()`	L2: compiler decides when to execute
3 operation types	Elementwise, Reduce, Movement (25 primitives)	L5: what the PE array must support
ShapeTracker	Zero-copy reshapes and transposes	L2: memory layout optimization
UOp IR	Intermediate representation before codegen	L2: same concept as MLIR/TVM IR
BEAM search	Searches per-kernel optimization choices (tiling, unrolling, work-group sizes) to minimize runtime	L2: auto-tuning
Backends	Same IR generates CUDA, OpenCL, LLVM code	L2: multi-target compilation
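
The lazy evaluation row can be illustrated with a small sketch in plain Python. This is a toy model, not tinygrad's implementation: the `LazyTensor` class, its `evaluations` counter, and the op names `LOAD`, `ADD`, and `MUL` are hypothetical, and only the method name `realize()` mirrors tinygrad. Building an expression only records a graph; work happens when `realize()` walks it.

```python
import operator


class LazyTensor:
    """Toy lazy tensor: records an op graph and computes only in realize()."""

    evaluations = 0  # how many recorded ops have actually been executed

    def __init__(self, op, srcs=(), data=None):
        self.op = op        # hypothetical op name: "LOAD", "ADD" or "MUL"
        self.srcs = srcs    # parent nodes in the graph
        self.data = data    # None until the node has been realized

    @staticmethod
    def from_list(values):
        # A LOAD node already holds its data, so realizing it costs nothing.
        return LazyTensor("LOAD", data=list(values))

    def __add__(self, other):
        return LazyTensor("ADD", (self, other))  # record only, no compute

    def __mul__(self, other):
        return LazyTensor("MUL", (self, other))  # record only, no compute

    def realize(self):
        # Compute this node (and any unrealized parents) exactly once.
        if self.data is None:
            a, b = (src.realize().data for src in self.srcs)
            fn = operator.add if self.op == "ADD" else operator.mul
            self.data = [fn(x, y) for x, y in zip(a, b)]
            LazyTensor.evaluations += 1
        return self


x = LazyTensor.from_list([1, 2, 3])
y = LazyTensor.from_list([4, 5, 6])
z = (x + y) * y                      # builds a two-node graph, runs nothing
print(LazyTensor.evaluations)        # still 0 at this point
print(z.realize().data)              # the graph executes here
print(LazyTensor.evaluations)        # one ADD and one MUL have now run
```

In a real compiler the gap between building the graph and realizing it is where scheduling and fusion decisions are made; this sketch simply executes each recorded op in turn.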

Projects

Trace a matmul: Tensor → lazy buffer → scheduled ops → generated CUDA kernel. Document every step.
`DEBUG=4`: Run a small model, read the generated kernels. Count kernel launches.
`BEAM=3` vs `BEAM=0`: Compare kernel count and latency on the same model.
Logging backend: Add a minimal hook that prints each kernel launch.
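
As a starting point for the kernel-counting and logging projects above, the following sketch shows the general shape of a launch hook in plain Python. Everything in it is an assumption made for illustration: `LoggingRunner`, the kernel names `E_add` and `R_sum`, and the list-based "kernels" are hypothetical and do not correspond to tinygrad's runtime classes. The idea is only that a wrapper around each compiled program can record a name and a wall-clock time per launch.

```python
import time


class LoggingRunner:
    """Wraps a kernel-like callable and records every launch (hypothetical)."""

    def __init__(self, name, fn, log=None):
        self.name = name
        self.fn = fn
        # A shared list lets several runners append to one launch log.
        self.log = log if log is not None else []

    def __call__(self, *bufs):
        start = time.perf_counter()
        out = self.fn(*bufs)
        elapsed_us = (time.perf_counter() - start) * 1e6
        self.log.append((self.name, elapsed_us))
        print(f"launch {len(self.log):3d}: {self.name} ({elapsed_us:.1f} us)")
        return out


launches = []  # one shared log for all wrapped "kernels"

# Stand-ins for compiled kernels: an elementwise add and a sum reduction.
add = LoggingRunner("E_add", lambda a, b: [x + y for x, y in zip(a, b)], launches)
total = LoggingRunner("R_sum", lambda a: sum(a), launches)

result = total(add([1, 2, 3], [4, 5, 6]))
print(result, len(launches))
```

In tinygrad itself the equivalent hook would sit wherever a compiled program is dispatched to the device; finding that call site is part of the project.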

Deep Dive

The full tinygrad learning path (11 parts, 7 projects) is in Phase 5E — Autonomous Vehicles / tinygrad.
