AI Hardware Engineer Roadmap

From firmware & AI applications to ML compilers — and ultimately, custom silicon.

This roadmap treats a custom AI chip as an 8-layer vertical stack and builds literacy across all 8 layers, from AI applications and ML compilers down to RTL design and silicon fabrication.

The 8-Layer AI Chip Stack

Layer	Name	Key technologies
L1	AI Application & Framework	PyTorch, ONNX, Agentic AI, MLOps, quantization
L2	Compiler & Graph Optimization	MLIR dialects, TVM, LLVM, custom NPU lowering
L3	Runtime & Driver	C++ runtime, Linux kernel driver, CUDA-like API
L4	Firmware & OS	RTOS (e.g. FreeRTOS), bootloader, embedded Linux
L5	Hardware Architecture	Systolic arrays, HBM controllers, NoC, power domains
L6	RTL & Logic Design	SystemVerilog, UVM verification, FPGA prototyping
L7	Physical Implementation	EDA tools, place & route, timing closure
L8	Fabrication & Packaging	Foundry process, CoWoS, post-silicon bring-up

L1–L6: Hands-on projects throughout the curriculum | L7–L8: Theory and guided labs (OpenROAD, TinyTapeout)

How Phases Map to Layers

               L1          L2          L3          L4          L5          L6          L7          L8
            Application  Compiler   Runtime &   Firmware    Hardware      RTL &      Physical     Fab &
            & Framework  & Graph     Driver      & OS      Architecture   Logic      Implement.  Packaging
            ───────────┬───────────┬───────────┬───────────┬───────────┬───────────┬───────────┬───────────
Phase 1        ░░░     │           │    ░░░    │    ░░░    │    ███    │    ███    │           │
Phase 2                │           │    ░░░    │    ███    │           │           │           │     ░
Phase 3        ███     │           │           │           │           │           │           │
Phase 4A               │           │    ███    │           │    ██░    │    ███    │           │
Phase 4B               │           │    ███    │    ██░    │    ░░░    │           │           │     ░
Phase 4C       ░░░     │    ███    │    ░░░    │           │           │           │           │
Phase 5        ░░░     │    ░░░    │    ░░░    │           │    ███    │    ██░    │    ██░    │    ░░░
            ───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────

███ = primary coverage    ██░ = strong supporting    ░░░ = background    (blank) = minimal
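
The chart above can also be read as data. The following minimal Python sketch transcribes it into a dictionary and answers "which phases touch a given layer?". The names `COVERAGE` and `phases_for_layer` are hypothetical, and the integer weights (3 = primary, 2 = strong supporting, 1 = background) are an assumed encoding of the legend. The single-block marks in the L8 column for Phase 2 and Phase 4B are not defined by the legend; they are treated as background here, which is also an assumption.

```python
# Assumed encoding of the legend: 3 = primary, 2 = strong supporting,
# 1 = background. Layers omitted for a phase have minimal coverage.
COVERAGE = {
    "Phase 1":  {"L1": 1, "L3": 1, "L4": 1, "L5": 3, "L6": 3},
    "Phase 2":  {"L3": 1, "L4": 3, "L8": 1},   # L8 mark assumed to mean background
    "Phase 3":  {"L1": 3},
    "Phase 4A": {"L3": 3, "L5": 2, "L6": 3},
    "Phase 4B": {"L3": 3, "L4": 2, "L5": 1, "L8": 1},  # L8 mark assumed background
    "Phase 4C": {"L1": 1, "L2": 3, "L3": 1},
    "Phase 5":  {"L1": 1, "L2": 1, "L3": 1, "L5": 3, "L6": 2, "L7": 2, "L8": 1},
}


def phases_for_layer(layer, min_weight=1):
    """Return phases covering `layer` at or above `min_weight`, strongest first."""
    hits = [(weights[layer], phase)
            for phase, weights in COVERAGE.items()
            if weights.get(layer, 0) >= min_weight]
    # sorted() is stable, so phases with equal weight keep curriculum order.
    return [phase for _, phase in sorted(hits, key=lambda item: -item[0])]


print(phases_for_layer("L2"))                # compiler layer
print(phases_for_layer("L6", min_weight=3))  # RTL, primary coverage only
```

Calling `phases_for_layer` with `min_weight=3` is a quick way to find the phases to prioritise when targeting a single layer.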

5-Phase Curriculum

Phase 1: Digital Foundations (6–12 months)

The language of hardware — from gates and Verilog to CUDA kernels.

Topic	Key Skills	Layer
Digital Design and HDL	Number systems, logic, memory; Verilog, testbenches, synthesis	L6
Computer Architecture	ISA, pipelines, caches, out-of-order execution, coherence; modern CPUs/GPUs/memory	L5
Operating Systems	Processes, threads, scheduling, memory management, drivers	L3/L4
C++ and Parallel Computing	C++ & SIMD · OpenMP & oneTBB · CUDA & SIMT · ROCm & HIP · OpenCL & SYCL	L1/L3

Phase 2: Embedded Systems (6–12 months)

The boards and buses that sit next to inference — MCUs, RTOS, and embedded Linux.

Topic	Key Skills	Layer
Embedded Software	ARM Cortex-M, FreeRTOS, SPI/UART/I2C/CAN, power, OTA	L4
Embedded Linux	Yocto, PetaLinux, kernel, rootfs	L3/L4

Phase 3: Artificial Intelligence (6–12 months)

The workloads your hardware must run: a mandatory core plus two tracks.

Core (mandatory):

#	Topic	Layer
1	Neural Networks — MLPs, CNNs, training, backpropagation	L1
2	Deep Learning Frameworks — micrograd → PyTorch → tinygrad	L1/L2
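
Topic 2 starts from micrograd, a scalar reverse-mode automatic differentiation engine. As a rough illustration of that idea (a simplified stand-in, not micrograd's actual code), the sketch below records how each scalar was computed and then applies the chain rule in reverse. The `Value` class here supports only `+` and `*`, and its attribute names are chosen for this example.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # Values this one was computed from
        self._grad_fns = grad_fns    # maps output grad to each parent's grad

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, fn in zip(v._parents, v._grad_fns):
                parent.grad += fn(v.grad)


x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = x * w + b                      # a single "neuron" without activation
y.backward()
print(y.data, x.grad, w.grad, b.grad)   # -5.0 -3.0 2.0 1.0
```

Frameworks such as PyTorch and tinygrad apply the same principle to tensors rather than scalars, which is where the compiler and hardware layers start to matter.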

Track A — Hardware & Edge AI (→ Phase 4):

#	Topic	Layer
3	Computer Vision — detection, segmentation, 3D vision, OpenCV	L1
4	Sensor Fusion — camera/LiDAR/IMU, Kalman, BEVFusion	L1
5	Voice AI — STT (Whisper), TTS (Piper), VAD, keyword spotting	L1
6	Edge AI & Model Optimization — quantization, pruning, deployment	L1
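
Topic 6 lists quantization without saying which scheme. As one common example, the sketch below shows symmetric per-tensor int8 quantization in plain Python: a single scale maps floats to integers in [-127, 127], and dequantization multiplies back. The function names are hypothetical and the choice of scheme is an assumption; real deployments would use framework tooling rather than hand-written loops.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0   # avoid dividing by zero
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

for original, code, approx in zip(weights, q, restored):
    print(f"{original:+.3f} -> {code:+4d} -> {approx:+.3f}")
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`). Values much smaller than the step, such as 0.003 here, collapse to zero, which is one reason accuracy has to be re-checked after quantizing.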

Track B — Agentic AI & ML Engineering (→ Phase 5 HPC/GenAI):

#	Topic	Layer
3	Agentic AI & GenAI — LLM agents, RAG, tool use	L1
4	ML Engineering & MLOps — training pipelines, model serving	L1
5	LLM Application Development — fine-tuning, RAG architecture	L1

Phase 4: Hardware Deployment & Compilation (6–12 months each)

Deploy AI on real silicon and learn how compilers bridge models to hardware.

Track A — Xilinx FPGA

Topic	Key Skills	Layer
Xilinx FPGA Development	Vivado, IP, timing, ILA/VIO	L6
Zynq UltraScale+ MPSoC	PS/PL, Linux on Zynq	L5/L6
Advanced FPGA Design	CDC, floorplanning, power, partial reconfiguration (PR)	L6
HLS	C→RTL, dataflow, pipelining	L5/L6
Runtime & Drivers	XRT, DMA, kernel drivers, Vitis AI/FINN	L3
Projects	1080p→4K wireless video (VCU, MIPI, TDMA)	L3–L6

Track B — NVIDIA Jetson

Topic	Key Skills	Layer
Jetson Platform	Orin Nano, JetPack 6.2.2, L4T, CUDA	L3
Carrier Board	Schematic, PCB, thermal, bring-up	L5
L4T Customization	Rootfs, kernel/DT, OTA	L3/L4
FSP Firmware	FreeRTOS on SPE/AON	L4
App Development	ML/AI, ROS 2, multimedia	L1/L3
Security & OTA	Secure boot, OP-TEE, A/B OTA	L4
Manufacturing	FCC/CE, DFM, production flash	L8
Runtime & Drivers	CUDA runtime, TensorRT, DLA, DeepStream	L3

Track C — ML Compiler

Part	Key Skills	Layer
Compiler Fundamentals	Graph IR, LLVM, MLIR, TVM/tinygrad, custom backends	L2
DL Inference Optimization	Triton, CUTLASS, Flash-Attention, TensorRT-LLM, quantization	L1/L2

Phase 5: Specialization Tracks (ongoing)

Track	Focus
A: GPU Infrastructure	NVIDIA GPU (NCCL, NVLink) · AMD GPU (ROCm, HIP, MI300X)
B: HPC	CUDA-X Libraries: 40+ GPU-accelerated libraries
C: Edge AI	Efficient nets, quantization, Holoscan, real-time pipelines
D: Robotics	Nav2, MoveIt, planning
E: Autonomous Vehicles	openpilot, tinygrad, BEV, safety, Lauterbach
F: AI Chip Design	Systolic arrays, dataflow, tinygrad↔hardware, ASIC path
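
Track F names systolic arrays as a focus. To make the term concrete, here is a minimal cycle-level sketch of an output-stationary systolic array computing a matrix product. It is an illustrative model under stated assumptions, not a description of any particular accelerator: each processing element PE(i, j) keeps one accumulator, rows of A enter from the left and columns of B from the top, each skewed by one cycle per row or column. The model tracks only which operands arrive at which PE on each cycle, not the register-to-register data movement. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle model of an output-stationary systolic array for A @ B.

    Assumes A is n x K and B is K x m, given as lists of lists.
    """
    n, k_dim, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]      # one accumulator per PE
    total_cycles = n + m + k_dim - 2       # cycles until the last PE finishes
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                # Input skew: operand index k reaches PE(i, j) at cycle i + j + k.
                k = t - i - j
                if 0 <= k < k_dim:
                    acc[i][j] += A[i][k] * B[k][j]   # one multiply-accumulate
    return acc, total_cycles


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, cycles = systolic_matmul(A, B)
print(C, cycles)
```

In this model the array needs n + m + K - 2 cycles for an n x K by K x m product, because the farthest PE receives its first operands n + m - 2 cycles after the nearest one. That fill-and-drain latency is one of the trade-offs an accelerator architect weighs when sizing the array.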

Layer → Job Title Quick Reference

Layer	Job titles	Primary phases
L1	ML Inference Eng · Edge AI Eng · Agentic AI Eng · GenAI Eng · MLOps Eng	Phase 3, 4B, 4C
L2	AI Compiler Eng · Graph Optimization Eng · Kernel Eng	Phase 4C
L3	GPU Runtime Eng · Linux Kernel Eng · Embedded Linux BSP Eng	Phase 4A§5, 4B§8
L4	Firmware Eng · Embedded SW Eng · Embedded Linux Eng · IoT Eng	Phase 2, 4B§4
L5	AI Accelerator Architect · SoC Platform Eng	Phase 1§2, 4A, 5F
L6	RTL Design Eng · FPGA Eng · DV Eng	Phase 1§1, 4A
L7	Theory: OpenROAD, GDS flow	Phase 5F
L8	Theory: chiplets, CoWoS, TinyTapeout	Phase 5F

Cross-layer roles:

- Autonomous Vehicles HW/SW Engineer: L1 through L4 (Phase 4B + Phase 5E)
- AI Hardware Engineer (Full-Stack): L1 through L6 (the signature role this roadmap targets)

Reference Projects

Project	What it teaches	Used in
tinygrad	Minimal DL framework — IR, scheduler, BEAM, backends	Phase 3, 4C, 5E
openpilot	Production ADAS — camera→ISP→modeld→planning→CAN	Phase 4B, 5E

Additional Resources

Roles & Market Analysis — 23 sub-layers, salary data, job postings, remote %, hiring priorities

A community-driven educational roadmap for AI hardware engineering.