AI Hardware Engineer Roadmap

From firmware & AI applications to ML compilers — and ultimately, custom silicon.

This roadmap treats a custom AI chip as an 8-layer vertical stack and builds working literacy across all eight layers, from AI applications and ML compilers down to RTL design and silicon fabrication.


The 8-Layer AI Chip Stack

Layer | What it does | Key technologies
L1 | AI Application & Framework | PyTorch, ONNX, Agentic AI, MLOps, quantization
L2 | Compiler & Graph Optimization | MLIR dialects, TVM, LLVM, custom NPU lowering
L3 | Runtime & Driver | C++ runtime, Linux kernel driver, CUDA-like API
L4 | Firmware & OS | FreeRTOS, bootloader, embedded Linux, RTOS
L5 | Hardware Architecture | Systolic arrays, HBM controllers, NoC, power domains
L6 | RTL & Logic Design | SystemVerilog, UVM verification, FPGA prototyping
L7 | Physical Implementation | EDA tools, place & route, timing closure
L8 | Fabrication & Packaging | Foundry process, CoWoS, post-silicon bring-up

L1–L6: Hands-on projects throughout the curriculum  |  L7–L8: Theory and guided labs (OpenROAD, TinyTapeout)


How Phases Map to Layers

               L1          L2          L3          L4          L5          L6          L7          L8
            Application  Compiler   Runtime &   Firmware    Hardware      RTL &      Physical     Fab &
            & Framework  & Graph     Driver      & OS      Architecture   Logic      Implement.  Packaging
            ───────────┬───────────┬───────────┬───────────┬───────────┬───────────┬───────────┬───────────
Phase 1        ░░░     │           │    ░░░    │    ░░░    │    ███    │    ███    │           │
Phase 2                │           │    ░░░    │    ███    │           │           │           │     ░
Phase 3        ███     │           │           │           │           │           │           │
Phase 4A               │           │    ███    │           │    ██░    │    ███    │           │
Phase 4B               │           │    ███    │    ██░    │    ░░░    │           │           │     ░
Phase 4C       ░░░     │    ███    │    ░░░    │           │           │           │           │
Phase 5        ░░░     │    ░░░    │    ░░░    │           │    ███    │    ██░    │    ██░    │    ░░░
            ───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────

███ = primary coverage    ██░ = strong supporting    ░░░ = background    (blank) = minimal
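
The chart can also be read programmatically. The sketch below transcribes only the primary (███) cells into a Python dictionary and answers two questions: which phases give a layer primary coverage, and which layers have none. The names `PRIMARY_COVERAGE`, `primary_phases`, and `uncovered_layers` are illustrative and not part of the roadmap itself.

```python
# Primary (███) cells transcribed from the phase-to-layer chart.
# Supporting and background cells are intentionally left out of this sketch.
PRIMARY_COVERAGE = {
    "Phase 1": ["L5", "L6"],
    "Phase 2": ["L4"],
    "Phase 3": ["L1"],
    "Phase 4A": ["L3", "L6"],
    "Phase 4B": ["L3"],
    "Phase 4C": ["L2"],
    "Phase 5": ["L5"],
}


def primary_phases(layer):
    """Return the phases that give `layer` primary coverage, in chart order."""
    return [phase for phase, layers in PRIMARY_COVERAGE.items() if layer in layers]


def uncovered_layers():
    """Return the layers that no phase covers as a primary focus."""
    all_layers = [f"L{i}" for i in range(1, 9)]
    covered = {layer for layers in PRIMARY_COVERAGE.values() for layer in layers}
    return [layer for layer in all_layers if layer not in covered]


if __name__ == "__main__":
    print(primary_phases("L3"))
    print(uncovered_layers())
```

Under this reading, L7 and L8 have no primary phase, which is consistent with the note that those layers are covered through theory and guided labs.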

5-Phase Curriculum

Phase 1: Digital Foundations (6–12 months)

The language of hardware — from gates and Verilog to CUDA kernels.

Topic | Key Skills | Layer
Digital Design and HDL | Number systems, logic, memory; Verilog, testbenches, synthesis | L6
Computer Architecture | ISA, pipelines, caches, OoO, coherence; modern CPUs/GPUs/memory | L5
Operating Systems | Processes, threads, scheduling, memory management, drivers | L3/L4
C++ and Parallel Computing | C++ & SIMD · OpenMP & oneTBB · CUDA & SIMT · ROCm & HIP · OpenCL & SYCL | L1/L3

Phase 2: Embedded Systems (6–12 months)

The boards and buses that sit next to inference — MCUs, RTOS, and embedded Linux.

Topic | Key Skills | Layer
Embedded Software | ARM Cortex-M, FreeRTOS, SPI/UART/I2C/CAN, power, OTA | L4
Embedded Linux | Yocto, PetaLinux, kernel, rootfs | L3/L4

Phase 3: Artificial Intelligence (6–12 months)

The workloads your hardware must run: a mandatory core plus two tracks.

Core (mandatory):

# | Topic | Layer
1 | Neural Networks: MLPs, CNNs, training, backpropagation | L1
2 | Deep Learning Frameworks: micrograd, PyTorch, tinygrad | L1/L2

Track A — Hardware & Edge AI (→ Phase 4):

# | Topic | Layer
3 | Computer Vision: detection, segmentation, 3D vision, OpenCV | L1
4 | Sensor Fusion: camera/LiDAR/IMU, Kalman, BEVFusion | L1
5 | Voice AI: STT (Whisper), TTS (Piper), VAD, keyword spotting | L1
6 | Edge AI & Model Optimization: quantization, pruning, deployment | L1
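
To give a feel for what the quantization item in topic 6 involves, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is one common scheme chosen for illustration, not necessarily the method taught in the track. The function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Assumption for this sketch: one scale for the whole tensor and a zero
    point fixed at 0. Real toolchains often use per-channel scales.
    """
    max_abs = max(abs(v) for v in values)
    # Avoid dividing by zero when every value is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer, then clamp into the int8 range used here.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical weights
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(q, scale, worst)
```

The round-trip error per value is bounded by about half the scale, which is the accuracy cost paid for storing each weight in 8 bits instead of 32.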

Track B — Agentic AI & ML Engineering (→ Phase 5 HPC/GenAI):

# | Topic | Layer
3 | Agentic AI & GenAI: LLM agents, RAG, tool use | L1
4 | ML Engineering & MLOps: training pipelines, model serving | L1
5 | LLM Application Development: fine-tuning, RAG architecture | L1

Phase 4: Hardware Deployment & Compilation (6–12 months per track)

Deploy AI on real silicon and learn how compilers bridge models to hardware.

Track A — Xilinx FPGA

Topic | Key Skills | Layer
Xilinx FPGA Development | Vivado, IP, timing, ILA/VIO | L6
Zynq UltraScale+ MPSoC | PS/PL, Linux on Zynq | L5/L6
Advanced FPGA Design | CDC, floorplanning, power, PR | L6
HLS | C→RTL, dataflow, pipelining | L5/L6
Runtime & Drivers | XRT, DMA, kernel drivers, Vitis AI/FINN | L3
Projects | 1080p→4K wireless video (VCU, MIPI, TDMA) | L3–L6

Track B — NVIDIA Jetson

Topic | Key Skills | Layer
Jetson Platform | Orin Nano, JetPack 6.2.2, L4T, CUDA | L3
Carrier Board | Schematic, PCB, thermal, bring-up | L5
L4T Customization | Rootfs, kernel/DT, OTA | L3/L4
FSP Firmware | FreeRTOS on SPE/AON | L4
App Development | ML/AI, ROS 2, multimedia | L1/L3
Security & OTA | Secure boot, OP-TEE, A/B OTA | L4
Manufacturing | FCC/CE, DFM, production flash | L8
Runtime & Drivers | CUDA runtime, TensorRT, DLA, DeepStream | L3

Track C — ML Compiler

Part | Key Skills | Layer
Compiler Fundamentals | Graph IR, LLVM, MLIR, TVM/tinygrad, custom backends | L2
DL Inference Optimization | Triton, CUTLASS, Flash-Attention, TensorRT-LLM, quantization | L1/L2

Phase 5: Specialization Tracks (ongoing)

Track | Focus | Guide
A: GPU Infrastructure | NVIDIA GPU (NCCL, NVLink) · AMD GPU (ROCm, HIP, MI300X) | Guide →
B: HPC | CUDA-X Libraries: 40+ GPU-accelerated libraries | Guide →
C: Edge AI | Efficient nets, quantization, Holoscan, real-time pipelines | Guide →
D: Robotics | Nav2, MoveIt, planning | Guide →
E: Autonomous Vehicles | openpilot, tinygrad, BEV, safety, Lauterbach | Guide →
F: AI Chip Design | Systolic arrays, dataflow, tinygrad↔hardware, ASIC path | Guide →
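
Systolic arrays appear in both the L5 layer and Track F, so a small functional model may help make the idea concrete. The sketch below assumes an output-stationary dataflow, which the roadmap does not specify. Each processing element PE(i, j) keeps its own accumulator and, on cycle t, consumes the operand pair with index k = t - i - j, mimicking skewed inputs arriving from the left and top edges. It models only the schedule, not the registers that move the data, and `systolic_matmul` is a hypothetical name.

```python
def systolic_matmul(A, B):
    """Compute C = A x B using an output-stationary systolic schedule.

    A is n x k, B is k x m (lists of lists). Returns (C, cycles), where
    `cycles` is the number of time steps until the last PE finishes.
    """
    n, k_dim, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]  # one stationary accumulator per PE
    total_cycles = n + m + k_dim - 2  # last MAC happens at t = n + m + k - 3
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                k = t - i - j  # operand index reaching PE (i, j) at cycle t
                if 0 <= k < k_dim:
                    C[i][j] += A[i][k] * B[k][j]  # multiply-accumulate
    return C, total_cycles


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C, cycles = systolic_matmul(A, B)
    print(C, cycles)  # [[19, 22], [43, 50]] 4
```

The skew is the key point: PE(0, 0) starts at cycle 0 while PE(n-1, m-1) starts n + m - 2 cycles later, so the array needs n + m + k - 2 cycles in total for one tile under this model.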

Layer → Job Title Quick Reference

Layer | Job titles | Primary phases
L1 | ML Inference Eng · Edge AI Eng · Agentic AI Eng · GenAI Eng · MLOps Eng | Phase 3, 4B+C
L2 | AI Compiler Eng · Graph Optimization Eng · Kernel Eng | Phase 4C
L3 | GPU Runtime Eng · Linux Kernel Eng · Embedded Linux BSP Eng | Phase 4A§5, 4B§8
L4 | Firmware Eng · Embedded SW Eng · Embedded Linux Eng · IoT Eng | Phase 2, 4B§4
L5 | AI Accelerator Architect · SoC Platform Eng | Phase 1§2, 4A, 5F
L6 | RTL Design Eng · FPGA Eng · DV Eng | Phase 1§1, 4A
L7 | Theory: OpenROAD, GDS flow | Phase 5F
L8 | Theory: chiplets, CoWoS, TinyTapeout | Phase 5F

Cross-layer roles:

- Autonomous Vehicles HW/SW Engineer: L1 through L4 (Phase 4B + Phase 5E)
- AI Hardware Engineer (Full-Stack): L1 through L6 (the signature role this roadmap targets)


Reference Projects

Project | What it teaches | Used in
tinygrad | Minimal DL framework: IR, scheduler, BEAM, backends | Phase 3, 4C, 5E
openpilot | Production ADAS: camera→ISP→modeld→planning→CAN | Phase 4B, 5E

Additional Resources


A community-driven educational roadmap for AI hardware engineering. · Star ⭐ if you find this useful