Phase 3: Artificial Intelligence — The Workloads Your Hardware Must Run¶
Before you design hardware, you must deeply understand the software it accelerates.
Layer mapping: L1 (Application & Framework) — this entire phase teaches you what AI chips compute.
Prerequisites: Phase 1 (Digital Foundations), Phase 2 (Embedded Systems).
What comes after: Phase 4 Track A (FPGA), Track B (Jetson), Track C (ML Compiler).
Why This Phase Exists¶
Every decision in the 8-layer stack is driven by workload requirements. If you skip this phase, you'll design hardware without knowing what it needs to run. Phase 3 gives you the workload intuition that informs every hardware decision downstream.
Structure: Core + Two Tracks¶
Modules 1–2 are mandatory for everyone. Then you choose Track A, Track B, or both.
```
Module 1: Neural Networks           ← mandatory (what accelerators compute)
Module 2: Deep Learning Frameworks  ← mandatory (micrograd, PyTorch, tinygrad)
          ↓                        ↓
      Track A                  Track B
      Hardware &               Agentic AI &
      Edge AI                  ML Engineering
          ↓                        ↓
      Phase 4                  Phase 4 or
      (FPGA/Jetson/            Phase 5
       Compiler)               (HPC/GenAI)
```
Core Modules (Mandatory)¶
| # | Module | What you learn | Why it matters for hardware |
|---|---|---|---|
| 1 | Neural Networks | MLPs, CNNs, training, backpropagation, loss functions | What accelerators compute — tensors, matmul, activations |
| 2 | Deep Learning Frameworks | micrograd → PyTorch → tinygrad: autograd, ops, compiler pipeline | How software generates workloads — the interface between models and hardware |
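Module 2 starts from micrograd precisely because reverse-mode autograd, the engine behind backpropagation, fits in a page of code. A minimal scalar sketch in that spirit (illustrative, not the actual micrograd source):

```python
# Minimal scalar autograd in the spirit of micrograd (illustrative sketch).
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None  # set by the op that produced this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # d(a+b)/da = 1
            other.grad += out.grad   # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# y = w*x + b  →  dy/dw = x, dy/dx = w, dy/db = 1
w, x, b = Value(2.0), Value(3.0), Value(1.0)
y = w * x + b
y.backward()
print(w.grad, x.grad, b.grad)  # 3.0 2.0 1.0
```

PyTorch and tinygrad do the same thing over tensors instead of scalars; the graph-building and reverse traversal above is the part the hardware layers never see but entirely depend on.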
Track A — Hardware & Edge AI¶
For engineers heading to Phase 4 (FPGA, Jetson, ML Compiler) and Phase 5 (Autonomous Vehicles, AI Chip Design).
This track teaches the perception and deployment workloads that drive edge inference hardware.
| # | Module | What you learn | Leads to |
|---|---|---|---|
| 3 | Computer Vision | Image processing, detection, segmentation, 3D vision, OpenCV | Phase 4A (FPGA vision), Phase 5E (AV perception) |
| 4 | Sensor Fusion | Camera/LiDAR/IMU, Kalman filtering, BEVFusion, MOT | Phase 4B (Jetson + ROS2), Phase 5E (AV) |
| 5 | Voice AI | STT (Whisper), TTS (VITS/Piper), VAD, keyword spotting, noise suppression | Phase 4A (FPGA audio DSP), Phase 4B (Jetson voice pipeline) |
| 6 | Edge AI & Model Optimization | Quantization, pruning, knowledge distillation, deployment pipeline | Phase 4 (bridge to all hardware tracks) |
Build: OpenCV detection, sensor calibration, INT8 quantization, tinygrad on-device inference, Whisper on Jetson, edge voice pipeline (VAD→STT→TTS).
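The INT8 quantization in the build list reduces to one idea: map floating-point weights onto an 8-bit grid via a scale factor. A minimal symmetric per-tensor sketch (one common scheme among several; real toolchains add per-channel scales and calibration):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x ≈ scale * q."""
    scale = np.max(np.abs(x)) / 127.0                          # largest magnitude maps to ±127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
err = np.max(np.abs(x - x_hat))
print(f"max abs error: {err:.4f} (rounding bound ~ scale/2 = {scale/2:.4f})")
```

The `scale` is exactly what L6 precision support must carry alongside the INT8 values; the rounding-error bound is why INT8 inference usually survives with little accuracy loss.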
Track B — Agentic AI & ML Engineering¶
For engineers heading to Phase 5 (HPC, GPU Infrastructure) or building AI applications that generate the inference demand your hardware serves.
This track teaches the AI application and infrastructure workloads — the demand side of the chip market.
| # | Module | What you learn | Leads to |
|---|---|---|---|
| 3 | Agentic AI & GenAI | LLM agents, RAG pipelines, tool use, multi-step reasoning, GenAI products | Phase 5A/B (GPU Infrastructure, HPC) |
| 4 | ML Engineering & MLOps | Training pipelines, experiment tracking, model serving, CI/CD for models | Phase 5A/B (HPC, distributed training) |
| 5 | LLM Application Development | Prompt engineering, fine-tuning, RAG architecture, evaluation, production deployment | L1d/L1e roles (highest job volume) |
Build: RAG pipeline with vector search, agent with tool calling, fine-tune a small LLM, deploy model behind Triton/vLLM.
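The retrieval core of the RAG build above is just nearest-neighbor search over embedding vectors. A toy sketch using a bag-of-words "embedder" so it runs without a model (a real pipeline would swap in a sentence-embedding model and a vector database):

```python
import numpy as np

docs = [
    "FPGAs accelerate convolution with systolic arrays",
    "Kalman filters fuse IMU and camera measurements",
    "KV-cache stores attention keys and values during LLM decoding",
]

# Toy deterministic "embedder": one dimension per vocabulary word.
vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d.lower().split()}))}

def embed(text):
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

index = np.stack([embed(d) for d in docs])   # (n_docs, dim) vector index

def retrieve(query, k=1):
    scores = index @ embed(query)            # cosine similarity (rows are unit-norm)
    return [docs[i] for i in np.argsort(-scores)[:k]]

context = retrieve("LLM attention keys cache")[0]
print(context)
# A real RAG pipeline would now prepend `context` to the LLM prompt.
```

Everything else in a production RAG stack (chunking, an ANN index, prompt assembly) wraps this one similarity search.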
Why Two Tracks?¶
| | Track A (Hardware & Edge AI) | Track B (Agentic AI & ML Eng) |
|---|---|---|
| Goal | Understand workloads that run on your chip | Understand workloads that create inference demand |
| Focus | Perception, sensors, optimization, deployment | LLMs, agents, training pipelines, serving |
| Hardware connection | Direct — you deploy on FPGA/Jetson/NPU | Indirect — you generate the traffic the chip serves |
| Job market | L1a, L1b, L1c roles (~4,500/month) | L1d, L1e roles (~15,000/month) |
| Remote % | 10–15% (hardware access needed) | 20–25% (cloud/API-based) |
| Phase 4 path | Track A → B → C (all hardware) | Track C (compiler) or Phase 5 directly |
Do both? If you have time, Track A → Track B gives you full L1 coverage. Most hardware-focused engineers do Track A first, then add Track B topics as needed.
How This Phase Connects to the Stack¶
| What you learn | How it informs hardware design |
|---|---|
| Matrix multiply in neural networks | L5: systolic array dimensions, dataflow strategy |
| Conv2D, attention, pooling ops | L2: what the compiler must fuse and tile |
| Quantization (INT8, FP8) | L6: precision support in PE design |
| LLM inference (KV-cache, batching) | L5: memory hierarchy, HBM bandwidth requirements |
| Model computational graphs | L2: graph IR representation, fusion opportunities |
| Training at scale (distributed) | L3: NCCL, multi-GPU runtime |
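The KV-cache row is worth making concrete, since it dominates memory sizing for LLM inference. A back-of-envelope calculation with illustrative 7B-class dimensions (32 layers, 32 KV heads, head dim 128, FP16; these numbers are assumptions, not a specific model spec):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # Two tensors (K and V), each [batch, n_kv_heads, seq_len, head_dim], per layer.
    return 2 * n_layers * batch * n_kv_heads * seq_len * head_dim * bytes_per_elem

# Illustrative 7B-class config at 4K context, batch 8, FP16 (2 bytes/element)
gib = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                     seq_len=4096, batch=8) / 2**30
print(f"KV-cache: {gib:.1f} GiB")  # 2*32*8*32*4096*128*2 bytes = 16.0 GiB
```

Sixteen GiB of cache for one modest batch, before weights, is exactly why the L5 memory hierarchy and HBM bandwidth, not raw FLOPS, bound LLM decoding throughput.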
Next¶
→ Phase 4 Track A — Xilinx FPGA · Phase 4 Track B — Jetson · Phase 4 Track C — ML Compiler