Phase 3: Artificial Intelligence — The Workloads Your Hardware Must Run¶
Before you design hardware, you must deeply understand the software it accelerates.
Layer mapping: L1 (Application & Framework) — this entire phase teaches you what AI chips compute.
Prerequisites: Phase 1 (Digital Foundations), Phase 2 (Embedded Systems).
What comes after: Phase 4 Track A (FPGA), Track B (Jetson), Track C (ML Compiler).
Why This Phase Exists¶
Every decision in the 8-layer stack is driven by workload requirements. If you skip this phase, you'll design hardware without knowing what it needs to run. Phase 3 gives you the workload intuition that informs every hardware decision downstream.
Structure: Core + Two Tracks¶
Modules 1–2 are mandatory for everyone. Then you choose Track A, Track B, or both.
```
Module 1: Neural Networks           ← mandatory (what accelerators compute)
Module 2: Deep Learning Frameworks  ← mandatory (micrograd, PyTorch, tinygrad)
          ↓                        ↓
      Track A                  Track B
      Hardware &               Agentic AI &
      Edge AI                  ML Engineering
          ↓                        ↓
      Phase 4                  Phase 4 or
      (FPGA/Jetson/            Phase 5
       Compiler)               (HPC/GenAI)
```
Core Modules (Mandatory)¶
| # | Module | What you learn | Why it matters for hardware |
|---|---|---|---|
| 1 | Neural Networks | MLPs, CNNs, training, backpropagation, loss functions | What accelerators compute — tensors, matmul, activations |
| 2 | Deep Learning Frameworks | micrograd → PyTorch → tinygrad: autograd, ops, compiler pipeline | How software generates workloads — the interface between models and hardware |
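Module 2 starts from micrograd precisely because reverse-mode autograd, the engine behind backpropagation, fits in a page of code. A minimal scalar sketch in that spirit (illustrative, not the actual micrograd source):

```python
# Minimal scalar autograd in the spirit of micrograd (illustrative sketch).
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None  # set by the op that produced this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # d(a+b)/da = 1
            other.grad += out.grad   # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# y = w*x + b  →  dy/dw = x, dy/dx = w, dy/db = 1
w, x, b = Value(2.0), Value(3.0), Value(1.0)
y = w * x + b
y.backward()
print(w.grad, x.grad, b.grad)  # 3.0 2.0 1.0
```

PyTorch and tinygrad do the same thing over tensors instead of scalars; the graph-building and reverse traversal above is the part the hardware layers never see but entirely depend on.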
Track A — Hardware & Edge AI¶
For engineers heading to Phase 4 (FPGA, Jetson, ML Compiler) and Phase 5 (Autonomous Vehicles, AI Chip Design).
This track teaches the perception and deployment workloads that drive edge inference hardware.
| # | Module | What you learn | Leads to |
|---|---|---|---|
| 3 | Computer Vision | Image processing, detection, segmentation, 3D vision, OpenCV | Phase 4A (FPGA vision), Phase 5E (AV perception) |
| 4 | Sensor Fusion | Camera/LiDAR/IMU, Kalman filtering, BEVFusion, MOT | Phase 4B (Jetson + ROS2), Phase 5E (AV) |
| 5 | Voice AI | STT (Whisper), TTS (VITS/Piper), VAD, keyword spotting, noise suppression | Phase 4A (FPGA audio DSP), Phase 4B (Jetson voice pipeline) |
| 6 | Edge AI & Model Optimization | Quantization, pruning, knowledge distillation, deployment pipeline | Phase 4 (bridge to all hardware tracks) |
Build: OpenCV detection, sensor calibration, INT8 quantization, tinygrad on-device inference, Whisper on Jetson, edge voice pipeline (VAD→STT→TTS).
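The INT8 quantization in the build list reduces to one idea: map floating-point weights onto an 8-bit grid via a scale factor. A minimal symmetric per-tensor sketch (one common scheme among several; real toolchains add per-channel scales and calibration):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x ≈ scale * q."""
    scale = np.max(np.abs(x)) / 127.0                          # largest magnitude maps to ±127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
err = np.max(np.abs(x - x_hat))
print(f"max abs error: {err:.4f} (rounding bound ~ scale/2 = {scale/2:.4f})")
```

The `scale` is exactly what L6 precision support must carry alongside the INT8 values; the rounding-error bound is why INT8 inference usually survives with little accuracy loss.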
Track B — Agentic AI & ML Engineering¶
For engineers heading to Phase 5 (HPC, GPU Infrastructure) or building AI applications that generate the inference demand your hardware serves.
This track teaches the AI application and infrastructure workloads — the demand side of the chip market.
| # | Module | What you learn | Leads to |
|---|---|---|---|
| 3 | Agentic AI & GenAI | LLM agents, RAG pipelines, tool use, multi-step reasoning, GenAI products | Phase 5A/B (GPU Infrastructure, HPC) |
| 4 | ML Engineering & MLOps | Training pipelines, experiment tracking, model serving, CI/CD for models | Phase 5A/B (HPC, distributed training) |
| 5 | LLM Application Development | Prompt engineering, fine-tuning, RAG architecture, evaluation, production deployment | L1d/L1e roles (highest job volume) |
Build: RAG pipeline with vector search, agent with tool calling, fine-tune a small LLM, deploy model behind Triton/vLLM.
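The retrieval core of the RAG build above is just nearest-neighbor search over embedding vectors. A toy sketch using a bag-of-words "embedder" so it runs without a model (a real pipeline would swap in a sentence-embedding model and a vector database):

```python
import numpy as np

docs = [
    "FPGAs accelerate convolution with systolic arrays",
    "Kalman filters fuse IMU and camera measurements",
    "KV-cache stores attention keys and values during LLM decoding",
]

# Toy deterministic "embedder": one dimension per vocabulary word.
vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d.lower().split()}))}

def embed(text):
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

index = np.stack([embed(d) for d in docs])   # (n_docs, dim) vector index

def retrieve(query, k=1):
    scores = index @ embed(query)            # cosine similarity (rows are unit-norm)
    return [docs[i] for i in np.argsort(-scores)[:k]]

context = retrieve("LLM attention keys cache")[0]
print(context)
# A real RAG pipeline would now prepend `context` to the LLM prompt.
```

Everything else in a production RAG stack (chunking, an ANN index, prompt assembly) wraps this one similarity search.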
Why Two Tracks?¶
| | Track A (Hardware & Edge AI) | Track B (Agentic AI & ML Eng) |
|---|---|---|
| Goal | Understand workloads that run on your chip | Understand workloads that create inference demand |
| Focus | Perception, sensors, optimization, deployment | LLMs, agents, training pipelines, serving |
| Hardware connection | Direct — you deploy on FPGA/Jetson/NPU | Indirect — you generate the traffic the chip serves |
| Job market | L1a, L1b, L1c roles (~4,500/month) | L1d, L1e roles (~15,000/month) |
| Remote % | 10–15% (hardware access needed) | 20–25% (cloud/API-based) |
| Phase 4 path | Track A → B → C (all hardware) | Track C (compiler) or Phase 5 directly |
Do both? If you have time, Track A → Track B gives you full L1 coverage. Most hardware-focused engineers do Track A first, then add Track B topics as needed.
How This Phase Connects to the Stack¶
| What you learn | How it informs hardware design |
|---|---|
| Matrix multiply in neural networks | L5: systolic array dimensions, dataflow strategy |
| Conv2D, attention, pooling ops | L2: what the compiler must fuse and tile |
| Quantization (INT8, FP8) | L6: precision support in PE design |
| LLM inference (KV-cache, batching) | L5: memory hierarchy, HBM bandwidth requirements |
| Model computational graphs | L2: graph IR representation, fusion opportunities |
| Training at scale (distributed) | L3: NCCL, multi-GPU runtime |
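The KV-cache row is worth making concrete, since it dominates memory sizing for LLM inference. A back-of-envelope calculation with illustrative 7B-class dimensions (32 layers, 32 KV heads, head dim 128, FP16; these numbers are assumptions, not a specific model spec):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # Two tensors (K and V), each [batch, n_kv_heads, seq_len, head_dim], per layer.
    return 2 * n_layers * batch * n_kv_heads * seq_len * head_dim * bytes_per_elem

# Illustrative 7B-class config at 4K context, batch 8, FP16 (2 bytes/element)
gib = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                     seq_len=4096, batch=8) / 2**30
print(f"KV-cache: {gib:.1f} GiB")  # 2*32*8*32*4096*128*2 bytes = 16.0 GiB
```

Sixteen GiB of cache for one modest batch, before weights, is exactly why the L5 memory hierarchy and HBM bandwidth, not raw FLOPS, bound LLM decoding throughput.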
Next¶
→ Phase 4 Track A — Xilinx FPGA · Phase 4 Track B — Jetson · Phase 4 Track C — ML Compiler