Phase 3: Artificial Intelligence — The Workloads Your Hardware Must Run¶

Before you design hardware, you must deeply understand the software it accelerates.

Layer mapping: L1 (Application & Framework) — this entire phase teaches you what AI chips compute.

Role targets: ML Inference Engineer · Edge AI Engineer · Agentic AI Engineer · ML Engineer · AI Compiler Engineer

Prerequisites: Phase 1 (Digital Foundations), Phase 2 (Embedded Systems).

What comes after: Phase 4 Track A (FPGA), Track B (Jetson), Track C (ML Compiler).

Why This Phase Exists¶

Every decision in the 8-layer stack is driven by workload requirements. If you skip this phase, you'll design hardware without knowing what it needs to run. Phase 3 gives you the workload intuition that informs every hardware decision downstream.

Structure: Core + Two Tracks¶

Modules 1–2 are mandatory for everyone. Then you choose Track A, Track B, or both.

Module 1: Neural Networks          ← mandatory (what accelerators compute)
Module 2: Deep Learning Frameworks ← mandatory (micrograd, PyTorch, tinygrad)
        ↓                    ↓
   Track A                Track B
   Hardware &             Agentic AI &
   Edge AI                ML Engineering
        ↓                    ↓
   Phase 4               Phase 4 or
   (FPGA/Jetson/          Phase 5
    Compiler)             (HPC/GenAI)

Core Modules (Mandatory)¶

#	Module	What you learn	Why it matters for hardware
1	Neural Networks	MLPs, CNNs, training, backpropagation, loss functions	What accelerators compute — tensors, matmul, activations
2	Deep Learning Frameworks	micrograd → PyTorch → tinygrad: autograd, ops, compiler pipeline	How software generates workloads — the interface between models and hardware

Track A — Hardware & Edge AI¶

For engineers heading to Phase 4 (FPGA, Jetson, ML Compiler) and Phase 5 (Autonomous Vehicles, AI Chip Design).

This track teaches the perception and deployment workloads that drive edge inference hardware.

#	Module	What you learn	Leads to
3	Computer Vision	Image processing, detection, segmentation, 3D vision, OpenCV	Phase 4A (FPGA vision), Phase 5E (AV perception)
4	Sensor Fusion	Camera/LiDAR/IMU, Kalman filtering, BEVFusion, MOT	Phase 4B (Jetson + ROS2), Phase 5E (AV)
5	Voice AI	STT (Whisper), TTS (VITS/Piper), VAD, keyword spotting, noise suppression	Phase 4A (FPGA audio DSP), Phase 4B (Jetson voice pipeline)
6	Edge AI & Model Optimization	Quantization, pruning, knowledge distillation, deployment pipeline	Phase 4 (bridge to all hardware tracks)

Build: OpenCV detection, sensor calibration, INT8 quantization, tinygrad on-device inference, Whisper on Jetson, edge voice pipeline (VAD→STT→TTS).

Track B — Agentic AI & ML Engineering¶

For engineers heading to Phase 5 (HPC, GPU Infrastructure) or building AI applications that generate the inference demand your hardware serves.

This track teaches the AI application, infrastructure, and safety workloads that sit on top of modern inference systems — the demand side of the chip market.

#	Module	What you learn	Leads to
3	Agentic AI & GenAI	LLM agents, RAG pipelines, tool use, multi-step reasoning, prompt-injection basics, coding agents, and GenAI products	Phase 5A/B (GPU Infrastructure, HPC)
4	ML Engineering & MLOps	Training pipelines, experiment tracking, model serving, observability, CI/CD for models	Phase 5A/B (HPC, distributed training)
5	LLM Application Development	Prompt engineering, Qwen3.5-4B Unsloth fine-tuning, RAG architecture, evaluation, guardrails, moderation, PII controls, production deployment	L1d/L1e roles (highest job volume)

Build: RAG pipeline with vector search, agent with tool calling, coding agent in terminal or CI, fine-tune Qwen/Qwen3.5-4B-Base with Unsloth, deploy model behind Triton/vLLM, secure prompt gateway with moderation and redaction.

Relevant Structure: Safe LLM Systems in Track B¶

For Track B, safety is not a side note. It is part of the application architecture.

Module 3B teaches the threat model: prompt injection, tool misuse, document attacks, and unsafe retrieval context.
Module 4B teaches the operating model: policy gateways, logs, tracing, rollout controls, and production monitoring.
Module 5B teaches the enforcement model: content moderation, jailbreak detection, denied topics, PII redaction, output validation, and fallback behavior.

Use a defense-in-depth pipeline:

User input
  -> Unicode normalization / sanitization
  -> blocklists or denied-topic checks
  -> moderation and jailbreak / prompt-injection classifiers
  -> model with hardened system prompt and tool constraints
  -> output moderation, validation, and PII redaction
  -> logging, review, and threshold tuning

The exact policy depends on the product. Some systems block harmful or illegal requests. Others also block application-disallowed topics such as political persuasion, regulated advice, or competitor-sensitive content. The important point is to make policy enforcement explicit and measurable.

Why Two Tracks?¶

	Track A (Hardware & Edge AI)	Track B (Agentic AI & ML Eng)
Goal	Understand workloads that run on your chip	Understand workloads that create inference demand
Focus	Perception, sensors, optimization, deployment	LLMs, agents, training pipelines, serving
Hardware connection	Direct — you deploy on FPGA/Jetson/NPU	Indirect — you generate the traffic the chip serves
Job market	L1a, L1b, L1c roles (~4,500/month)	L1d, L1e roles (~15,000/month)
Remote %	10–15% (hardware access needed)	20–25% (cloud/API-based)
Phase 4 path	Track A → B → C (all hardware)	Track C (compiler) or Phase 5 directly

Do both? If you have time, Track A → Track B gives you full L1 coverage. Most hardware-focused engineers do Track A first, then add Track B topics as needed.

How This Phase Connects to the Stack¶

What you learn	How it informs hardware design
Matrix multiply in neural networks	L5: systolic array dimensions, dataflow strategy
Conv2D, attention, pooling ops	L2: what the compiler must fuse and tile
Quantization (INT8, FP8)	L6: precision support in PE design
LLM inference (KV-cache, batching)	L5: memory hierarchy, HBM bandwidth requirements
Model computational graphs	L2: graph IR representation, fusion opportunities
Training at scale (distributed)	L3: NCCL, multi-GPU runtime
Moderation, guardrails, and prompt shielding	L3/L4: serving topology, latency budget, sidecar services, and policy gateways

What You Should Produce¶

By the end of this phase, you should have at least:

one model-implementation or workload-intuition artifact from the core modules
one deployment- or optimization-oriented artifact from Track A or Track B
one short write-up connecting workload behavior to hardware constraints such as memory, latency, throughput, or precision

Examples include a quantization comparison, a profiling note, a tinygrad trace, a RAG latency analysis, or a sensor-pipeline benchmark.

Exit Criteria¶

You are ready for Phase 4 when you can:

explain what your target workloads actually compute
identify whether a workload is dominated by compute, memory, batching, context length, or deployment constraints
connect model structure to compiler, runtime, or hardware implications
show at least one artifact that proves you did more than read about the workload

Additional Resources¶

CMU AI Courses Reference
Maxime Labonne's LLM Course — free LLM roadmap with Colab notebooks covering LLM fundamentals, fine-tuning, quantization, RAG, evaluation, and deployment. Use it as a hands-on companion for Track B.

Next¶

→ Phase 4 Track A — Xilinx FPGA · Phase 4 Track B — Jetson · Phase 4 Track C — ML Compiler