
Phase 3: Artificial Intelligence — The Workloads Your Hardware Must Run

Before you design hardware, you must deeply understand the software it accelerates.

Layer mapping: L1 (Application & Framework) — this entire phase teaches you what AI chips compute.

Role targets: ML Inference Engineer · Edge AI Engineer · Agentic AI Engineer · ML Engineer · AI Compiler Engineer

Prerequisites: Phase 1 (Digital Foundations), Phase 2 (Embedded Systems).

What comes after: Phase 4 Track A (FPGA), Track B (Jetson), Track C (ML Compiler).


Why This Phase Exists

Every decision in the 8-layer stack is driven by workload requirements. If you skip this phase, you'll design hardware without knowing what it needs to run. Phase 3 gives you the workload intuition that informs every hardware decision downstream.


Structure: Core + Two Tracks

Modules 1–2 are mandatory for everyone. Then you choose Track A, Track B, or both.

Module 1: Neural Networks          ← mandatory (what accelerators compute)
Module 2: Deep Learning Frameworks ← mandatory (micrograd, PyTorch, tinygrad)
        ↓                    ↓
   Track A                Track B
   Hardware &             Agentic AI &
   Edge AI                ML Engineering
        ↓                    ↓
   Phase 4               Phase 4 or
   (FPGA/Jetson/          Phase 5
    Compiler)             (HPC/GenAI)

Core Modules (Mandatory)

| # | Module | What you learn | Why it matters for hardware |
|---|--------|----------------|-----------------------------|
| 1 | Neural Networks | MLPs, CNNs, training, backpropagation, loss functions | What accelerators compute: tensors, matmul, activations |
| 2 | Deep Learning Frameworks | micrograd → PyTorch → tinygrad: autograd, ops, compiler pipeline | How software generates workloads: the interface between models and hardware |
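To make "autograd" concrete before moving to PyTorch or tinygrad, here is a minimal scalar autograd sketch in the spirit of micrograd. It is not micrograd's actual source; the class name `Value` and its methods are assumptions chosen for illustration. Each operation records its inputs plus a local chain-rule step, and `backward()` replays those steps in reverse topological order.

```python
import math


class Value:
    """A scalar that remembers how it was computed (hypothetical micrograd-style sketch)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # pushes self.grad into the parents' grads

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        """Reverse-mode autodiff: topological sort, then chain rule from the output back."""
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


if __name__ == "__main__":
    # One neuron: y = tanh(w * x + b)
    x, w, b = Value(1.0), Value(0.5), Value(0.0)
    y = (w * x + b).tanh()
    y.backward()
    print(y.data, w.grad)  # w.grad = (1 - tanh(0.5)**2) * x ≈ 0.786
```

Frameworks such as PyTorch apply the same idea to whole tensors instead of scalars, which is why training adds a backward pass full of matmuls on top of the forward pass an inference accelerator sees.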

Track A — Hardware & Edge AI

For engineers heading to Phase 4 (FPGA, Jetson, ML Compiler) and Phase 5 (Autonomous Vehicles, AI Chip Design).

This track teaches the perception and deployment workloads that drive edge inference hardware.

| # | Module | What you learn | Leads to |
|---|--------|----------------|----------|
| 3 | Computer Vision | Image processing, detection, segmentation, 3D vision, OpenCV | Phase 4A (FPGA vision), Phase 5E (AV perception) |
| 4 | Sensor Fusion | Camera/LiDAR/IMU, Kalman filtering, BEVFusion, multi-object tracking (MOT) | Phase 4B (Jetson + ROS2), Phase 5E (AV) |
| 5 | Voice AI | STT (Whisper), TTS (VITS/Piper), VAD, keyword spotting, noise suppression | Phase 4A (FPGA audio DSP), Phase 4B (Jetson voice pipeline) |
| 6 | Edge AI & Model Optimization | Quantization, pruning, knowledge distillation, deployment pipeline | Phase 4 (bridge to all hardware tracks) |
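Kalman filtering from Module 4 reduces, in its simplest form, to a predict/update loop on a single number. The sketch below assumes a random-walk state and known noise variances `q` (process) and `r` (measurement); the function and parameter names are hypothetical, and real camera/LiDAR/IMU fusion uses the matrix form with an explicit motion model.

```python
def kalman_1d(measurements, x0, p0, q, r):
    """Scalar Kalman filter (hypothetical helper).

    Model: x_k = x_{k-1} + w_k with Var(w) = q, and z_k = x_k + v_k with Var(v) = r.
    Returns one filtered estimate per measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the random-walk model leaves x unchanged but adds uncertainty q.
        p = p + q
        # Update: the gain k weighs the measurement by the relative uncertainty.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    noisy = [10.3, 9.6, 10.1, 10.4, 9.8, 10.0]
    print(kalman_1d(noisy, x0=0.0, p0=100.0, q=0.01, r=0.25))
```

Each step is a handful of multiply-adds, so the cost in a real fusion stack comes from the matrix sizes and update rate, not from the algorithm's structure.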

Build: OpenCV detection, sensor calibration, INT8 quantization, tinygrad on-device inference, Whisper on Jetson, edge voice pipeline (VAD→STT→TTS).
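INT8 quantization maps floating-point values onto 8-bit integers through a scale factor. The sketch below uses one common scheme, symmetric per-tensor quantization; the helper names are hypothetical, and production toolchains typically add per-channel scales, calibration data, and zero points for asymmetric ranges.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (hypothetical helper).

    Maps the largest magnitude to 127, so q = clamp(round(x / scale), -127, 127).
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid divide-by-zero on all-zero input
    # Python's round() rounds halves to even; hardware rounding modes may differ.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [qi * scale for qi in q]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print(q, scale, max_err)  # max_err is at most scale / 2
```

The worst-case rounding error is half a quantization step, so a single outlier that inflates `max_abs` coarsens the grid for every other value; that is one reason per-channel scales are common.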


Track B — Agentic AI & ML Engineering

For engineers heading to Phase 5 (HPC, GPU Infrastructure) or building AI applications that generate the inference demand your hardware serves.

This track teaches the AI application, infrastructure, and safety workloads that sit on top of modern inference systems — the demand side of the chip market.

| # | Module | What you learn | Leads to |
|---|--------|----------------|----------|
| 3 | Agentic AI & GenAI | LLM agents, RAG pipelines, tool use, multi-step reasoning, prompt-injection basics, coding agents, and GenAI products | Phase 5A/B (GPU Infrastructure, HPC) |
| 4 | ML Engineering & MLOps | Training pipelines, experiment tracking, model serving, observability, CI/CD for models | Phase 5A/B (HPC, distributed training) |
| 5 | LLM Application Development | Prompt engineering, fine-tuning Qwen3.5-4B with Unsloth, RAG architecture, evaluation, guardrails, moderation, PII controls, production deployment | L1d/L1e roles (highest job volume) |

Build: RAG pipeline with vector search, agent with tool calling, coding agent in terminal or CI, fine-tune Qwen/Qwen3.5-4B-Base with Unsloth, deploy model behind Triton/vLLM, secure prompt gateway with moderation and redaction.
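The retrieval step of a RAG pipeline is, at its core, a nearest-neighbor search over embeddings. This sketch assumes toy, hand-written 3-dimensional vectors in place of a real embedding model and brute-force cosine similarity in place of a vector index; the corpus and function names are hypothetical.

```python
import math

# Toy corpus: the 3-d vectors are hand-written stand-ins for real embeddings.
INDEX = [
    ("HBM bandwidth limits LLM decode speed.", [0.9, 0.1, 0.0]),
    ("INT8 quantization shrinks model weights.", [0.1, 0.9, 0.1]),
    ("Kalman filters fuse IMU and GPS readings.", [0.0, 0.1, 0.9]),
]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, k=2):
    """Brute-force top-k search over (text, vector) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Ground the model by placing retrieved passages ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    query_vec = [0.8, 0.2, 0.0]  # pretend embedding of the question below
    question = "Why is token generation slow?"
    print(build_prompt(question, retrieve(query_vec, INDEX, k=2)))
```

Brute-force search costs one dot product per document per query; vector databases switch to approximate nearest-neighbor indexes once the corpus grows, and the retrieved passages lengthen the prompt the model must then process.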


Safe LLM Systems in Track B

For Track B, safety is not a side note. It is part of the application architecture.

  • Module 3B teaches the threat model: prompt injection, tool misuse, document attacks, and unsafe retrieval context.
  • Module 4B teaches the operating model: policy gateways, logs, tracing, rollout controls, and production monitoring.
  • Module 5B teaches the enforcement model: content moderation, jailbreak detection, denied topics, PII redaction, output validation, and fallback behavior.

Use a defense-in-depth pipeline:

User input
  -> Unicode normalization / sanitization
  -> blocklists or denied-topic checks
  -> moderation and jailbreak / prompt-injection classifiers
  -> model with hardened system prompt and tool constraints
  -> output moderation, validation, and PII redaction
  -> logging, review, and threshold tuning

The exact policy depends on the product. Some systems block harmful or illegal requests. Others also block application-disallowed topics such as political persuasion, regulated advice, or competitor-sensitive content. The important point is to make policy enforcement explicit and measurable.
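A few of the pipeline stages can be sketched with the Python standard library alone. This is a minimal illustration, assuming a hypothetical denied-topic list and simple regex PII patterns; the moderation and prompt-injection classifier stages are omitted because they require trained models, and none of these rules is a recommended production policy.

```python
import re
import unicodedata

# Hypothetical policy: these topics and patterns are placeholders, not a recommended rule set.
DENIED_TOPICS = ("political persuasion", "competitor pricing")
# Translation table that deletes common zero-width characters used to hide keywords.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")


def normalize(text):
    """NFKC-fold look-alike characters (e.g. full-width letters), then drop zero-width ones."""
    return unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)


def screen_input(text):
    """Input side: normalization plus a denied-topic check.

    A production gateway would also call moderation and prompt-injection
    classifiers here; they are omitted from this sketch.
    """
    clean = normalize(text)
    lowered = clean.casefold()
    for topic in DENIED_TOPICS:
        if topic in lowered:
            # Returning a reason makes each block decision loggable and measurable.
            return {"allowed": False, "reason": f"denied topic: {topic}", "text": None}
    return {"allowed": True, "reason": None, "text": clean}


def screen_output(text):
    """Output side: regex-based PII redaction before the response leaves the system."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(screen_input("Draft some political\u200b persuasion ads"))
    print(screen_output("Reach me at a.b@example.com or 555-123-4567"))
```

Normalizing before matching matters: without it, a zero-width space or full-width letters would let "political persuasion" slip past a plain substring check.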


Why Two Tracks?

| | Track A (Hardware & Edge AI) | Track B (Agentic AI & ML Eng) |
|---|---|---|
| Goal | Understand workloads that run on your chip | Understand workloads that create inference demand |
| Focus | Perception, sensors, optimization, deployment | LLMs, agents, training pipelines, serving |
| Hardware connection | Direct: you deploy on FPGA/Jetson/NPU | Indirect: you generate the traffic the chip serves |
| Job market | L1a, L1b, L1c roles (~4,500/month) | L1d, L1e roles (~15,000/month) |
| Remote share | 10–15% (hardware access needed) | 20–25% (cloud/API-based) |
| Phase 4 path | Phase 4 Tracks A → B → C (FPGA, Jetson, ML Compiler) | Phase 4 Track C (ML Compiler) or Phase 5 directly |

Do both? If you have time, Track A → Track B gives you full L1 coverage. Most hardware-focused engineers do Track A first, then add Track B topics as needed.


How This Phase Connects to the Stack

| What you learn | How it informs hardware design |
|----------------|--------------------------------|
| Matrix multiply in neural networks | L5: systolic array dimensions, dataflow strategy |
| Conv2D, attention, pooling ops | L2: what the compiler must fuse and tile |
| Quantization (INT8, FP8) | L6: precision support in PE design |
| LLM inference (KV-cache, batching) | L5: memory hierarchy, HBM bandwidth requirements |
| Model computational graphs | L2: graph IR representation, fusion opportunities |
| Training at scale (distributed) | L3: NCCL, multi-GPU runtime |
| Moderation, guardrails, and prompt shielding | L3/L4: serving topology, latency budget, sidecar services, and policy gateways |
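The KV-cache row is easy to quantify. Assuming standard multi-head or grouped-query attention, every token stores one key vector and one value vector per layer per KV head, so cache size is a simple product. The helper name is hypothetical, and the configuration in the example (32 layers, 8 KV heads, head dimension 128, FP16) is illustrative rather than tied to a specific model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem):
    """Size of the attention KV cache in bytes (hypothetical helper).

    The leading factor 2 counts one key tensor plus one value tensor per layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # Illustrative config: 32 layers, 8 KV heads (grouped-query attention), head_dim 128, FP16.
    per_token = kv_cache_bytes(32, 8, 128, seq_len=1, batch_size=1, bytes_per_elem=2)
    full = kv_cache_bytes(32, 8, 128, seq_len=8192, batch_size=16, bytes_per_elem=2)
    print(f"{per_token / 1024:.0f} KiB per token")         # 128 KiB per token
    print(f"{full / 2**30:.0f} GiB for 16 x 8192 tokens")  # 16 GiB for 16 x 8192 tokens
```

Because each decode step reads the whole cache back from memory, longer contexts and larger batches translate directly into HBM capacity and bandwidth requirements, and FP8 or INT8 KV caches halve the bytes per element.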

What You Should Produce

By the end of this phase, you should have at least:

  • one model-implementation or workload-intuition artifact from the core modules
  • one deployment- or optimization-oriented artifact from Track A or Track B
  • one short write-up connecting workload behavior to hardware constraints such as memory, latency, throughput, or precision

Examples include a quantization comparison, a profiling note, a tinygrad trace, a RAG latency analysis, or a sensor-pipeline benchmark.


Exit Criteria

You are ready for Phase 4 when you can:

  • explain what your target workloads actually compute
  • identify whether a workload is dominated by compute, memory, batching, context length, or deployment constraints
  • connect model structure to compiler, runtime, or hardware implications
  • show at least one artifact that proves you did more than read about the workload
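A quick roofline check is one way to decide whether a workload is compute-bound or memory-bound. This sketch assumes each matrix crosses the memory interface exactly once and uses a hypothetical accelerator with 100 TFLOP/s peak compute and 1 TB/s memory bandwidth; both numbers and the helper names are placeholders.

```python
def matmul_intensity(m, n, k, bytes_per_elem):
    """Arithmetic intensity (FLOP/byte) of C[m,n] = A[m,k] @ B[k,n].

    Assumes A and B are read once and C is written once (ideal on-chip reuse).
    """
    flops = 2 * m * n * k  # one multiply and one add per multiply-accumulate
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved


def roofline(intensity, peak_flops, peak_bw):
    """Return ('compute-bound' or 'memory-bound', attainable FLOP/s)."""
    ridge = peak_flops / peak_bw  # intensity at which the two roofs meet
    attainable = min(peak_flops, intensity * peak_bw)
    return ("compute-bound" if intensity >= ridge else "memory-bound"), attainable


if __name__ == "__main__":
    PEAK_FLOPS = 100e12  # hypothetical 100 TFLOP/s
    PEAK_BW = 1e12       # hypothetical 1 TB/s
    shapes = {"batch-1 GEMV": (1, 4096, 4096), "large GEMM": (4096, 4096, 4096)}
    for name, (m, n, k) in shapes.items():
        ai = matmul_intensity(m, n, k, bytes_per_elem=2)  # FP16 operands
        bound, perf = roofline(ai, PEAK_FLOPS, PEAK_BW)
        print(f"{name}: {ai:.1f} FLOP/B, {bound}, {perf / 1e12:.1f} TFLOP/s attainable")
```

Batch-1 LLM decoding resembles the first case (a matrix-vector product at roughly 1 FLOP per byte), which is why it is usually limited by memory bandwidth rather than peak FLOPs; batching raises `m` and moves the workload toward the compute roof.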

Additional Resources


Next

Phase 4 Track A — Xilinx FPGA · Phase 4 Track B — Jetson · Phase 4 Track C — ML Compiler