Edge AI

Phase 3 — Artificial Intelligence (standalone topic). This guide covers where and why models run on-device; what those models compute is covered in Neural Networks.

Goal: Map the edge stack (latency, privacy, power tiers, and the train → optimize → deploy pipeline) so that Phase 4 (Xilinx or Jetson) and the specialization tracks have clear context.

Previous: Phase 1 §4 — C++ and Parallel Computing · Companion: Neural Networks (MLPs, CNNs, training, tinygrad) · Next (deployment depth): Phase 4 Track B — Jetson

1. Definition

Edge AI = running AI algorithms locally on a device (the "edge") instead of sending data to a remote cloud server.

Traditional Cloud AI:
  Device → [internet] → Cloud Server (GPU farm) → [internet] → Result
  Latency: 50–300ms   Privacy risk   Needs connectivity

Edge AI:
  Device → Local Chip (CPU/GPU/NPU/FPGA) → Result
  Latency: inference time only (often a few ms to tens of ms)   Data stays local   Works offline
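The latency difference comes from removing the network hops, not from the edge chip being faster. A minimal sketch of that budget, assuming illustrative timings (the function names and every number are hypothetical, not measurements):

```python
# Illustrative latency budget; every number below is an assumption, not a measurement.

def cloud_latency_ms(uplink_ms: float, server_infer_ms: float, downlink_ms: float) -> float:
    """End-to-end latency when inference runs on a remote server."""
    return uplink_ms + server_infer_ms + downlink_ms

def edge_latency_ms(local_infer_ms: float) -> float:
    """End-to-end latency when inference runs on the device itself."""
    return local_infer_ms

# Assumed: 40 ms each way over the network, a fast 5 ms server GPU,
# and a slower 15 ms local accelerator.
cloud = cloud_latency_ms(uplink_ms=40.0, server_infer_ms=5.0, downlink_ms=40.0)
edge = edge_latency_ms(local_infer_ms=15.0)

print(f"cloud: {cloud:.0f} ms, edge: {edge:.0f} ms")
```

With these assumed numbers the local chip is three times slower at the inference itself, yet the edge path still finishes first because it skips both network legs.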

2. Why edge AI exists

Problem with Cloud AI	Edge AI Solution
Network latency (~100ms)	No network round trip; latency is local inference time only
Bandwidth cost (video data)	Only send results, not raw data
Privacy (face/voice/medical)	Raw data can stay on the device
Reliability (no internet)	Inference works fully offline
Cloud cost at scale	Mostly up-front hardware cost (plus power and maintenance)
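The bandwidth row can be checked with back-of-envelope arithmetic. The sketch below assumes uncompressed 1080p RGB video and a hypothetical result payload of 10 detections per frame; both are illustrative choices, not figures from this guide.

```python
# Back-of-envelope bandwidth comparison; all figures are illustrative assumptions.

WIDTH, HEIGHT, CHANNELS, FPS = 1920, 1080, 3, 30   # uncompressed 1080p RGB at 30 fps
raw_bytes_per_s = WIDTH * HEIGHT * CHANNELS * FPS

# Assumed result payload: 10 detections per frame, each 6 float32 values
# (x, y, w, h, class id, confidence).
DETECTIONS, VALUES, BYTES_PER_VALUE = 10, 6, 4
result_bytes_per_s = DETECTIONS * VALUES * BYTES_PER_VALUE * FPS

print(f"raw video: {raw_bytes_per_s / 1e6:.1f} MB/s")
print(f"results:   {result_bytes_per_s / 1e3:.1f} kB/s")
print(f"reduction: {raw_bytes_per_s / result_bytes_per_s:,.0f}x")
```

A real camera stream would normally be compressed before upload, so the practical gap is smaller than this raw-pixel figure, but sending results instead of frames still saves orders of magnitude.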

3. Where edge AI runs (tiers)

Tier 1 — Microcontrollers (MCU):
  STM32, Arduino, RP2040
  RAM: a few KB to ~1MB (commonly tens to hundreds of KB)
  Power: <1W (often milliwatts)
  Use: keyword spotting, gesture detection

Tier 2 — Embedded Linux SBCs:
  Raspberry Pi, BeagleBone
  RAM: 1–8GB
  Power: 2–10W
  Use: image classification, object detection

Tier 3 — AI Accelerator SoCs:
  NVIDIA Jetson, Google Coral (Edge TPU), Apple Neural Engine
  RAM: 1–64GB
  Power: ~2–60W
  Use: real-time video inference, NLP, robotics

Tier 4 — Edge Servers:
  FPGA + GPU combinations, industrial PCs
  Power: 50–300W
  Use: factory automation, autonomous vehicles
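One practical way to read the tiers is to ask whether a model's weights fit in RAM at all. The following is a rough sketch under stated assumptions: the RAM values are illustrative upper bounds loosely based on the list above (the Tier 4 value is a pure assumption, since no RAM figure is given for it), and `weights_bytes` and `tiers_that_fit` are hypothetical helper names.

```python
# Rough check of which tiers can hold a model's weights in RAM.
# RAM values are illustrative upper bounds, not specifications;
# the Tier 4 figure is an assumption made only for this example.

TIER_RAM_BYTES = {
    "Tier 1 (MCU)": 512 * 1024,
    "Tier 2 (SBC)": 8 * 1024**3,
    "Tier 3 (accelerator SoC)": 64 * 1024**3,
    "Tier 4 (edge server, assumed)": 256 * 1024**3,
}

def weights_bytes(num_params: int, bits_per_weight: int) -> int:
    """Memory for the weights alone (activations and runtime are not counted)."""
    return num_params * bits_per_weight // 8

def tiers_that_fit(num_params: int, bits_per_weight: int, headroom: float = 0.5) -> list:
    """Tiers where the weights use at most `headroom` of the available RAM."""
    need = weights_bytes(num_params, bits_per_weight)
    return [name for name, ram in TIER_RAM_BYTES.items() if need <= ram * headroom]

# Hypothetical 100k-parameter keyword-spotting model with int8 weights (100 kB).
print(tiers_that_fit(100_000, 8))

# Hypothetical 25M-parameter vision model with float32 weights (100 MB).
print(tiers_that_fit(25_000_000, 32))
```

The `headroom` factor is a crude stand-in for everything else that needs memory (activations, the runtime, the OS); a real deployment would measure peak usage on the target device.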

4. The edge AI pipeline

1. Train model on a powerful workstation/cloud (large data, many epochs)
2. Optimize model for edge (quantization, pruning, distillation)
3. Convert model to a format the edge runtime can load (e.g. ONNX, a TensorRT engine, TFLite)
4. Deploy to edge device
5. Run inference locally in real-time

Step 1 is grounded in Neural Networks. Steps 2–5 are expanded in Phase 4 Track B (ML and AI) and, for custom silicon, Phase 4 Track A and Phase 5 — AI Chip Design.
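To make step 2 concrete, here is a minimal sketch of one optimization, symmetric per-tensor int8 quantization, in plain Python. It is an illustration of the idea rather than how any particular toolchain implements it: real tools typically add calibration data and finer-grained scales, and the function names and random "weights" here are hypothetical.

```python
import random
from array import array

def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]

random.seed(0)
# Stand-in for trained weights, stored as 4-byte floats.
weights = array("f", (random.gauss(0.0, 0.1) for _ in range(4096)))

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

size_ratio = (len(weights) * weights.itemsize) / (len(q) * q.itemsize)
print(f"size ratio: {size_ratio:.0f}x")
print(f"max abs error: {max_err:.5f} (bound: scale/2 = {scale / 2:.5f})")
```

The trade is explicit: storage drops by 4x, and each weight moves by at most half a quantization step. Whether that error is acceptable depends on the model and has to be checked against accuracy on real data.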

5. How this fits the roadmap

You want…	Start here	Then
Intuition for tensors, backprop, CNNs	Neural Networks	tinygrad hands-on, micrograd
Product and deployment context (this guide)	Skim tiers + pipeline above	Phase 4 Jetson or FPGA track
Vision preprocessing and classical CV	Computer Vision	Phase 4 perception pipelines
Multi-sensor calibration, tracking, BEV fusion	Sensor Fusion	Phase 4 Jetson + ROS2 for integration

Hub: Phase 3 — Artificial Intelligence