Phase 1: Digital Foundations
Learn how computation is represented, executed, scheduled, and accelerated before you try to deploy AI on real hardware.
Layer mapping: Primarily L5 (hardware architecture) and L6 (RTL / logic design), with an important bridge into L3 (runtime behavior) through operating systems and parallel computing.
Role targets: RTL Design Engineer · FPGA Engineer · GPU Runtime Engineer · AI Compiler Engineer · AI Accelerator Architect
Prerequisites: comfort with basic command-line tooling and a working development environment
What comes after: Phase 2 — Embedded Systems, Phase 3 — Artificial Intelligence, then one of the Phase 4 tracks: Xilinx FPGA, NVIDIA Jetson, or ML Compiler
Why This Phase Exists
Every later phase assumes you already understand the mechanics of computation:
- how logic turns into hardware behavior
- how processors execute instructions
- how operating systems manage memory and devices
- how parallel programs map work onto CPU and GPU hardware
If you skip this phase, later topics become tool usage instead of engineering.
Phase Structure
| # | Module | What you learn | Why it matters |
|---|---|---|---|
| 1 | Digital Design & HDL | Boolean logic, sequential systems, Verilog, testbenches | The language used to describe hardware |
| 2 | Computer Architecture | ISA, pipelines, caches, memory systems, throughput vs latency | The design logic behind CPUs, GPUs, and NPUs |
| 3 | Operating Systems | Processes, memory, scheduling, synchronization, drivers | The software layer that manages hardware resources |
| 4 | C++ and Parallel Computing | SIMD, OpenMP, oneTBB, CUDA, HIP, SYCL | The execution models used by modern AI systems |
Recommended order: 1 → 2 → 3 → 4
If you already know digital logic, you can move faster through Module 1. If you already know OS fundamentals, you can move faster through Module 3, but still work through Module 4 carefully; it is the most important bridge into AI hardware work.
What You Should Produce
This phase should leave you with visible low-level artifacts, not just notes.
- a small Verilog block plus a testbench
- an architecture explainer or comparison note for CPU vs GPU vs accelerator design
- a debugging write-up on a memory, scheduling, or synchronization problem
- at least one measured parallel program, ideally including a CUDA or GPU profiling artifact
Record those outputs in a simple engineering log, project README, or benchmark note so the work stays visible and reviewable.
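As one illustration of what a "measured parallel program" might look like at its smallest, here is a hedged C++ sketch: a chunked sum over `std::thread` workers plus a wall-clock timer. The names `parallel_sum` and `time_ms` are hypothetical helpers invented for this example, not part of any module's required code, and a real artifact would also record hardware, compiler flags, and repeated runs.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` by splitting it into contiguous chunks,
// one chunk per worker thread, then combining the per-thread partial sums.
double parallel_sum(const std::vector<double>& data, unsigned num_threads) {
    assert(num_threads > 0 && "need at least one worker thread");
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    // Ceiling division so every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin =
            std::min(data.size(), static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            // Each thread writes only its own slot, so no lock is needed.
            partial[t] = std::accumulate(first, last, 0.0);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// Hypothetical helper: wall-clock time of a single call to `fn`, in milliseconds.
template <typename F>
double time_ms(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping a serial `std::accumulate` and then `parallel_sum` at several thread counts in `time_ms` gives a small table of timings for the engineering log. A sum this simple often stops scaling after a few threads, which is itself a useful observation to write down and explain.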
Exit Criteria
You are ready to move on when you can:
- read basic RTL and explain what hardware it implies
- reason about cache, memory bandwidth, and pipeline bottlenecks
- explain how the OS affects device access and concurrency behavior
- profile a simple parallel workload and describe whether it is compute-bound, memory-bound, or synchronization-bound
That is the minimum base for the rest of the roadmap.
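One common way to reason about the compute-bound versus memory-bound distinction is a roofline-style estimate: compare a kernel's arithmetic intensity (FLOPs per byte moved) with the machine's ratio of peak compute to peak memory bandwidth. The sketch below is a minimal illustration under that assumption. The `Machine` struct, its fields, and the function names are hypothetical, and the peak figures must come from your own hardware's datasheet or measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical description of a machine's peak rates (fill in real values).
struct Machine {
    double peak_gflops;  // peak compute throughput, GFLOP/s
    double peak_gbps;    // peak memory bandwidth, GB/s
};

// FLOPs performed per byte moved to or from memory.
double arithmetic_intensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

// Roofline upper bound on throughput: limited either by peak compute
// or by how fast memory can feed the arithmetic units.
double attainable_gflops(const Machine& m, double flops, double bytes) {
    return std::min(m.peak_gflops,
                    arithmetic_intensity(flops, bytes) * m.peak_gbps);
}

// Below the ridge point (peak compute / peak bandwidth) the kernel
// cannot be fed fast enough, so it is classified as memory-bound.
std::string classify(const Machine& m, double flops, double bytes) {
    const double ridge = m.peak_gflops / m.peak_gbps;  // FLOP per byte
    return arithmetic_intensity(flops, bytes) < ridge ? "memory-bound"
                                                      : "compute-bound";
}
```

For example, single-precision SAXPY (`y = a*x + y`) does 2 FLOPs per element while moving 12 bytes (read `x`, read `y`, write `y`), so its intensity is about 0.17 FLOP/byte and it lands on the memory-bound side of most machines. This model says nothing about synchronization-bound behavior; that still has to be diagnosed from profiler timelines showing time spent in locks, barriers, or waiting on other threads.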
Who Should Prioritize This Phase
- Hardware-first learners: do the whole phase in order
- ML engineers moving downward: focus especially on Modules 2 and 4
- Embedded engineers: do Modules 2, 3, and 4 thoroughly even if Module 1 is familiar
Next
→ Phase 2 — Embedded Systems · Phase 3 — Artificial Intelligence