Phase 1: Digital Foundations

Learn how computation is represented, executed, scheduled, and accelerated before you try to deploy AI on real hardware.

Layer mapping: Primarily L5 (hardware architecture) and L6 (RTL / logic design), with an important bridge into L3 (runtime behavior) through operating systems and parallel computing.

Role targets: RTL Design Engineer · FPGA Engineer · GPU Runtime Engineer · AI Compiler Engineer · AI Accelerator Architect

Prerequisites: comfort with basic command-line tooling and a working development environment

What comes after: Phase 2 — Embedded Systems, Phase 3 — Artificial Intelligence, then one of the Phase 4 tracks: Xilinx FPGA, NVIDIA Jetson, or ML Compiler

Why This Phase Exists

Every later phase assumes you already understand the mechanics of computation:

how logic turns into hardware behavior
how processors execute instructions
how operating systems manage memory and devices
how parallel programs map work onto CPU and GPU hardware

If you skip this phase, later topics become tool usage instead of engineering.

Phase Structure

#	Module	What you learn	Why it matters
1	Digital Design & HDL	Boolean logic, sequential systems, Verilog, testbenches	The language used to describe hardware
2	Computer Architecture	ISA, pipelines, caches, memory systems, throughput vs latency	The design logic behind CPUs, GPUs, and NPUs
3	Operating Systems	Processes, memory, scheduling, synchronization, drivers	The software layer that manages hardware resources
4	C++ and Parallel Computing	SIMD, OpenMP, oneTBB, CUDA, HIP, SYCL	The execution models used by modern AI systems

Recommended order: 1 → 2 → 3 → 4

If you already know digital logic, you can move faster through Module 1. If you already know OS fundamentals, still do Module 4 carefully; it is the most important bridge into AI hardware work.

What You Should Produce

This phase should leave you with visible low-level artifacts, not just notes.

a small Verilog block plus a testbench
an architecture explainer or comparison note for CPU vs GPU vs accelerator design
a debugging write-up around memory, scheduling, or synchronization behavior
at least one measured parallel program, ideally with a GPU profiling artifact such as a CUDA kernel profile

Record those outputs in a simple engineering log, project README, or benchmark note so the work stays visible and reviewable.
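As an illustration of what a "measured parallel program" can look like at its smallest, here is a sketch in C++ using only the standard library. It is not taken from the course modules: the names `parallel_sum` and `time_seconds` are hypothetical, and the chunking strategy is just one simple choice. It splits an array into contiguous chunks, sums each chunk on its own thread, and provides a timer so serial and parallel runs can be compared.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sum `data` with `num_threads` workers,
// each worker handling one contiguous chunk of the input.
double parallel_sum(const std::vector<double>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    // Ceiling division so every element lands in some chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            // Each thread writes its own slot exactly once, so no lock is needed.
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& w : workers) w.join();
    // Combine the per-thread results on the calling thread.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// Hypothetical helper: run a callable once and return elapsed wall-clock seconds.
template <typename F>
double time_seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

To turn this into an artifact, time `parallel_sum(data, 1)` against `parallel_sum(data, n)` for a few values of `n` and a large input, then record the numbers along with the compiler flags used (for example `g++ -O2 -pthread`). A single timing is noisy, so repeating each measurement several times and reporting the spread is a reasonable habit.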

Exit Criteria

You are ready to move on when you can:

read basic RTL and explain what hardware it implies
reason about cache, memory bandwidth, and pipeline bottlenecks
explain how the OS affects device access and concurrency behavior
profile a simple parallel workload and describe whether it is compute-bound, memory-bound, or synchronization-bound

That is the minimum base for the rest of the roadmap.
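One common way to make the compute-bound versus memory-bound distinction concrete is the roofline model, which this page does not name but which fits the last criterion. The sketch below is a simplified version under stated assumptions: the `Machine` struct, the function names, and any peak numbers you pass in are hypothetical, and real peak values have to come from a datasheet or a measured benchmark. It does not detect synchronization-bound behavior, which usually has to be read from a profiler timeline instead.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical machine description: peak compute and peak memory bandwidth.
struct Machine {
    double peak_gflops;  // peak floating-point throughput, GFLOP/s
    double peak_gbps;    // peak memory bandwidth, GB/s
};

// Arithmetic intensity: floating-point operations per byte moved to or from memory.
double arithmetic_intensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

// Roofline bound: performance is capped by either compute or bandwidth.
double attainable_gflops(const Machine& m, double intensity) {
    return std::min(m.peak_gflops, intensity * m.peak_gbps);
}

// Below the ridge point the bandwidth ceiling applies; at or above it, the compute ceiling.
std::string classify(const Machine& m, double intensity) {
    const double ridge = m.peak_gflops / m.peak_gbps;  // FLOP per byte
    return intensity < ridge ? "memory-bound" : "compute-bound";
}
```

For example, a single-precision `y = a*x + y` loop performs 2 floating-point operations per element while moving 12 bytes (read `x`, read `y`, write `y`), so its intensity is about 0.17 FLOP/byte. On an assumed machine with 1000 GFLOP/s and 100 GB/s, the ridge point is 10 FLOP/byte, so this sketch would classify the loop as memory-bound.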

Who Should Prioritize This Phase

Hardware-first learners: do the whole phase in order
ML engineers moving downward: focus especially on Modules 2 and 4
Embedded engineers: do Modules 2, 3, and 4 thoroughly even if Module 1 is familiar

Next

→ Phase 2 — Embedded Systems · Phase 3 — Artificial Intelligence