1. GPU Infrastructure (Phase 5)¶
Timeline: 12–24 months for the fundamentals; 24–48 months for the advanced phase.
Prerequisites: Phase 4 Track B (Jetson, CUDA stack), Phase 4 Track C (ML compiler + DL inference optimization).
What this track covers¶
HPC = high-performance computing: solving problems that need massive compute and memory by using many machines (or many GPUs) working together. In AI, "HPC" usually means large-scale training and high-throughput inference on GPU clusters, not single workstations.
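The split/compute/combine shape behind "many machines working together" can be sketched in plain Python with a toy embarrassingly-parallel workload. All names here are illustrative, and processes stand in for GPUs or nodes; this is a minimal sketch of the idea, not a cluster framework.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker (a stand-in for one GPU or node) reduces its own shard.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_workers=4):
    # Split the problem into shards, compute partial results in parallel,
    # then combine them -- the same shape as data-parallel training.
    shard = len(data) // n_workers
    chunks = [data[i * shard:(i + 1) * shard] for i in range(n_workers - 1)]
    chunks.append(data[(n_workers - 1) * shard:])  # last shard takes the remainder
    with Pool(n_workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))  # → 332833500
```

On a real cluster the "combine" step is where the interconnect (NVLink, InfiniBand) and collective libraries matter, because partial results live in different machines' memory.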
This track is organized by GPU vendor:
| Sub-track | Focus | Guide |
|---|---|---|
| Nvidia GPU | CUDA, NCCL, NVLink/NVSwitch, InfiniBand, GPUDirect, Slurm/K8s, multi-GPU/multi-node clusters | Nvidia GPU → |
| AMD GPU | ROCm, HIP, RCCL, AMD Instinct (MI300X), RDNA/CDNA architecture, porting CUDA workloads | AMD GPU → |
For the full CUDA-X library ecosystem (cuBLAS, cuDNN, CUTLASS, TensorRT, NCCL, RAPIDS, and 40+ more), see Phase 5B — High Performance Computing.
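The core collective that NCCL and RCCL provide for distributed training is all-reduce: every rank ends up holding the elementwise sum of all ranks' buffers. The standard implementation is a ring, which keeps per-rank traffic roughly constant as the rank count grows. Below is a pure-Python simulation of that ring schedule (buffers in lists instead of GPU memory, sequential loops instead of concurrent sends); it is a sketch of the algorithm's data movement, not the libraries' actual API.

```python
def ring_all_reduce(buffers):
    """Simulate ring all-reduce: reduce-scatter, then all-gather.
    buffers[r] is rank r's local vector; returns every rank's final copy."""
    n = len(buffers)
    size = len(buffers[0])
    chunks = [list(b) for b in buffers]  # work on copies
    # Chunk c covers indices bounds[c]..bounds[c+1]; each rank owns one chunk.
    bounds = [c * size // n for c in range(n + 1)]

    # Phase 1: reduce-scatter. Each step, rank r sends one chunk to its ring
    # neighbor, which accumulates it. After n-1 steps, rank r holds the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n
            dst = (r + 1) % n
            for i in range(bounds[c], bounds[c + 1]):
                chunks[dst][i] += chunks[r][i]

    # Phase 2: all-gather. The reduced chunks circulate around the ring until
    # every rank has all of them.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            dst = (r + 1) % n
            for i in range(bounds[c], bounds[c + 1]):
                chunks[dst][i] = chunks[r][i]
    return chunks

ranks = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
print(ring_all_reduce(ranks))  # every rank ends with [6.0, 6.0, 6.0]
```

Each rank sends and receives only one chunk per step, so total per-rank traffic is about 2×(n−1)/n of the buffer size regardless of cluster size; that bandwidth-optimality is why ring (and tree) schedules underpin NCCL-style collectives on NVLink and InfiniBand fabrics.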
How to use this track¶
- Start with Nvidia GPU — the dominant ecosystem for AI HPC. Covers fundamentals, virtualization, interconnects, storage, distributed training, and performance modeling.
- Add AMD GPU — for portability, alternative hardware, and understanding the growing AMD AI ecosystem (MI300X, ROCm 6+).
Both sub-tracks assume you've completed Phase 4 Track C (compiler + inference optimization). The HPC content here focuses on infrastructure, multi-GPU scaling, and distributed systems — the compiler and kernel optimization skills come from Track C.