2. High Performance Computing (Phase 5)¶

Timeline: Ongoing — study each category as your projects demand it.

Prerequisites: Phase 1 §4 (C++/CUDA), Phase 4 Track B (Jetson, CUDA runtime), Phase 4 Track C (ML compiler + DL inference optimization), Phase 5A (GPU Infrastructure).

What this track covers¶

High Performance Computing goes beyond single-GPU programming and cluster setup (covered in Phase 5A GPU Infrastructure). This track covers the GPU-accelerated library ecosystem and domain-specific HPC applications that run on top of that infrastructure.

Sub-track	Focus	Guide
CUDA-X Libraries	The full NVIDIA GPU-accelerated library suite: math (cuBLAS, cuFFT), DL (cuDNN, CUTLASS, TensorRT), data (RAPIDS), vision (DALI, CV-CUDA), communication (NCCL, NVSHMEM), parallel algorithms (Thrust, CUB), and 40+ more	CUDA-X Libraries →

How to use this track¶

Complete Phase 5A — GPU Infrastructure first — covers Nvidia/AMD GPU clusters, multi-GPU networking, Slurm/K8s, distributed training.
Then CUDA-X Libraries — the comprehensive reference for every GPU-accelerated library. Organized by category (math, DL, data, vision, communication) with role-based learning paths and hands-on projects.

Relationship to other tracks¶

This track provides	Other tracks use it for
cuBLAS, cuDNN, CUTLASS	Phase 4C (compiler targets), Phase 5F (AI Chip Design — what hardware must accelerate)
NCCL, NVSHMEM	Phase 5A (GPU Infrastructure — multi-GPU communication)
TensorRT, FlashInfer	Phase 4B §8 (Jetson runtime), Phase 4C Part 2 (inference optimization)
RAPIDS (cuDF, cuML, cuGraph)	Data pipeline acceleration for any ML workflow
CV-CUDA, DALI, Video Codec SDK	Phase 5E (Autonomous Vehicles — perception pipelines)