2. High Performance Computing (Phase 5)
Timeline: Ongoing — study each category as your projects demand it.
Prerequisites: Phase 1 §4 (C++/CUDA), Phase 4 Track B (Jetson, CUDA runtime), Phase 4 Track C (ML compiler + DL inference optimization), Phase 5A (GPU Infrastructure).
What this track covers
High Performance Computing goes beyond single-GPU programming and cluster setup (covered in Phase 5A GPU Infrastructure). This track covers the GPU-accelerated library ecosystem and domain-specific HPC applications that run on top of that infrastructure.
| Sub-track | Focus | Guide |
|---|---|---|
| CUDA-X Libraries | The full NVIDIA GPU-accelerated library suite: math (cuBLAS, cuFFT), DL (cuDNN, CUTLASS, TensorRT), data (RAPIDS), vision (DALI, CV-CUDA), communication (NCCL, NVSHMEM), parallel algorithms (Thrust, CUB), and 40+ more | CUDA-X Libraries → |
How to use this track
- Complete Phase 5A (GPU Infrastructure) first. It covers NVIDIA and AMD GPU clusters, multi-GPU networking, Slurm/K8s, and distributed training.
- Then work through CUDA-X Libraries, the comprehensive reference for the NVIDIA GPU-accelerated library suite. It is organized by category (math, DL, data, vision, communication), with role-based learning paths and hands-on projects.
Relationship to other tracks
| This track provides | Other tracks use it for |
|---|---|
| cuBLAS, cuDNN, CUTLASS | Phase 4C (compiler targets), Phase 5F (AI Chip Design — what hardware must accelerate) |
| NCCL, NVSHMEM | Phase 5A (GPU Infrastructure — multi-GPU communication) |
| TensorRT, FlashInfer | Phase 4B §8 (Jetson runtime), Phase 4C Part 2 (inference optimization) |
| RAPIDS (cuDF, cuML, cuGraph) | Data pipeline acceleration for any ML workflow |
| CV-CUDA, DALI, Video Codec SDK | Phase 5E (Autonomous Vehicles — perception pipelines) |