Skip to content

2. High Performance Computing (Phase 5)

Timeline: Ongoing — study each category as your projects demand it.

Prerequisites: Phase 1 §4 (C++/CUDA), Phase 4 Track B (Jetson, CUDA runtime), Phase 4 Track C (ML compiler + DL inference optimization), Phase 5A (GPU Infrastructure).


What this track covers

High Performance Computing goes beyond single-GPU programming and cluster setup (covered in Phase 5A GPU Infrastructure). This track covers the GPU-accelerated library ecosystem and domain-specific HPC applications that run on top of that infrastructure.

Sub-track Focus Guide
CUDA-X Libraries The full NVIDIA GPU-accelerated library suite: math (cuBLAS, cuFFT), DL (cuDNN, CUTLASS, TensorRT), data (RAPIDS), vision (DALI, CV-CUDA), communication (NCCL, NVSHMEM), parallel algorithms (Thrust, CUB), and 40+ more CUDA-X Libraries →

How to use this track

  1. Complete Phase 5A — GPU Infrastructure first — covers Nvidia/AMD GPU clusters, multi-GPU networking, Slurm/K8s, distributed training.
  2. Then CUDA-X Libraries — the comprehensive reference for every GPU-accelerated library. Organized by category (math, DL, data, vision, communication) with role-based learning paths and hands-on projects.

Relationship to other tracks

This track provides Other tracks use it for
cuBLAS, cuDNN, CUTLASS Phase 4C (compiler targets), Phase 5F (AI Chip Design — what hardware must accelerate)
NCCL, NVSHMEM Phase 5A (GPU Infrastructure — multi-GPU communication)
TensorRT, FlashInfer Phase 4B §8 (Jetson runtime), Phase 4C Part 2 (inference optimization)
RAPIDS (cuDF, cuML, cuGraph) Data pipeline acceleration for any ML workflow
CV-CUDA, DALI, Video Codec SDK Phase 5E (Autonomous Vehicles — perception pipelines)