03 — Software Stack & Installation¶
1. GDS Software Architecture¶
┌────────────────────────────────────────────────────────┐
│ Application (PyTorch DataLoader, custom training loop) │
├────────────────────────────────────────────────────────┤
│ libcufile (GDS user-space library) │
│ cuFile API: cuFileRead(), cuFileWrite(), cuFileBufReg()│
├────────────────────────────────────────────────────────┤
│ cuFile Daemon (cufiledaemon) │
│ Manages file registration, policy, async coordination │
├────────────────────────────────────────────────────────┤
│ nvidia-fs (kernel module) │
│ Programs DMA engines, maps GPU BAR, handles RDMA │
├───────────────────────┬────────────────────────────────┤
│ NVIDIA Driver │ Mellanox OFED (RDMA for NVMe-oF)│
├───────────────────────┴────────────────────────────────┤
│ Linux Kernel (5.4+) │
└────────────────────────────────────────────────────────┘
2. Installation¶
Step 1: Prerequisites¶
# Verify kernel version (5.4+ required, 5.14+ recommended per WD brief)
uname -r
# Expected: 5.14.0-70.70.1.el9_0.x86_64 (RHEL 9) or similar
# Verify GPU driver is installed
nvidia-smi
# Expected: Driver 535.86.10 or later
# Verify CUDA is installed
nvcc --version
# Expected: CUDA 12.2.1 or later
# Install kernel headers (needed to build nvidia-fs module)
# RHEL/CentOS:
sudo dnf install kernel-devel kernel-headers
# Ubuntu:
sudo apt install linux-headers-$(uname -r)
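The kernel-version check above is easy to get wrong with plain string comparison ("5.4" sorts after "5.14" lexically). A minimal Python sketch of a numeric check, using the 5.4/5.14 thresholds from this guide (`kernel_ok` is an illustrative helper, not GDS tooling):

```python
def kernel_ok(release: str, minimum=(5, 4)) -> bool:
    """Parse a `uname -r` string like '5.14.0-70.70.1.el9_0.x86_64'
    and compare (major, minor) numerically against a minimum."""
    major, minor = release.split(".")[:2]
    return (int(major), int(minor)) >= minimum

print(kernel_ok("5.14.0-70.70.1.el9_0.x86_64"))  # → True
print(kernel_ok("4.18.0-425.el8.x86_64"))        # → False
```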
Step 2: Install Mellanox OFED (for NVMe-oF / RDMA)¶
# Download MLNX_OFED from NVIDIA networking portal
# Match version to reference: 5.8-3.0.7.0
wget https://linux.mellanox.com/public/repo/mlnx_ofed/5.8-3.0.7.0/rhel9.0/x86_64/MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.0-x86_64.tgz
tar xzf MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.0-x86_64.tgz
cd MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.0-x86_64
# Install with GDS-specific flags. (A comment after a line-continuation
# backslash would break the command, so the flags are explained here:)
#   --with-nvmf-host-target : NVMe-oF host support
#   --with-nfsrdma          : RDMA for NFS (optional)
#   --without-fw-update     : skip firmware update (do separately)
#   --add-kernel-support    : build kernel modules
sudo ./mlnxofedinstall \
    --with-nvmf-host-target \
    --with-nfsrdma \
    --without-fw-update \
    --add-kernel-support
sudo dracut -f # update initramfs
sudo reboot
Step 3: Install GDS Package¶
# GDS ships both as a component of the CUDA Toolkit installer and as standalone packages
# Method A: CUDA Toolkit installer (includes GDS)
# When running CUDA installer, select "GPUDirect Storage" component
# Method B: Standalone GDS install (RHEL/Rocky)
sudo dnf install cuda-cudart-12-2 # CUDA runtime
sudo dnf install libcufile-12-2 # GDS library
sudo dnf install libcufile-devel-12-2 # GDS headers
sudo dnf install nvidia-gds-12-2 # GDS metapackage (includes nvidia-fs)
# Verify nvidia-fs module is installed
sudo modprobe nvidia_fs
lsmod | grep nvidia_fs
# nvidia_fs 1234567 0
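The `lsmod | grep nvidia_fs` check can also be done in a provisioning script; a small sketch that matches on the module-name column rather than grepping anywhere in the line (`module_loaded` is an illustrative helper):

```python
def module_loaded(lsmod_output: str, name: str = "nvidia_fs") -> bool:
    """Return True if `name` appears as a loaded module,
    i.e. as the first column of a line of `lsmod` output."""
    return any(line.split()[:1] == [name] for line in lsmod_output.splitlines())

sample = "Module                  Size  Used by\nnvidia_fs            1234567  0\n"
print(module_loaded(sample))  # → True
```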
Step 4: Configure File System¶
GDS requires specific file system configuration:
# Mount NVMe with GDS-compatible options
# ext4 — no special mount options needed
# xfs — use: dax=never (avoid DAX mode conflicts)
# Check current mounts
mount | grep nvme
# Example /etc/fstab entry for NVMe with GDS
/dev/nvme0n1p1 /mnt/nvme0 ext4 defaults,noatime 0 2
# Re-mount without atime for performance
sudo mount -o remount,noatime /mnt/nvme0
# For NVMe-oF targets, connect via nvme-cli. (A comment after a
# line-continuation backslash would break the command, so:)
#   --traddr  : target address (here, the OpenFlex IP)
#   --trsvcid : NVMe-oF port
sudo nvme connect \
    --transport rdma \
    --traddr 192.168.100.10 \
    --trsvcid 4420 \
    --nqn nqn.2020-08.org.nvmexpress:subsys1
# Verify connected NVMe-oF namespaces
nvme list
# /dev/nvme4n1 WD OpenFlex ...
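The mount options from Step 4 (e.g. `noatime`) can be verified programmatically against `/proc/mounts`-style lines; a minimal sketch (`has_mount_opts` is an illustrative helper, not part of any GDS tooling):

```python
def has_mount_opts(proc_mounts_line: str, wanted: set) -> bool:
    """Check one /proc/mounts line, e.g.
    '/dev/nvme0n1p1 /mnt/nvme0 ext4 rw,noatime 0 0',
    for the presence of all mount options in `wanted`.
    Field 4 (index 3) holds the comma-separated option list."""
    fields = proc_mounts_line.split()
    opts = set(fields[3].split(","))
    return wanted <= opts

line = "/dev/nvme0n1p1 /mnt/nvme0 ext4 rw,noatime 0 0"
print(has_mount_opts(line, {"noatime"}))  # → True
```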
Step 5: Start cuFile Daemon¶
# Start daemon (required for GDS operations)
sudo systemctl enable cufile.service
sudo systemctl start cufile.service
sudo systemctl status cufile.service
# Manual start (for testing)
sudo /usr/local/cuda/bin/cufile_daemon &
# Verify daemon is running
ps aux | grep cufile
3. GDS Configuration File¶
The cuFile daemon is configured via /etc/cufile.json:
{
"logging": {
"level": "WARN"
},
"profile": {
"nvtx": false,
"cufile_stats": 0
},
"execution": {
"max_direct_io_size_kb": 16384,
"max_device_cache_size_kb": 131072,
"max_io_queue_depth": 128,
"max_batch_io_timeout_msecs": 5,
"num_threads_per_block": 32,
"num_io_threads": 4
},
"properties": {
"max_direct_io_size_kb": 16384,
"use_poll_mode": false,
"poll_mode_max_size_kb": 4,
"max_batch_io_size_kb": 65536,
    "allow_compat_mode": true, // fallback to CPU path if GDS unavailable
"gds_rdma_write_support": true
},
"fs": {
"generic": {
"posix_unaligned_writes": false,
"posix_writes": false,
"allow_sb_offload": false
}
}
}
Key parameters:
| Parameter | Recommended Value | Effect |
|---|---|---|
| max_direct_io_size_kb | 16384 (16 MB) | Max single GDS transfer size |
| max_device_cache_size_kb | 131072 (128 MB) | GPU-side cuFile cache |
| max_io_queue_depth | 128 | Concurrent I/O requests |
| allow_compat_mode | true | Silent fallback to CPU path |
| use_poll_mode | false | Interrupt vs. polling (polling: lower latency, more CPU) |
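Before deploying an edited /etc/cufile.json it can help to sanity-check it programmatically. A minimal sketch: shipped cufile.json files may carry `//` comments, which strict `json.loads` rejects, so this naive loader strips them first (note it would also mangle `//` inside string values, so it is a sketch, not a robust parser):

```python
import json
import re

def load_cufile_json(text: str) -> dict:
    """Parse cufile.json-style content after stripping // comments."""
    no_comments = re.sub(r"//[^\n]*", "", text)
    return json.loads(no_comments)

cfg = load_cufile_json("""
{
  "properties": {
    "allow_compat_mode": true,   // fallback to CPU path
    "max_direct_io_size_kb": 16384
  }
}
""")
print(cfg["properties"]["allow_compat_mode"])  # → True
```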
4. Verifying GDS is Active¶
gdscheck — The Essential Verification Tool¶
# Run GDS compatibility check
/usr/local/cuda/gds/tools/gdscheck -p
# Expected output for working GDS:
GDS release version: 2.17.3
nvidia_fs version: 2.17.3
cuFile version: 2.17.3
Platform: Linux
GDS-Supported: Yes ← must be Yes
============
ENVIRONMENT:
============
DriverVersion = 535.86.10
IsSupported = 1 ← 1 = supported
IsRDMASupported = 1 ← 1 = RDMA path available
IsDirectStorageEnabled = 1 ← 1 = GDS kernel mode active
IsCompatModeEnabled = 0 ← 0 = NOT falling back to CPU path
# CRITICAL: IsDirectStorageEnabled must be 1
# If 0: check nvidia-fs module, check file system support, check driver version
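For automated health checks, the `Key = Value` environment lines of `gdscheck -p` output can be pulled into a dict; a minimal sketch (`parse_gdscheck` is an illustrative helper operating on the output format shown above):

```python
def parse_gdscheck(output: str) -> dict:
    """Extract integer `Key = Value` pairs (e.g. IsDirectStorageEnabled)
    from gdscheck -p output; non-numeric values like DriverVersion are skipped."""
    env = {}
    for line in output.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            parts = value.split()
            if parts and parts[0].isdigit():
                env[key.strip()] = int(parts[0])
    return env

sample = """\
DriverVersion = 535.86.10
IsSupported = 1
IsRDMASupported = 1
IsDirectStorageEnabled = 1
IsCompatModeEnabled = 0
"""
env = parse_gdscheck(sample)
print(env["IsDirectStorageEnabled"])  # → 1
```

A check script would then assert `IsDirectStorageEnabled == 1` and `IsCompatModeEnabled == 0` before starting a training run.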
Test with gds_bandwidth Tool¶
# Built-in GDS bandwidth benchmark
/usr/local/cuda/gds/tools/gds_bandwidth
# Test specific paths. (A comment after a line-continuation backslash
# would break the command; --file here points at a local NVMe path.)
/usr/local/cuda/gds/tools/gds_bandwidth \
    --file=/mnt/nvme0/test_gds.bin \
    --size=1GB \
    --gpu_id=0 \
    --pattern=sequential
# Expected output:
# GDS Read: 6.8 GB/s (local NVMe, PCIe Gen4 x4)
# GDS Write: 3.2 GB/s
# Compat Read: 2.1 GB/s (CPU path comparison)
# → GDS is 3.2× faster than CPU path for reads
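The speedup figure quoted above is simply the ratio of the GDS read rate to the compat-mode read rate; as a one-liner (numbers from the sample output above):

```python
def speedup(gds_gbps: float, compat_gbps: float) -> float:
    """Ratio of GDS throughput to compat-mode (CPU bounce-buffer) throughput."""
    return gds_gbps / compat_gbps

print(f"{speedup(6.8, 2.1):.1f}x")  # → 3.2x
```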
Python Verification¶
import cupy as cp
import cufile # pip install cufile
# Verify GDS is active on GPU 0
dev = cp.cuda.Device(0)
with dev:
print(f"GDS enabled: {cufile.is_gds_available()}")
# Expected: GDS enabled: True
5. Troubleshooting Common Installation Issues¶
Issue: lsmod | grep nvidia_fs shows nothing¶
# nvidia-fs not loaded — try manual load
sudo modprobe nvidia_fs
# If modprobe fails, the module was not built for your running kernel.
# nvidia-fs is DKMS-managed when installed from the nvidia-gds package:
sudo dkms autoinstall
# Or reinstall GDS package
sudo dnf reinstall nvidia-gds-12-2
sudo modprobe nvidia_fs
Issue: gdscheck shows IsDirectStorageEnabled = 0¶
# Most common cause: file system is not GDS-compatible
# Check that your NVMe is mounted as ext4 or xfs (not tmpfs or ZFS;
# NFS is supported only over RDMA via the nfsrdma module)
mount | grep nvme
# Expected: /dev/nvme0n1 on /mnt/data type ext4 ...
# ZFS is NOT supported by GDS kernel mode
# Use ext4 or xfs on the NVMe drives used for training data
# Also check: nvidia-fs and CUDA driver version mismatch
cat /proc/driver/nvidia-fs/stats   # first line reports the nvidia-fs version
nvidia-smi | grep "Driver Version" # must match GDS expected driver version
Issue: Performance matches CPU path (GDS not actually running)¶
# allow_compat_mode=true in cufile.json causes silent fallback
# Disable compat mode to force GDS (will error if GDS unavailable):
# In /etc/cufile.json:
# "allow_compat_mode": false
# Monitor which path is being used:
export CUFILE_LOGFILE=/tmp/cufile.log
export CUFILE_LOG_LEVEL=INFO
# Re-run your workload, check log:
grep "GDS_COMPAT\|DIRECT_RDMA\|CUFILE_MODE" /tmp/cufile.log
# DIRECT_RDMA = GDS active ✓
# GDS_COMPAT = CPU fallback active ✗
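The grep above can be turned into a small classifier for CI or monitoring; a sketch that uses the same log markers this guide greps for (`gds_path_from_log` is an illustrative helper):

```python
def gds_path_from_log(log_text: str) -> str:
    """Classify a cufile log: GDS_COMPAT means the CPU fallback fired
    somewhere, DIRECT_RDMA means the GDS direct path was used."""
    if "GDS_COMPAT" in log_text:
        return "compat (CPU fallback)"
    if "DIRECT_RDMA" in log_text:
        return "GDS direct path"
    return "unknown"

print(gds_path_from_log("INFO ... DIRECT_RDMA transfer 16MB"))  # → GDS direct path
```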
Issue: OFED version mismatch with GDS¶
# Check OFED version
ofed_info | head -1
# Must match GDS requirements (5.8-3.0.7.0 for GDS 2.17.3)
# Full compatibility matrix:
# https://docs.nvidia.com/gpudirect-storage/release-notes/index.html
6. Software Version Compatibility Matrix (Reference)¶
| GDS Version | CUDA | Driver | OFED | Kernel |
|---|---|---|---|---|
| 2.17.3 | 12.2 | 535.86.10 | 5.8-3.0.7.0 | ≥ 5.14 |
| 1.9.1 | 12.0 | 525.x | 5.8-x | ≥ 5.4 |
| 1.7.x | 11.8 | 520.x | 5.6-x | ≥ 5.4 |
Always consult the GDS Release Notes before upgrading any component.
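To keep provisioning scripts honest, the matrix above can live next to them as data; a minimal sketch (version strings and kernel minimums taken directly from the table in this guide):

```python
# The compatibility matrix above, as data.
GDS_MATRIX = {
    "2.17.3": {"cuda": "12.2", "driver": "535.86.10", "ofed": "5.8-3.0.7.0", "min_kernel": (5, 14)},
    "1.9.1":  {"cuda": "12.0", "driver": "525.x",     "ofed": "5.8-x",       "min_kernel": (5, 4)},
    "1.7.x":  {"cuda": "11.8", "driver": "520.x",     "ofed": "5.6-x",       "min_kernel": (5, 4)},
}

def kernel_ok_for_gds(gds_version: str, kernel: tuple) -> bool:
    """Compare a (major, minor) kernel tuple against the matrix row."""
    return kernel >= GDS_MATRIX[gds_version]["min_kernel"]

print(kernel_ok_for_gds("2.17.3", (5, 14)))  # → True
print(kernel_ok_for_gds("2.17.3", (5, 4)))   # → False
```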