NVSHMEM Installation Guide

This NVIDIA NVSHMEM Installation Guide provides step-by-step instructions to download and install NVSHMEM 2.10.1.

Overview

NVIDIA® NVSHMEM™ is a programming interface that implements a Partitioned Global Address Space (PGAS) model across a cluster of NVIDIA GPUs. NVSHMEM provides an easy-to-use interface to allocate memory that is symmetrically distributed across the GPUs. In addition to a CPU-side interface, NVSHMEM also provides a CUDA kernel-side interface that allows NVIDIA CUDA® threads to access any location in the symmetrically-distributed memory.
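
As an illustration of this model, below is a minimal sketch that follows the simple ring-shift pattern used in NVSHMEM example codes: every PE (one process per GPU) allocates a symmetric integer, and a CUDA kernel writes the PE's ID into the copy owned by the next PE. Compile and launch details are installation-specific (typically nvcc with relocatable device code, linking against the NVSHMEM libraries, and launching with nvshmrun or another PMI-compatible launcher).

    #include <stdio.h>
    #include <cuda_runtime.h>
    #include <nvshmem.h>
    #include <nvshmemx.h>

    // Kernel-side API: each PE writes its ID into the symmetric buffer of the next PE.
    __global__ void simple_shift(int *destination) {
        int mype = nvshmem_my_pe();
        int npes = nvshmem_n_pes();
        int peer = (mype + 1) % npes;
        nvshmem_int_p(destination, mype, peer);
    }

    int main(void) {
        int msg = -1;
        cudaStream_t stream;

        nvshmem_init();                                         // one PE per process
        cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));  // one GPU per PE on this node
        cudaStreamCreate(&stream);

        // Symmetric allocation: a buffer of this size exists on every PE.
        int *destination = (int *) nvshmem_malloc(sizeof(int));

        simple_shift<<<1, 1, 0, stream>>>(destination);
        nvshmemx_barrier_all_on_stream(stream);                 // wait until every PE has written
        cudaMemcpyAsync(&msg, destination, sizeof(int), cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);

        printf("PE %d received message %d\n", nvshmem_my_pe(), msg);

        nvshmem_free(destination);
        nvshmem_finalize();
        return 0;
    }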

Hardware Requirements

NVSHMEM requires the following hardware:

  • The x86_64 or ppc64le CPU architectures.

  • An NVIDIA Data Center GPU of the NVIDIA Volta™ GPU architecture or later.

    For a complete list, refer to https://developer.nvidia.com/cuda-gpus.

  • All GPUs must be P2P-connected via NVLink/PCIe or via GPUDirect RDMA (see the peer-access check sketch after this list). The following networks are supported:

    • InfiniBand/RoCE with a Mellanox adapter (CX-4 or later)

    • Slingshot-11 (Libfabric CXI provider)

    • Amazon EFA (Libfabric EFA provider)

    Support for atomics requires an NVLink connection or a GPUDirect RDMA connection and GDRCopy. Refer to Software Requirements for more information.
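
As a quick node-local sanity check of the P2P requirement above, the following minimal sketch uses only the standard CUDA runtime call cudaDeviceCanAccessPeer to report which visible GPU pairs can address each other directly over NVLink/PCIe. It does not cover GPUDirect RDMA connectivity between nodes.

    #include <stdio.h>
    #include <cuda_runtime.h>

    // Print the direct peer-access matrix for all visible GPUs on this node.
    int main(void) {
        int ndev = 0;
        if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev < 1) {
            fprintf(stderr, "No CUDA devices found\n");
            return 1;
        }
        for (int i = 0; i < ndev; ++i) {
            for (int j = 0; j < ndev; ++j) {
                if (i == j) continue;
                int can = 0;
                cudaDeviceCanAccessPeer(&can, i, j);
                printf("GPU %d -> GPU %d: peer access %s\n", i, j, can ? "yes" : "no");
            }
        }
        return 0;
    }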

Software Requirements

NVSHMEM requires the following software:

  • 64-bit Linux.

    For a complete compatibility matrix, see the NVIDIA CUDA Installation Guide for Linux.

  • A C++ Compiler with C++11 support.

  • CUDA 10.2 or later.

  • GNU Make 3.81 or later.

  • (Optional) InfiniBand GPUDirect Async (IBGDA) transport

    • Requires Mellanox OFED >= 5.0

    • Requires nvidia.ko >= 510.40.3 loaded with PeerMappingOverride=1. This can be accomplished by modifying the options in /etc/modprobe.d/nvidia.conf as follows:

      options nvidia NVreg_RegistryDwords="PeerMappingOverride=1;"

    • Requires nvidia_peermem >= 510.40.3 OR nv_peer_mem >= 1.3

      For more information, see GPUDirect Async.

  • (Optional) Mellanox OFED.

    • This software is required to build the IBRC transport. If OFED is unavailable, NVSHMEM can be built with NVSHMEM_IBRC_SUPPORT=0 set in the environment.

  • (Optional) nv_peer_mem for GPUDirect RDMA.

    • This software is used by the IBRC and UCX transports and is required unless NVSHMEM_IBRC_SUPPORT=0 and NVSHMEM_UCX_SUPPORT=0 are both set at compile time.

      Note

      Both the IBRC and UCX transports use GDRCopy to perform atomic operations. Users of either transport who intend to perform atomic operations MUST enable GDRCopy support; a short atomics sketch follows this list. The other transports do not depend on GDRCopy, and it is not needed in those cases.

  • A PMI-1 (for example, Hydra), PMI-2 (for example, Slurm), or PMIx (for example, Open MPI) compatible launcher.

  • (Optional) GDRCopy v2.0 or later.

    • This software is required for atomics support on non-NVLink connections.

    • It is required unless NVSHMEM_IBRC_SUPPORT=0 and NVSHMEM_UCX_SUPPORT=0 are both set at compile time.

  • (Optional) UCX version 1.10.0 or later.

    • This software is required to build the UCX transport.

    Note

    UCX must be configured with --enable-mt and --with-dm.

  • (Optional) libfabric 1.15.0.0 or later.

  • (Optional) NCCL 2.0 or later.

  • (Optional) PMIx 3.1.5 or later.

  • (Optional) CMake 3.19 or later.
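
To make the atomics-related requirements above concrete, here is a minimal sketch, assuming a working NVSHMEM installation and one GPU per PE, in which every PE issues a remote nvshmem_int_atomic_add to a symmetric counter owned by PE 0. Over the IBRC or UCX transports this remote atomic is the operation that relies on GDRCopy; over an NVLink connection GDRCopy is not needed.

    #include <stdio.h>
    #include <cuda_runtime.h>
    #include <nvshmem.h>
    #include <nvshmemx.h>

    // Every PE atomically adds 1 to a symmetric counter that lives on PE 0.
    __global__ void bump_counter(int *counter) {
        nvshmem_int_atomic_add(counter, 1, 0 /* target PE */);
    }

    int main(void) {
        cudaStream_t stream;

        nvshmem_init();
        cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));
        cudaStreamCreate(&stream);

        // Symmetric counter: the same allocation exists on every PE; PE 0's copy is the target.
        int *counter = (int *) nvshmem_malloc(sizeof(int));
        cudaMemset(counter, 0, sizeof(int));
        cudaDeviceSynchronize();
        nvshmem_barrier_all();                   // every PE has zeroed its copy

        bump_counter<<<1, 1, 0, stream>>>(counter);
        nvshmemx_barrier_all_on_stream(stream);  // all remote atomics have completed
        cudaStreamSynchronize(stream);

        if (nvshmem_my_pe() == 0) {
            int value = 0;
            cudaMemcpy(&value, counter, sizeof(int), cudaMemcpyDeviceToHost);
            printf("counter on PE 0 = %d (expected %d)\n", value, nvshmem_n_pes());
        }

        nvshmem_free(counter);
        nvshmem_finalize();
        return 0;
    }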

System Requirements

The CUDA MPS Service is optional. When using multiple processes per GPU, the CUDA MPS server must be configured on the system to support the complete NVSHMEM API. To avoid deadlock situations, the total GPU utilization that is shared between the processes must be capped at 100% or lower.

Refer to Multi-Process Service for more information about how to configure the MPS server.