Multi-GPU
NVIDIA cuVS multi-GPU APIs use RAFT resources to coordinate work across GPUs. The resource object owns CUDA streams, memory resources, and communication state, so NVIDIA cuVS algorithms can be written against one interface and then run in different distributed environments.
The RAFT communicator is the part of that interface that handles rank metadata and collective communication. This lets an algorithm use the same communication pattern whether the surrounding application is launched with MPI, Dask, Ray, or another distributed runtime. The runtime is still responsible for starting workers, assigning ranks, and placing data shards; RAFT gives NVIDIA cuVS a common way to communicate once those pieces exist.
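NCCL is the primary communicator backend used by NVIDIA cuVS multi-GPU algorithms. Most users interact with NCCL through one of two paths: single-node multi-GPU resources, where one process controls all local GPUs, or an explicit NCCL communicator attached to a RAFT handle, where each process controls one rank.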
For multi-GPU vector indexes, see the Multi-GPU indexing guide.
C resources API | Python resources API
The examples below cover the high-level NVIDIA cuVS language surfaces that currently expose multi-GPU resource initialization: C, C++, and Python. Rust, Go, and Java do not currently expose matching high-level multi-GPU resource wrappers.
Use the single-node path when one process can see and control all GPUs used by the operation. This is the simplest setup for one machine with multiple GPUs. In C++ this is raft::device_resources_snmg; in C and Python it is exposed through NVIDIA cuVS multi-GPU resources wrappers.
When an application should restrict NVIDIA cuVS to a subset of visible GPUs, use the device-id-specific resource constructor for that language:
C: cuvsMultiGpuResourcesCreateWithDeviceIds()
C++: raft::device_resources_snmg(std::vector<int>{...})
Python: MultiGpuResources(device_ids=[...])
Use the multi-node path when each process controls one rank, often one GPU, and the application runtime provides launch, rank assignment, and data placement. This API is currently exposed only in C++. The application creates an ncclComm_t, attaches it to a RAFT handle, and then passes that handle to NVIDIA cuVS APIs that accept raft::resources.
The example uses MPI only to launch ranks and broadcast the NCCL unique ID. A Ray, Dask, or service-based runtime can provide the same rank metadata and NCCL communicator setup through its own worker lifecycle.
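The bootstrap order in the multi-node path (one rank creates the unique ID, the runtime distributes it, and every rank then initializes its communicator with the same ID, the world size, and its own rank) can be modelled without GPUs. The sketch below is a pure-Python stand-in, not the NCCL or RAFT API: `FakeUniqueId`, `FakeNcclComm`, `Mailbox`, `bootstrap_rank`, and `launch` are hypothetical names, threads play the role of ranks, and the mailbox plays the role of the runtime's broadcast. It is meant only to show the ordering of the steps.

```python
import threading
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeUniqueId:
    """Stand-in for an NCCL unique ID (hypothetical, not the real type)."""
    value: str


@dataclass(frozen=True)
class FakeNcclComm:
    """Stand-in for an initialized communicator (hypothetical)."""
    unique_id: FakeUniqueId
    world_size: int
    rank: int


class Mailbox:
    """Plays the role of the runtime's broadcast (MPI, Ray, Dask, ...)."""

    def __init__(self):
        self._ready = threading.Event()
        self._payload = None

    def publish(self, payload):
        self._payload = payload
        self._ready.set()

    def receive(self, timeout=5.0):
        if not self._ready.wait(timeout):
            raise TimeoutError("rank 0 never published the unique ID")
        return self._payload


def bootstrap_rank(rank, world_size, mailbox):
    # Step 1: only rank 0 creates the unique ID and hands it to the runtime.
    if rank == 0:
        mailbox.publish(FakeUniqueId(uuid.uuid4().hex))
    # Step 2: every rank, including rank 0, obtains the same ID.
    unique_id = mailbox.receive()
    # Step 3: each rank initializes its own communicator with (ID, size, rank).
    return FakeNcclComm(unique_id, world_size, rank)


def launch(world_size):
    """Plays the role of the launcher: starts one worker per rank."""
    mailbox = Mailbox()
    comms = [None] * world_size

    def worker(rank):
        comms[rank] = bootstrap_rank(rank, world_size, mailbox)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return comms


if __name__ == "__main__":
    for comm in launch(4):
        print(comm.rank, comm.world_size, comm.unique_id.value[:8])
```

In a real deployment the mailbox would be replaced by whatever the runtime offers for distributing a small byte payload, and step 3 would be the point where the communicator is attached to the RAFT handle.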
The communicator makes distributed NVIDIA cuVS code less tied to one scheduler. NVIDIA cuVS algorithms call collectives through RAFT resources instead of embedding MPI, Dask, or Ray-specific logic in the algorithm itself. This is what allows the same algorithm implementation to be reused in different deployment systems.
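To make the decoupling concrete, here is a minimal sketch of an algorithm written only against a communicator interface and then run under two different backends. Everything in it is an illustrative assumption rather than RAFT's actual API: `Communicator`, `ThreadComm`, `SoloComm`, `global_mean`, and `run_threaded` are hypothetical names, and threads stand in for GPU ranks.

```python
import threading
from typing import List, Protocol


class Communicator(Protocol):
    """Hypothetical minimal interface: rank metadata plus one collective."""

    def get_rank(self) -> int: ...
    def get_size(self) -> int: ...
    def allreduce_sum(self, value: float) -> float: ...


class SharedState:
    """State shared by all ranks of the toy in-process backend."""

    def __init__(self, size: int):
        self.barrier = threading.Barrier(size)
        self.slots = [0.0] * size


class ThreadComm:
    """Toy backend in which each rank is a thread in the same process."""

    def __init__(self, rank: int, size: int, shared: SharedState):
        self._rank, self._size, self._shared = rank, size, shared

    def get_rank(self) -> int:
        return self._rank

    def get_size(self) -> int:
        return self._size

    def allreduce_sum(self, value: float) -> float:
        self._shared.slots[self._rank] = value
        self._shared.barrier.wait()  # every rank has written its contribution
        total = sum(self._shared.slots)
        self._shared.barrier.wait()  # every rank has read before slots are reused
        return total


class SoloComm:
    """Trivial backend for a single rank: collectives are identity operations."""

    def get_rank(self) -> int:
        return 0

    def get_size(self) -> int:
        return 1

    def allreduce_sum(self, value: float) -> float:
        return value


def global_mean(comm: Communicator, shard: List[float]) -> float:
    """The 'algorithm': it sees only the communicator, never the launcher."""
    total = comm.allreduce_sum(float(sum(shard)))
    count = comm.allreduce_sum(float(len(shard)))
    return total / count


def run_threaded(shards: List[List[float]]) -> List[float]:
    """Plays the role of the runtime: assigns ranks and places data shards."""
    size = len(shards)
    shared = SharedState(size)
    results = [0.0] * size

    def worker(rank: int) -> None:
        results[rank] = global_mean(ThreadComm(rank, size, shared), shards[rank])

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_threaded([[1, 2], [3], [4, 5, 6]]))
    print(global_mean(SoloComm(), [1, 2, 3, 4, 5, 6]))
```

`global_mean` is unchanged between the two runs; only the object passed as `comm` differs. That is the property the communicator abstraction is meant to give cuVS algorithms across MPI, Dask, or Ray deployments.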
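In practice, the communicator provides the two things described above: rank metadata (which rank a worker is and how many ranks exist) and collective communication between those ranks.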
NCCL is the communicator used for GPU collectives in NVIDIA cuVS. MPI, Ray, Dask, or another framework may still be used to launch workers, distribute data, and exchange the NCCL unique ID before the RAFT handle is initialized.
Use single-node resources when all GPUs are local to one process. Use an explicit NCCL communicator when the work is already distributed across ranks, nodes, or worker processes.