Using NCCL
Using NCCL is similar to using any other library in your code:
- Install the NCCL library on your system
- Modify your application to link to that library
- Include the header file nccl.h in your application
- Create a communicator (see Creating a Communicator)
- Use NCCL collective communication primitives to perform data communication. Familiarize yourself with the NCCL API documentation to get the best performance out of the library. A minimal end-to-end sketch of these steps follows this list.
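The sketch below walks through these steps for a single process managing multiple GPUs. It assumes two visible devices and one float buffer per device; the error-checking macros are only for brevity and are not part of NCCL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

// Abort on any CUDA or NCCL error (illustrative helpers, not part of NCCL).
#define CUDACHECK(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
  printf("CUDA error %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); exit(1); } } while (0)
#define NCCLCHECK(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
  printf("NCCL error %s:%d: %s\n", __FILE__, __LINE__, ncclGetErrorString(r)); exit(1); } } while (0)

int main() {
  const int nDev = 2;            // assumption: two GPUs visible to this process
  const size_t count = 1024;     // elements per device
  int devs[2] = {0, 1};

  ncclComm_t comms[2];
  float* buf[2];
  cudaStream_t streams[2];

  // Allocate one device buffer and one stream per GPU.
  for (int i = 0; i < nDev; i++) {
    CUDACHECK(cudaSetDevice(devs[i]));
    CUDACHECK(cudaMalloc((void**)&buf[i], count * sizeof(float)));
    CUDACHECK(cudaStreamCreate(&streams[i]));
  }

  // Create one communicator per device (single-process case).
  NCCLCHECK(ncclCommInitAll(comms, nDev, devs));

  // Launch an in-place AllReduce on every device inside a group call.
  NCCLCHECK(ncclGroupStart());
  for (int i = 0; i < nDev; i++)
    NCCLCHECK(ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                            comms[i], streams[i]));
  NCCLCHECK(ncclGroupEnd());

  // Wait for completion, then release resources.
  for (int i = 0; i < nDev; i++) {
    CUDACHECK(cudaSetDevice(devs[i]));
    CUDACHECK(cudaStreamSynchronize(streams[i]));
    CUDACHECK(cudaStreamDestroy(streams[i]));
    CUDACHECK(cudaFree(buf[i]));
  }
  for (int i = 0; i < nDev; i++)
    NCCLCHECK(ncclCommDestroy(comms[i]));
  return 0;
}
```

Build the application by including nccl.h and linking against the NCCL library, for example `nvcc app.c -lnccl`; the exact include and library paths depend on how NCCL was installed on your system.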
Collective communication primitives are common patterns of data transfer among a group of CUDA devices. A communication operation involves many processors communicating together. Within the communication group, each CUDA device is identified by a zero-based index, or rank. Each rank uses a communicator object to refer to the collection of GPUs that are intended to work together. Creating a communicator is the first step required before launching any communication operation, as illustrated in the sketch below.
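To illustrate ranks and communicators, the sketch below shows one common pattern: one process per GPU, each joining the group by calling ncclCommInitRank with its zero-based rank. How the unique ID created by rank 0 reaches the other ranks is outside NCCL's scope; exchangeId() is a hypothetical placeholder for that out-of-band step (often done with MPI or sockets), and error checking is omitted for brevity.

```c
#include <cuda_runtime.h>
#include <nccl.h>

// Hypothetical helper supplied by the surrounding launcher: sends the
// ncclUniqueId from rank 0 to every other rank out-of-band.
extern void exchangeId(ncclUniqueId* id, int myRank);

ncclComm_t createCommunicator(int nRanks, int myRank, int myDevice) {
  ncclUniqueId id;
  ncclComm_t comm;

  // Rank 0 creates the unique ID that identifies this communication group.
  if (myRank == 0) ncclGetUniqueId(&id);

  // Every rank must hold the same ID before initializing its communicator.
  exchangeId(&id, myRank);

  // Each rank attaches to its own GPU and joins the group under its rank.
  cudaSetDevice(myDevice);
  ncclCommInitRank(&comm, nRanks, id, myRank);
  return comm;
}
```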
- Creating a Communicator
- Error handling and communicator abort
- Fault Tolerance
- Quality of Service
- Collective Operations
- Data Pointers
- CUDA Stream Semantics
- Group Calls
- Point-to-point communication
- Thread Safety
- In-place Operations
- Using NCCL with CUDA Graphs
- User Buffer Registration
- Device-Initiated Communication