NCCL API
The following sections describe the communication methods and operations provided by the NCCL API: communicator creation and management, collective and point-to-point communication functions, group calls, and the associated types. A minimal usage sketch follows the listing.
Communicator Creation and Management Functions
ncclGetVersion
ncclGetUniqueId
ncclCommInitRank
ncclCommInitAll
ncclCommDestroy
ncclCommAbort
ncclCommGetAsyncError
ncclCommCount
ncclCommCuDevice
ncclCommUserRank
Collective Communication Functions
ncclAllReduce
ncclBroadcast
ncclReduce
ncclAllGather
ncclReduceScatter
Group Calls
ncclGroupStart
ncclGroupEnd
Point To Point Communication Functions
ncclSend
ncclRecv
Types
ncclComm_t
ncclResult_t
ncclDataType_t
ncclRedOp_t
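Before the per-function reference, here is a minimal sketch of how several of the functions listed above fit together: a single process creates one communicator per GPU with ncclCommInitAll, issues an AllReduce on each device inside a ncclGroupStart/ncclGroupEnd group, and tears the communicators down with ncclCommDestroy. The two-GPU device list, buffer size, and omission of error checking are simplifying assumptions for illustration, not requirements of the API:

   #include <stdio.h>
   #include <cuda_runtime.h>
   #include <nccl.h>

   int main(void) {
     /* Assumption: at least two visible CUDA devices. */
     int ndev = 2;
     int devs[2] = {0, 1};
     ncclComm_t comms[2];
     float* sendbuff[2];
     float* recvbuff[2];
     cudaStream_t streams[2];
     size_t count = 1024;

     /* Allocate device buffers and one stream per GPU. */
     for (int i = 0; i < ndev; ++i) {
       cudaSetDevice(devs[i]);
       cudaMalloc((void**)&sendbuff[i], count * sizeof(float));
       cudaMalloc((void**)&recvbuff[i], count * sizeof(float));
       cudaMemset(sendbuff[i], 1, count * sizeof(float));
       cudaStreamCreate(&streams[i]);
     }

     /* Create one communicator per device in a single call
        (single-process, multi-device initialization). */
     ncclCommInitAll(comms, ndev, devs);

     /* Issue the AllReduce on every device inside a group so that
        one thread can launch on all communicators without deadlocking. */
     ncclGroupStart();
     for (int i = 0; i < ndev; ++i)
       ncclAllReduce(sendbuff[i], recvbuff[i], count, ncclFloat, ncclSum,
                     comms[i], streams[i]);
     ncclGroupEnd();

     /* Wait for completion, then release CUDA resources. */
     for (int i = 0; i < ndev; ++i) {
       cudaSetDevice(devs[i]);
       cudaStreamSynchronize(streams[i]);
       cudaFree(sendbuff[i]);
       cudaFree(recvbuff[i]);
       cudaStreamDestroy(streams[i]);
     }

     /* Destroy the communicators once no operations are in flight. */
     for (int i = 0; i < ndev; ++i)
       ncclCommDestroy(comms[i]);

     printf("AllReduce complete\n");
     return 0;
   }

In a real application, every CUDA and NCCL return code should be checked (any ncclResult_t other than ncclSuccess indicates a failure), and multi-process setups would instead distribute a ncclUniqueId from ncclGetUniqueId and call ncclCommInitRank on each rank; see the Examples section of this documentation for complete variants.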