NCCL 2.17

NCCL API

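The following sections describe the NCCL API: communicator creation and management, collective and point-to-point communication functions, group calls, types, and user-defined reduction operators.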

  • Communicator Creation and Management Functions
    • ncclGetLastError
    • ncclGetVersion
    • ncclGetUniqueId
    • ncclCommInitRank
    • ncclCommInitAll
    • ncclCommInitRankConfig
    • ncclCommFinalize
    • ncclCommDestroy
    • ncclCommAbort
    • ncclCommGetAsyncError
    • ncclCommCount
    • ncclCommCuDevice
    • ncclCommUserRank
  • Collective Communication Functions
    • ncclAllReduce
    • ncclBroadcast
    • ncclReduce
    • ncclAllGather
    • ncclReduceScatter
  • Group Calls
    • ncclGroupStart
    • ncclGroupEnd
  • Point To Point Communication Functions
    • ncclSend
    • ncclRecv
  • Types
    • ncclComm_t
    • ncclResult_t
    • ncclDataType_t
    • ncclRedOp_t
    • ncclScalarResidence_t
    • ncclConfig_t
  • User Defined Reduction Operators
    • ncclRedOpCreatePreMulSum
    • ncclRedOpDestroy

© Copyright 2020, NVIDIA Corporation
