NCCL
2.13

Index

N

  • ncclAllGather (C function)
  • ncclAllReduce (C function)
  • ncclAvg (C macro)
  • ncclBcast (C function)
  • ncclBfloat16 (C macro)
  • ncclBroadcast (C function)
  • ncclChar (C macro)
  • ncclComm_t (C type)
  • ncclCommAbort (C function)
  • ncclCommCount (C function)
  • ncclCommCuDevice (C function)
  • ncclCommDestroy (C function)
  • ncclCommGetAsyncError (C function)
  • ncclCommInitAll (C function)
  • ncclCommInitRank (C function)
  • ncclCommUserRank (C function)
  • ncclDataType_t (C type)
  • ncclDouble (C macro)
  • ncclFloat (C macro)
  • ncclFloat16 (C macro)
  • ncclFloat32 (C macro)
  • ncclFloat64 (C macro)
  • ncclGetLastError (C function)
  • ncclGetUniqueId (C function)
  • ncclGetVersion (C function)
  • ncclGroupEnd (C function)
  • ncclGroupStart (C function)
  • ncclHalf (C macro)
  • ncclInt (C macro)
  • ncclInt32 (C macro)
  • ncclInt64 (C macro)
  • ncclInt8 (C macro)
  • ncclInternalError (C macro)
  • ncclInvalidArgument (C macro)
  • ncclInvalidUsage (C macro)
  • ncclMax (C macro)
  • ncclMin (C macro)
  • ncclProd (C macro)
  • ncclRecv (C function)
  • ncclRedOp_t (C type)
  • ncclRedOpCreatePreMulSum (C function)
  • ncclRedOpDestroy (C function)
  • ncclReduce (C function)
  • ncclReduceScatter (C function)
  • ncclRemoteError (C macro)
  • ncclResult_t (C type)
  • ncclScalarDevice (C macro)
  • ncclScalarHostImmediate (C macro)
  • ncclScalarResidence_t (C type)
  • ncclSend (C function)
  • ncclSuccess (C macro)
  • ncclSum (C macro)
  • ncclSystemError (C macro)
  • ncclUint32 (C macro)
  • ncclUint64 (C macro)
  • ncclUint8 (C macro)
  • ncclUnhandledCudaError (C macro)

© Copyright 2020, NVIDIA Corporation.
