NCCL 2.26
  • Overview of NCCL
  • Setup
  • Using NCCL
    • Creating a Communicator
      • Creating a communicator with options
      • Creating a communicator using multiple ncclUniqueIds
      • Creating more communicators
      • Using multiple NCCL communicators concurrently
      • Finalizing a communicator
      • Destroying a communicator
    • Error handling and communicator abort
      • Asynchronous errors and error handling
    • Fault Tolerance
    • Quality of Service
    • Collective Operations
      • AllReduce
      • Broadcast
      • Reduce
      • AllGather
      • ReduceScatter
    • Data Pointers
    • CUDA Stream Semantics
      • Mixing Multiple Streams within the same ncclGroupStart/End() group
    • Group Calls
      • Management Of Multiple GPUs From One Thread
      • Aggregated Operations (2.2 and later)
      • Nonblocking Group Operation
    • Point-to-point communication
      • Sendrecv
      • One-to-all (scatter)
      • All-to-one (gather)
      • All-to-all
      • Neighbor exchange
    • Thread Safety
    • In-place Operations
    • Using NCCL with CUDA Graphs
    • User Buffer Registration
      • NVLink Sharp Buffer Registration
      • IB Sharp Buffer Registration
      • General Buffer Registration
      • Memory Allocator
  • NCCL API
    • Communicator Creation and Management Functions
      • ncclGetLastError
      • ncclGetErrorString
      • ncclGetVersion
      • ncclGetUniqueId
      • ncclCommInitRank
      • ncclCommInitAll
      • ncclCommInitRankConfig
      • ncclCommInitRankScalable
      • ncclCommSplit
      • ncclCommFinalize
      • ncclCommDestroy
      • ncclCommAbort
      • ncclCommGetAsyncError
      • ncclCommCount
      • ncclCommCuDevice
      • ncclCommUserRank
      • ncclCommRegister
      • ncclCommDeregister
      • ncclMemAlloc
      • ncclMemFree
    • Collective Communication Functions
      • ncclAllReduce
      • ncclBroadcast
      • ncclReduce
      • ncclAllGather
      • ncclReduceScatter
    • Group Calls
      • ncclGroupStart
      • ncclGroupEnd
      • ncclGroupSimulateEnd
    • Point To Point Communication Functions
      • ncclSend
      • ncclRecv
    • Types
      • ncclComm_t
      • ncclResult_t
      • ncclDataType_t
      • ncclRedOp_t
      • ncclScalarResidence_t
      • ncclConfig_t
      • ncclSimInfo_t
    • User Defined Reduction Operators
      • ncclRedOpCreatePreMulSum
      • ncclRedOpDestroy
  • Migrating from NCCL 1 to NCCL 2
    • Initialization
    • Communication
    • Counts
    • In-place usage for AllGather and ReduceScatter
    • AllGather arguments order
    • Datatypes
    • Error codes
  • Examples
    • Communicator Creation and Destruction Examples
      • Example 1: Single Process, Single Thread, Multiple Devices
      • Example 2: One Device per Process or Thread
      • Example 3: Multiple Devices per Thread
      • Example 4: Multiple communicators per device
    • Communication Examples
      • Example 1: One Device per Process or Thread
      • Example 2: Multiple Devices per Thread
  • NCCL and MPI
    • API
      • Using multiple devices per process
      • ReduceScatter operation
      • Send and Receive counts
      • Other collectives and point-to-point operations
      • In-place operations
    • Using NCCL within an MPI Program
      • MPI Progress
      • Inter-GPU Communication with CUDA-aware MPI
  • Environment Variables
    • System configuration
      • NCCL_SOCKET_IFNAME
        • Values accepted
      • NCCL_SOCKET_FAMILY
        • Values accepted
      • NCCL_SOCKET_RETRY_CNT
        • Values accepted
      • NCCL_SOCKET_RETRY_SLEEP_MSEC
        • Values accepted
      • NCCL_SOCKET_NTHREADS
        • Values accepted
      • NCCL_NSOCKS_PERTHREAD
        • Values accepted
      • NCCL_CROSS_NIC
        • Values accepted
      • NCCL_IB_HCA
        • Values accepted
      • NCCL_IB_TIMEOUT
        • Values accepted
      • NCCL_IB_RETRY_CNT
        • Values accepted
      • NCCL_IB_GID_INDEX
        • Values accepted
      • NCCL_IB_ADDR_FAMILY
        • Values accepted
      • NCCL_IB_ADDR_RANGE
        • Values accepted
      • NCCL_IB_ROCE_VERSION_NUM
        • Values accepted
      • NCCL_IB_SL
        • Values accepted
      • NCCL_IB_TC
        • Values accepted
      • NCCL_IB_FIFO_TC
        • Values accepted
      • NCCL_IB_RETURN_ASYNC_EVENTS
        • Values accepted
      • NCCL_OOB_NET_ENABLE
        • Values accepted
      • NCCL_OOB_NET_IFNAME
        • Values accepted
      • NCCL_UID_STAGGER_THRESHOLD
        • Values accepted
      • NCCL_UID_STAGGER_RATE
        • Values accepted
      • NCCL_NET
        • Values accepted
      • NCCL_NET_PLUGIN
        • Values accepted
      • NCCL_TUNER_PLUGIN
        • Values accepted
      • NCCL_PROFILER_PLUGIN
        • Values accepted
      • NCCL_IGNORE_CPU_AFFINITY
        • Values accepted
      • NCCL_CONF_FILE
        • Values accepted
      • NCCL_DEBUG
        • Values accepted
      • NCCL_DEBUG_FILE
        • Values accepted
      • NCCL_DEBUG_SUBSYS
        • Values accepted
      • NCCL_DEBUG_TIMESTAMP_FORMAT
        • Value accepted
      • NCCL_DEBUG_TIMESTAMP_LEVELS
        • Value accepted
      • NCCL_COLLNET_ENABLE
        • Value accepted
      • NCCL_COLLNET_NODE_THRESHOLD
        • Value accepted
      • NCCL_TOPO_FILE
        • Value accepted
      • NCCL_TOPO_DUMP_FILE
        • Value accepted
      • NCCL_SET_THREAD_NAME
        • Value accepted
    • Debugging
      • NCCL_P2P_DISABLE
        • Values accepted
      • NCCL_P2P_LEVEL
        • Values accepted
        • Integer Values (Legacy)
      • NCCL_P2P_DIRECT_DISABLE
        • Values accepted
      • NCCL_SHM_DISABLE
        • Values accepted
      • NCCL_BUFFSIZE
        • Values accepted
      • NCCL_NTHREADS
        • Values accepted
      • NCCL_MAX_NCHANNELS
        • Values accepted
      • NCCL_MIN_NCHANNELS
        • Values accepted
      • NCCL_CHECKS_DISABLE
        • Values accepted
      • NCCL_CHECK_POINTERS
        • Values accepted
      • NCCL_LAUNCH_MODE
        • Values accepted
      • NCCL_IB_DISABLE
        • Values accepted
      • NCCL_IB_AR_THRESHOLD
        • Values accepted
      • NCCL_IB_QPS_PER_CONNECTION
        • Values accepted
      • NCCL_IB_SPLIT_DATA_ON_QPS
        • Values accepted
      • NCCL_IB_CUDA_SUPPORT
        • Values accepted
      • NCCL_IB_PCI_RELAXED_ORDERING
        • Values accepted
      • NCCL_IB_ADAPTIVE_ROUTING
        • Values accepted
      • NCCL_IB_ECE_ENABLE
        • Values accepted
      • NCCL_MEM_SYNC_DOMAIN
        • Values accepted
      • NCCL_CUMEM_ENABLE
        • Values accepted
      • NCCL_CUMEM_HOST_ENABLE
        • Values accepted
      • NCCL_NET_GDR_LEVEL (formerly NCCL_IB_GDR_LEVEL)
        • Values accepted
        • Integer Values (Legacy)
      • NCCL_NET_GDR_C2C
        • Values accepted
      • NCCL_NET_GDR_READ
        • Values accepted
      • NCCL_NET_SHARED_BUFFERS
        • Value accepted
      • NCCL_NET_SHARED_COMMS
        • Value accepted
      • NCCL_SINGLE_RING_THRESHOLD
        • Values accepted
      • NCCL_LL_THRESHOLD
        • Values accepted
      • NCCL_TREE_THRESHOLD
        • Values accepted
      • NCCL_ALGO
        • Values accepted
      • NCCL_PROTO
        • Values accepted
      • NCCL_NVB_DISABLE
        • Value accepted
      • NCCL_PXN_DISABLE
        • Value accepted
      • NCCL_P2P_PXN_LEVEL
        • Value accepted
      • NCCL_RUNTIME_CONNECT
        • Value accepted
      • NCCL_GRAPH_REGISTER
        • Value accepted
      • NCCL_LOCAL_REGISTER
        • Value accepted
      • NCCL_LEGACY_CUDA_REGISTER
        • Value accepted
      • NCCL_SET_STACK_SIZE
        • Value accepted
      • NCCL_GRAPH_MIXING_SUPPORT
        • Value accepted
      • NCCL_DMABUF_ENABLE
        • Value accepted
      • NCCL_P2P_NET_CHUNKSIZE
        • Values accepted
      • NCCL_P2P_LL_THRESHOLD
        • Values accepted
      • NCCL_ALLOC_P2P_NET_LL_BUFFERS
        • Values accepted
      • NCCL_COMM_BLOCKING
        • Values accepted
      • NCCL_CGA_CLUSTER_SIZE
        • Values accepted
      • NCCL_MAX_CTAS
        • Values accepted
      • NCCL_MIN_CTAS
        • Values accepted
      • NCCL_NVLS_ENABLE
        • Values accepted
      • NCCL_IB_MERGE_NICS
        • Values accepted
      • NCCL_MNNVL_ENABLE
        • Values accepted
      • NCCL_RAS_ENABLE
        • Values accepted
      • NCCL_RAS_ADDR
        • Values accepted
      • NCCL_RAS_TIMEOUT_FACTOR
        • Values accepted
      • NCCL_LAUNCH_ORDER_IMPLICIT
        • Values accepted
      • NCCL_LAUNCH_RACE_FATAL
        • Values accepted
  • Troubleshooting
    • Errors
    • RAS
      • RAS
        • Principle of Operation
        • RAS Queries
        • Sample Output
    • GPU Direct
      • GPU-to-GPU communication
      • GPU-to-NIC communication
      • PCI Access Control Services (ACS)
    • Topology detection
    • Shared memory
      • Docker
      • Systemd
      • cuMem host allocations
    • Networking issues
      • IP Network Interfaces
      • IP Ports
      • InfiniBand
      • RDMA over Converged Ethernet (RoCE)


© Copyright 2020, NVIDIA Corporation
