NCCL 2.4
  • Overview of NCCL
  • Using NCCL
    • Creating a Communicator
      • Using multiple NCCL communicators concurrently
    • Error handling and communicator destruction
      • Normal termination
      • Asynchronous errors and error handling
    • Operations
      • AllReduce
      • Broadcast
      • Reduce
      • AllGather
      • ReduceScatter
    • Data Pointers
    • CUDA Stream Semantics
    • Group Calls
      • Management Of Multiple GPUs From One Thread
      • Aggregated Operations (2.2 and later)
    • Thread Safety
    • In-place Operations
  • NCCL API
    • Communicator Creation and Management Functions
      • ncclGetVersion
      • ncclGetUniqueId
      • ncclCommInitRank
      • ncclCommInitAll
      • ncclCommDestroy
      • ncclCommAbort
      • ncclCommGetAsyncError
      • ncclCommCount
      • ncclCommCuDevice
      • ncclCommUserRank
    • Collective Communication Functions
      • ncclAllReduce
      • ncclBroadcast
      • ncclReduce
      • ncclAllGather
      • ncclReduceScatter
    • Group Calls
      • ncclGroupStart
      • ncclGroupEnd
    • Types
      • ncclComm_t
      • ncclResult_t
      • ncclDataType_t
      • ncclRedOp_t
  • Migrating from NCCL 1 to NCCL 2
    • Initialization
    • Communication
    • Counts
    • In-place usage for AllGather and ReduceScatter
    • AllGather arguments order
    • Datatypes
    • Error codes
  • Examples
    • Communicator Creation and Destruction Examples
      • Example 1: Single Process, Single Thread, Multiple Devices
      • Example 2: One Device per Process or Thread
      • Example 3: Multiple Devices per Thread
    • Communication Examples
      • Example 1: One Device per Process or Thread
      • Example 2: Multiple Devices per Thread
  • NCCL and MPI
    • API
      • Using multiple devices per process
      • ReduceScatter operation
      • Send and Receive counts
      • In-place operations
    • Using NCCL within an MPI Program
      • MPI Progress
      • Inter-GPU Communication with CUDA-aware MPI
  • Environment Variables
    • NCCL_P2P_DISABLE
      • Values accepted
    • NCCL_P2P_LEVEL
      • Values accepted
    • NCCL_SHM_DISABLE
      • Values accepted
    • NCCL_SOCKET_IFNAME
      • Values accepted
    • NCCL_SOCKET_NTHREADS
      • Values accepted
    • NCCL_NSOCKS_PERTHREAD
      • Values accepted
    • NCCL_DEBUG
      • Values accepted
    • NCCL_BUFFSIZE
      • Values accepted
    • NCCL_NTHREADS
      • Values accepted
    • NCCL_RINGS
      • Values accepted
    • NCCL_MAX_NRINGS
      • Values accepted
    • NCCL_MIN_NRINGS
      • Values accepted
    • NCCL_CHECKS_DISABLE
      • Values accepted
    • NCCL_CHECK_POINTERS
      • Values accepted
    • NCCL_LAUNCH_MODE
      • Values accepted
    • NCCL_IB_DISABLE
      • Values accepted
    • NCCL_IB_HCA
      • Values accepted
    • NCCL_IB_TIMEOUT
      • Values accepted
    • NCCL_IB_RETRY_CNT
      • Values accepted
    • NCCL_IB_GID_INDEX
      • Values accepted
    • NCCL_IB_SL
      • Values accepted
    • NCCL_IB_TC
      • Values accepted
    • NCCL_IB_CUDA_SUPPORT
      • Values accepted
    • NCCL_NET_GDR_LEVEL (formerly NCCL_IB_GDR_LEVEL)
      • Values accepted
    • NCCL_NET_GDR_READ
      • Values accepted
    • NCCL_SINGLE_RING_THRESHOLD
      • Values accepted
    • NCCL_LL_THRESHOLD
      • Values accepted
    • NCCL_TREE_THRESHOLD
      • Values accepted
    • NCCL_IGNORE_CPU_AFFINITY
      • Values accepted
    • NCCL_DEBUG_FILE
      • Values accepted
    • NCCL_DEBUG_SUBSYS
      • Values accepted
  • Troubleshooting
    • Errors
    • Networking issues
      • IP Network Interfaces
      • InfiniBand
    • Known Issues
      • Sharing Data
      • Concurrency between NCCL and CUDA calls (NCCL up to 2.0.5 or CUDA 8)

© Copyright 2019, NVIDIA Corporation.