NCCL
2.4
  • Overview of NCCL
  • Using NCCL
    • Creating a Communicator
    • Error handling and communicator destruction
      • Normal termination
      • Asynchronous errors and error handling
    • Operations
      • AllReduce
      • Broadcast
      • Reduce
      • AllGather
      • ReduceScatter
    • Data Pointers
    • CUDA Stream Semantics
    • Group Calls
      • Management Of Multiple GPUs From One Thread
      • Aggregated Operations (2.2 and later)
    • Thread Safety
    • In-place Operations
  • NCCL API
    • Communicator Creation and Management Functions
      • ncclGetVersion
      • ncclGetUniqueId
      • ncclCommInitRank
      • ncclCommInitAll
      • ncclCommDestroy
      • ncclCommAbort
      • ncclCommGetAsyncError
      • ncclCommCount
      • ncclCommCuDevice
      • ncclCommUserRank
    • Collective Communication Functions
      • ncclAllReduce
      • ncclBroadcast
      • ncclReduce
      • ncclAllGather
      • ncclReduceScatter
    • Group Calls
      • ncclGroupStart
      • ncclGroupEnd
    • Types
      • ncclComm_t
      • ncclResult_t
      • ncclDataType_t
      • ncclRedOp_t
  • Migrating from NCCL 1 to NCCL 2
    • Initialization
    • Communication
    • Counts
    • In-place usage for AllGather and ReduceScatter
    • AllGather arguments order
    • Datatypes
    • Error codes
  • Examples
    • Communicator Creation and Destruction Examples
      • Example 1: Single Process, Single Thread, Multiple Devices
      • Example 2: One Device per Process or Thread
      • Example 3: Multiple Devices per Thread
    • Communication Examples
      • Example 1: One Device per Process or Thread
      • Example 2: Multiple Devices per Thread
  • NCCL and MPI
    • API
      • Using multiple devices per process
      • ReduceScatter operation
      • Send and Receive counts
      • In-place operations
    • Using NCCL within an MPI Program
      • MPI Progress
      • Inter-GPU Communication with CUDA-aware MPI
  • Environment Variables
    • NCCL_P2P_DISABLE
      • Values accepted
    • NCCL_P2P_LEVEL
      • Values accepted
    • NCCL_SHM_DISABLE
      • Values accepted
    • NCCL_SOCKET_IFNAME
      • Values accepted
    • NCCL_DEBUG
      • Values accepted
    • NCCL_BUFFSIZE
      • Values accepted
    • NCCL_NTHREADS
      • Values accepted
    • NCCL_RINGS
      • Values accepted
    • NCCL_MAX_NRINGS
      • Values accepted
    • NCCL_MIN_NRINGS
      • Values accepted
    • NCCL_CHECKS_DISABLE
      • Values accepted
    • NCCL_CHECK_POINTERS
      • Values accepted
    • NCCL_LAUNCH_MODE
      • Values accepted
    • NCCL_IB_DISABLE
      • Values accepted
    • NCCL_IB_HCA
      • Values accepted
    • NCCL_IB_TIMEOUT
      • Values accepted
    • NCCL_IB_RETRY_CNT
      • Values accepted
    • NCCL_IB_GID_INDEX
      • Values accepted
    • NCCL_IB_SL
      • Values accepted
    • NCCL_IB_TC
      • Values accepted
    • NCCL_IB_CUDA_SUPPORT
      • Values accepted
    • NCCL_NET_GDR_LEVEL (formerly NCCL_IB_GDR_LEVEL)
      • Values accepted
    • NCCL_NET_GDR_READ
      • Values accepted
    • NCCL_SINGLE_RING_THRESHOLD
      • Values accepted
    • NCCL_LL_THRESHOLD
      • Values accepted
    • NCCL_TREE_THRESHOLD
      • Values accepted
    • NCCL_IGNORE_CPU_AFFINITY
      • Values accepted
    • NCCL_DEBUG_FILE
      • Values accepted
    • NCCL_DEBUG_SUBSYS
      • Values accepted
  • Troubleshooting
    • Errors
    • Networking issues
      • IP Network Interfaces
      • InfiniBand
    • Known Issues
      • Sharing Data
      • Concurrency between NCCL and CUDA calls (NCCL up to 2.0.5 or CUDA 8)

Index

N

ncclAllGather (C function)
ncclAllReduce (C function)
ncclBcast (C function)
ncclBroadcast (C function)
ncclChar (C macro)
ncclComm_t (C type)
ncclCommAbort (C function)
ncclCommCount (C function)
ncclCommCuDevice (C function)
ncclCommDestroy (C function)
ncclCommGetAsyncError (C function)
ncclCommInitAll (C function)
ncclCommInitRank (C function)
ncclCommUserRank (C function)
ncclDataType_t (C type)
ncclDouble (C macro)
ncclFloat (C macro)
ncclFloat16 (C macro)
ncclFloat32 (C macro)
ncclFloat64 (C macro)
ncclGetUniqueId (C function)
ncclGetVersion (C function)
ncclGroupEnd (C function)
ncclGroupStart (C function)
ncclHalf (C macro)
ncclInt (C macro)
ncclInt32 (C macro)
ncclInt64 (C macro)
ncclInt8 (C macro)
ncclInternalError (C macro)
ncclInvalidArgument (C macro)
ncclInvalidUsage (C macro)
ncclMax (C macro)
ncclMin (C macro)
ncclProd (C macro)
ncclRedOp_t (C type)
ncclReduce (C function)
ncclReduceScatter (C function)
ncclResult_t (C type)
ncclSuccess (C macro)
ncclSum (C macro)
ncclSystemError (C macro)
ncclUint32 (C macro)
ncclUint64 (C macro)
ncclUint8 (C macro)
ncclUnhandledCudaError (C macro)

© Copyright 2019, NVIDIA Corporation.
