Logging
NCCL provides configurable logging to help diagnose issues, understand runtime behavior, and gain insight into the choices NCCL makes during execution (such as algorithm selection, topology detection, and network configuration).
Logging Environment Variables
The following environment variables control NCCL logging behavior:
| Variable | Description |
|---|---|
| NCCL_DEBUG | Sets verbosity: VERSION, WARN, INFO, TRACE |
| NCCL_DEBUG_SUBSYS | Filters log output by subsystem |
| NCCL_DEBUG_FILE | Writes logs to files (%h host, %p PID) |
| NCCL_DEBUG_TIMESTAMP_FORMAT | Sets timestamp format (strftime syntax) |
| NCCL_DEBUG_TIMESTAMP_LEVELS | Selects which levels include timestamps |
Logging Levels
NCCL supports several logging levels, from least to most verbose:
| Level | Description | Use Case |
|---|---|---|
| VERSION | Prints NCCL version at startup | Verify installation |
| WARN | Warnings and errors | Production minimum |
| INFO | Detailed operational information | Diagnose runtime issues |
| TRACE | Replayable traces, plus CALL APIs | Deep debugging / NCCL dev |
Setting the Logging Level
Set the NCCL_DEBUG environment variable:
NCCL_DEBUG=INFO ./my_app
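When the application is started from a Python launcher instead of a shell, the same variable can be placed in the child process environment. The following is a minimal sketch: `nccl_env` is a hypothetical helper (not part of NCCL), and `./my_app` stands in for your own binary.

```python
import os

# Levels listed in the table above, from least to most verbose.
NCCL_LEVELS = ("VERSION", "WARN", "INFO", "TRACE")

def nccl_env(level, base=None):
    """Return a copy of the environment with NCCL_DEBUG set (hypothetical helper)."""
    level = level.upper()
    if level not in NCCL_LEVELS:
        raise ValueError(f"unknown NCCL_DEBUG level: {level!r}")
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = level
    return env

# Usage (assumes ./my_app exists):
#   import subprocess
#   subprocess.run(["./my_app"], env=nccl_env("INFO"), check=True)
```

Validating the level up front catches typos before a long job is launched.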
Example Output
The snippets below are excerpts from NCCL logs; only the relevant lines are shown. Exact values, line numbers, and formatting can vary by NCCL version and environment.
NCCL_DEBUG=VERSION - Prints NCCL version at startup:
NCCL version 2.30.3+cuda13.0
NCCL_DEBUG=WARN - Warnings and errors are printed:
[2026-05-05 06:16:47] node-01:189884:189884 [3] plugin/net.cc:334 NCCL WARN Failed to initialize any NET plugin
NCCL_DEBUG=INFO - Detailed information about NCCL operations:
node-01:3873285:3873285 [0] NCCL INFO Initialized NET plugin IB
node-01:3873285:3873285 [0] NCCL INFO Assigned NET plugin IB to comm
node-01:3873285:3873285 [0] NCCL INFO Assigned GIN plugin GIN_XXXX to comm
node-01:3873285:3873285 [0] NCCL INFO Assigned RMA plugin RMA_XXXX to comm
node-01:3873285:3873285 [0] NCCL INFO Using network IB
node-01:3873285:3873285 [0] NCCL INFO DMA-BUF is available on GPU device 0
NCCL_DEBUG=TRACE NCCL_DEBUG_SUBSYS=CALL - Function call tracing:
node-01:986207:986207 NCCL CALL ncclGroupStart()
node-01:986207:986207 NCCL CALL ncclSend(0,0x...,8388608,7,0,2,0x...,0x...)
node-01:986207:986207 NCCL CALL ncclRecv(0,0x...,8388608,7,0,0,0x...,0x...)
node-01:986207:986207 NCCL CALL ncclGroupEnd()
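The excerpts above share a common prefix that can be split into fields for scripted analysis. The sketch below is based only on the lines shown here; the field names (`pid`, `tid`, `device`, `source`) are an interpretation of those excerpts, and other NCCL versions or custom timestamp formats may need a different pattern.

```python
import re

# Pattern inferred from the excerpts above; not an official NCCL log grammar.
LINE_RE = re.compile(
    r"^(?:\[(?P<timestamp>[^\]]*)\]\s+)?"            # optional "[timestamp] "
    r"(?P<host>[^\s:]+):(?P<pid>\d+):(?P<tid>\d+)"   # host:pid:tid (assumed meaning)
    r"(?:\s+\[(?P<device>\d+)\])?"                   # optional "[device]"
    r"(?:\s+(?P<source>\S+:\d+))?"                   # optional "file:line" (seen on WARN)
    r"\s+NCCL\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    for key in ("pid", "tid", "device"):
        if rec[key] is not None:
            rec[key] = int(rec[key])
    return rec
```

Lines without the prefix, such as the `NCCL version ...` banner, return `None` and can be handled separately.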
Filtering by Subsystem
When using NCCL_DEBUG=INFO or NCCL_DEBUG=TRACE, use NCCL_DEBUG_SUBSYS to restrict output to specific subsystems. The value is a comma-separated list of subsystem names; prefixing the list with ^ excludes the listed subsystems instead. This keeps the relevant messages from being buried under unrelated ones.
Basic Usage
# Only show network-related messages
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=NET ./my_app
# Show multiple subsystems
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,NET,GRAPH ./my_app
# Show everything except verbose proxy messages
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=^PROXY ./my_app
# Exclude multiple subsystems
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=^PROXY,ALLOC ./my_app
Available Subsystems
| Subsystem | Description |
|---|---|
| INIT | Initialization and setup |
| COLL | Collective operations (AllReduce, Broadcast, etc.) |
| P2P | Peer-to-peer send/receive operations |
| SHM | Shared memory transport |
| NET | Network transport (IB, sockets, etc.) |
| GRAPH | Topology detection and graph search |
| TUNING | Algorithm and protocol selection |
| ENV | Environment variable processing |
| ALLOC | Memory allocations |
| CALL | Function call tracing |
| PROXY | Proxy thread operations |
| NVLS | NVLink SHARP operations |
| BOOTSTRAP | Early initialization and bootstrapping |
| REG | Buffer registration |
| PROFILE | Coarse-grained initialization profiling |
| RAS | Reliability, availability, and serviceability |
| DESTROY | Communicator destroy, abort, revoke, and plugin unload/close operations |
| ALL | All subsystems |
The default subsystems (when NCCL_DEBUG_SUBSYS is not set) are INIT,BOOTSTRAP,ENV.
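When launch scripts assemble the filter programmatically, checking names against the table above helps catch typos before a job starts. A minimal sketch follows; `subsys_value` is a hypothetical helper, not an NCCL API, and how NCCL itself treats an unrecognized name is not covered here.

```python
# Subsystem names copied from the table above.
KNOWN_SUBSYSTEMS = frozenset({
    "INIT", "COLL", "P2P", "SHM", "NET", "GRAPH", "TUNING", "ENV", "ALLOC",
    "CALL", "PROXY", "NVLS", "BOOTSTRAP", "REG", "PROFILE", "RAS", "DESTROY",
    "ALL",
})

def subsys_value(names, exclude=False):
    """Build an NCCL_DEBUG_SUBSYS value from a list of names (hypothetical helper)."""
    names = [name.strip().upper() for name in names]
    if not names:
        raise ValueError("at least one subsystem is required")
    unknown = [name for name in names if name not in KNOWN_SUBSYSTEMS]
    if unknown:
        raise ValueError(f"unknown subsystem(s): {', '.join(unknown)}")
    value = ",".join(names)
    # A single leading ^ turns the whole list into an exclusion list.
    return "^" + value if exclude else value
```

For example, `subsys_value(["PROXY", "ALLOC"], exclude=True)` yields the same `^PROXY,ALLOC` string used in the exclusion example above.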
Example Output by Subsystem
The following examples show typical output for each subsystem (output is truncated for brevity):
INIT - Shows initialization and plugin assignment:
node-01:1895902:1895913 [0] NCCL INFO Initialized NET plugin IB
node-01:1895902:1895913 [0] NCCL INFO Assigned NET plugin IB to comm
node-01:1895902:1895913 [0] NCCL INFO Assigned GIN plugin GIN_XXXX to comm
node-01:1895902:1895913 [0] NCCL INFO Assigned RMA plugin RMA_XXXX to comm
node-01:1895902:1895913 [0] NCCL INFO Using network IB
node-01:1895902:1895913 [0] NCCL INFO DMA-BUF is available on GPU device 0
node-01:1895902:1895913 [0] NCCL INFO [Rank 0] ncclCommInitRankConfig comm 0x... rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId XXXX commId 0x... - Init START
node-01:1895902:1895913 [0] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 0 is X,X,X,X,X,X,X,X. (GPU affinity = X,X,X,X,X,X,X,X ; CPU affinity = X-X).
GRAPH - Shows topology detection and communication patterns:
node-01:1895811:1895822 [0] NCCL INFO Tree 0 : -1 -> 0 -> 1/-1/-1
node-01:1895811:1895822 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1
node-01:1895811:1895822 [0] NCCL INFO Ring 00 : 1 -> 0 -> 1
node-01:1895811:1895822 [0] NCCL INFO Ring 01 : 1 -> 0 -> 1
COLL - Shows collective operation details:
node-01:1896279:1896279 [0] NCCL INFO AllReduce: opCount 0 sendbuff 0x... recvbuff 0x... count 2 datatype 7 op 0 root 0 comm 0x... [nranks=2] stream 0x...
node-01:1896279:1896279 [0] NCCL INFO AllReduce: opCount 0 sendbuff 0x... recvbuff 0x... count 2 datatype 7 op 0 root 0 comm 0x... [nranks=2] stream 0x...
SHM - Shows shared memory transport operations:
node-01:1896458:1896477 [0] NCCL INFO MMAP allocated shareable host buffer /dev/shm/nccl-XXXXXX size 4096 ptr 0x...
node-01:1896458:1896480 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
node-01:1896458:1896477 [0] NCCL INFO MMAP allocated shareable host buffer /dev/shm/nccl-XXXXXX size 4096 ptr 0x...
node-01:1896458:1896480 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
RAS - Shows reliability, availability, and serviceability information:
node-01:1896503:1896518 [0] NCCL INFO RAS thread started
node-01:1896503:1896518 [0] NCCL INFO RAS handling local addRanks request (old nRasPeers 0)
node-01:1896503:1896518 [0] NCCL INFO RAS finished local processing of addRanks request (new nRasPeers 1, nRankPeers 1)
DESTROY - Shows communicator teardown and plugin unload/close operations:
node-01:2056033:2056033 [1] NCCL INFO comm 0x... rank 1 nranks 4 cudaDev 1 busId XXXX - Destroy COMPLETE
node-01:2056035:2056035 [3] NCCL INFO comm 0x... rank 3 nranks 4 cudaDev 3 busId XXXX - Destroy COMPLETE
node-01:2056032:2056032 [0] NCCL INFO comm 0x... rank 0 nranks 4 cudaDev 0 busId XXXX - Destroy COMPLETE
node-01:2056034:2056034 [2] NCCL INFO comm 0x... rank 2 nranks 4 cudaDev 2 busId XXXX - Destroy COMPLETE
node-01:2056035:2056035 [3] NCCL INFO ENV/Plugin: Closing env plugin ncclEnvDefault
node-01:2056033:2056033 [1] NCCL INFO ENV/Plugin: Closing env plugin ncclEnvDefault
node-01:2056034:2056034 [2] NCCL INFO ENV/Plugin: Closing env plugin ncclEnvDefault
node-01:2056032:2056032 [0] NCCL INFO ENV/Plugin: Closing env plugin ncclEnvDefault
PROXY - Shows proxy thread operations:
node-01:1896541:1896559 [0] NCCL INFO New proxy recv connection 0 from local rank 0, transport 1
node-01:1896541:1896559 [0] NCCL INFO proxyProgressAsync opId=0x... op.type=1 op.reqBuff=0x... op.respSize=16 done
node-01:1896541:1896562 [0] NCCL INFO ncclPollProxyResponse Received new opId=0x...
node-01:1896541:1896562 [0] NCCL INFO resp.opId=0x... matches expected opId=0x...
node-01:1896541:1896562 [0] NCCL INFO Connected to proxy localRank 0 -> connection 0x...
node-01:1896541:1896559 [0] NCCL INFO Received and initiated operation=Init res=0
ENV - Shows environment variable processing:
node-01:1896826:1896838 [0] NCCL INFO NCCL_SHM_DISABLE set by environment to 1.
REG - Shows buffer registration:
node-01:1896963:1896984 [0] NCCL INFO register comm 0x... buffer 0x... size 8
node-01:1896963:1896984 [0] NCCL INFO register comm 0x... buffer 0x... size 8
ALLOC - Shows memory allocations:
node-01:1120933:1120933 [0] NCCL INFO init.cc:2353 Cuda Host Alloc Size 4 pointer 0x...
node-01:1120933:1120933 [0] NCCL INFO MemManager: Initialized for device 0
node-01:1120933:1120933 [0] NCCL INFO misc/utils.cc:297 memory stack hunk malloc(65536)
node-01:1120933:1120933 [0] NCCL INFO Mem Realloc old size 0, new size 256 pointer 0x...
node-01:1120933:1120950 [0] NCCL INFO Mem Realloc old size 0, new size 32 pointer 0x...
node-01:1120933:1120933 [0] NCCL INFO channel.cc:43 Cuda CallocAsync Size 608 pointer 0x... memType 2
node-01:1120933:1120933 [0] NCCL INFO channel.cc:46 Cuda CallocAsync Size 24 pointer 0x... memType 2
node-01:1120933:1120933 [0] NCCL INFO channel.cc:58 Cuda CallocAsync Size 4 pointer 0x... memType 2
node-01:1120933:1120933 [0] NCCL INFO channel.cc:43 Cuda CallocAsync Size 608 pointer 0x... memType 2
node-01:1120933:1120933 [0] NCCL INFO channel.cc:46 Cuda CallocAsync Size 24 pointer 0x... memType 2
node-01:1120933:1120933 [0] NCCL INFO channel.cc:58 Cuda CallocAsync Size 4 pointer 0x... memType 2
CALL - Shows function call tracing (requires NCCL_DEBUG=TRACE):
node-01:1897248:1897248 NCCL CALL ncclGroupStart()
node-01:1897248:1897248 NCCL CALL ncclAllReduce(0x...,0x...,2,7,0,0,0x...,0x...)
node-01:1897248:1897248 NCCL CALL ncclAllReduce(0x...,0x...,2,7,0,0,0x...,0x...)
node-01:1897248:1897248 NCCL CALL ncclGroupEnd()
BOOTSTRAP - Shows early initialization and bootstrapping:
node-01:1897326:1897337 [0] NCCL INFO Bootstrap timings total 0.001447 (create 0.000081, send 0.000255, recv 0.000268, ring 0.000068, delay 0.000000)
PROFILE - Shows coarse-grained initialization profiling:
node-01:1897505:1897517 [0] NCCL INFO Bootstrap timings total 0.001204 (create 0.000080, send 0.000303, recv 0.000137, ring 0.000063, delay 0.000000)
node-01:1897505:1897517 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 2 total 0.37 (kernels 0.28, alloc 0.05, bootstrap 0.00, allgathers 0.00, topo 0.03, graphs 0.00, connections 0.01, rest 0.00)
NET - Shows network transport operations:
node-01:1897709:1897731 [0] NCCL INFO Connected to proxy localRank 0 -> connection 0x...
node-01:1897709:1897731 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] [send] via NET/Socket/0
node-01:1897709:1897727 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 2
node-01:1897709:1897731 [0] NCCL INFO Connected to proxy localRank 0 -> connection 0x...
node-01:1897709:1897731 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] [send] via NET/Socket/0
P2P - Shows peer-to-peer operations:
node-01:631398:631465 [0] NCCL INFO Allocated shareable buffer 0x... size 10485760 ipcDesc 0x...
node-01:631398:631398 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM
node-01:631398:631465 [0] NCCL INFO Allocated shareable buffer 0x... size 2097152 ipcDesc 0x...
node-01:631398:631398 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM
TUNING - Shows algorithm and protocol selection:
node-01:631871:631871 [0] NCCL INFO AllReduce: 33554432 Bytes -> Algo RING proto SIMPLE channel{Lo..Hi}={0..31}
node-01:631871:631871 [0] NCCL INFO AllReduce: 33554432 Bytes -> Algo RING proto SIMPLE channel{Lo..Hi}={0..31}
node-01:631871:631871 [0] NCCL INFO AllReduce: 33554432 Bytes -> Algo RING proto SIMPLE channel{Lo..Hi}={0..31}
NVLS - Shows NVLink SHARP operations:
node-01:1538174:1538174 [0] NCCL INFO NVLS Creating Multicast group nranks 8 size 2097152 on rank 0
node-01:1538174:1538174 [0] NCCL INFO NVLS Created Multicast group 0x... nranks 8 size 2097152 on rank 0
node-01:1538174:1538174 [0] NCCL INFO NVLS rank 0 (dev 0) alloc done, ucptr 0x... ucgran 2097152 mcptr 0x... mcgran 2097152 ucsize 2097152 mcsize 2097152 (inputsize 24576)
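The `Channel ... via ...` lines in the SHM, NET, and P2P excerpts above follow a similar shape, so a short script can summarize which transport each peer pair uses. This is a sketch under assumptions: the pattern is inferred from those excerpts only, and reading `0[0] -> 1[1]` as source and destination ranks (each followed by a bracketed device index) is an interpretation, not something the excerpts state.

```python
import re
from collections import Counter

# Inferred from the SHM, NET, and P2P excerpts; other versions may differ.
CHANNEL_RE = re.compile(
    r"Channel (?P<channel>\d+)(?:/\d+)? : "
    r"(?P<src>\d+)\[\d+\] -> (?P<dst>\d+)\[\d+\]"
    r"(?: \[\w+\])? via (?P<transport>\S+)"
)

def transports_by_pair(lines):
    """Count channels per (src, dst, transport) triple found in log lines."""
    counts = Counter()
    for line in lines:
        m = CHANNEL_RE.search(line)
        if m:
            counts[(int(m["src"]), int(m["dst"]), m["transport"])] += 1
    return counts
```

An unexpected transport for a pair (for example, a socket path between GPUs expected to use P2P) is usually the first thing worth checking in the summary.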
Logging to Files
For multi-process jobs, logging to the terminal can be overwhelming. Use NCCL_DEBUG_FILE to write each process's log to its own file, named by hostname and PID, so that issues on specific ranks are easier to isolate and analyze:
NCCL_DEBUG=INFO NCCL_DEBUG_FILE=/tmp/nccl_%h_%p.log ./my_app
Format specifiers:
%h - Replaced with the hostname
%p - Replaced with the process ID (PID)
This creates files like /tmp/nccl_node-01_12345.log for each process.
Note: Ensure the filename pattern is unique across all processes to avoid file corruption.
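To preview which file a given process would write, or to check that a pattern contains both specifiers, the substitution can be mimicked outside NCCL. This is a minimal sketch with hypothetical helper names; NCCL performs the real expansion itself, and the hostname it substitutes may differ from what `socket.gethostname()` returns.

```python
import os
import socket

def expand_debug_file(pattern, host=None, pid=None):
    """Mimic the %h / %p substitution described above (illustrative only)."""
    host = socket.gethostname() if host is None else host
    pid = os.getpid() if pid is None else pid
    return pattern.replace("%h", host).replace("%p", str(pid))

def is_per_process(pattern):
    """True if the pattern varies by both host and PID."""
    return "%h" in pattern and "%p" in pattern
```

With the pattern from the example above, `expand_debug_file("/tmp/nccl_%h_%p.log", "node-01", 12345)` returns `/tmp/nccl_node-01_12345.log`.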
Timestamp Configuration
NCCL log messages can include timestamps for timing analysis.
Timestamp Format
Use NCCL_DEBUG_TIMESTAMP_FORMAT to customize the timestamp format (uses strftime syntax):
# Default format: [YYYY-MM-DD HH:MM:SS]
NCCL_DEBUG=INFO NCCL_DEBUG_TIMESTAMP_LEVELS=INFO ./my_app
# Include milliseconds
NCCL_DEBUG=INFO NCCL_DEBUG_TIMESTAMP_LEVELS=INFO NCCL_DEBUG_TIMESTAMP_FORMAT="[%F %T.%3f] " ./my_app
# Disable timestamps
NCCL_DEBUG=INFO NCCL_DEBUG_TIMESTAMP_LEVELS=INFO NCCL_DEBUG_TIMESTAMP_FORMAT="" ./my_app
Timestamp Levels
Control which log levels include timestamps using NCCL_DEBUG_TIMESTAMP_LEVELS:
# Add timestamps to WARN, INFO, and TRACE
NCCL_DEBUG=INFO NCCL_DEBUG_TIMESTAMP_LEVELS=WARN,INFO,TRACE ./my_app
# Timestamps on everything except TRACE
NCCL_DEBUG=TRACE NCCL_DEBUG_TIMESTAMP_LEVELS=^TRACE ./my_app
By default, only WARN messages include timestamps.
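For timing analysis, timestamps in the default `[YYYY-MM-DD HH:MM:SS]` form can be parsed back into `datetime` values and compared. The sketch below assumes that default form at the start of the line; a custom NCCL_DEBUG_TIMESTAMP_FORMAT would need a matching `fmt` argument, and the function names are hypothetical.

```python
from datetime import datetime

# Python equivalent of the default "[YYYY-MM-DD HH:MM:SS]" form described above.
DEFAULT_FORMAT = "[%Y-%m-%d %H:%M:%S]"

def leading_timestamp(line, fmt=DEFAULT_FORMAT):
    """Return the bracketed timestamp at the start of a line, or None."""
    end = line.find("]")
    if not line.startswith("[") or end == -1:
        return None
    try:
        return datetime.strptime(line[: end + 1], fmt)
    except ValueError:
        return None

def gaps_seconds(lines):
    """Seconds elapsed between consecutive timestamped lines."""
    stamps = [t for t in map(leading_timestamp, lines) if t is not None]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Lines without a timestamp are skipped, so the result only covers levels selected through NCCL_DEBUG_TIMESTAMP_LEVELS.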
Common Debugging Scenarios
Diagnosing Initialization Hangs
If your application hangs during NCCL initialization:
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,BOOTSTRAP,NET ./my_app
Look for:
Bootstrap connection issues
Network interface selection problems
Rank synchronization failures
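When a job hangs, the last line each process printed often shows where it stopped. The sketch below groups captured lines by the `host:pid` prefix seen in the excerpts earlier in this section; the prefix layout is inferred from those excerpts, and `last_line_per_process` is a hypothetical helper.

```python
import re

# Prefix inferred from the log excerpts: optional "[timestamp] ", then host:pid:tid.
PREFIX_RE = re.compile(
    r"^(?:\[[^\]]*\]\s+)?"
    r"(?P<host>[^\s:]+):(?P<pid>\d+):\d+\s+"
    r"(?P<rest>.*)$"
)

def last_line_per_process(lines):
    """Map (host, pid) to the remainder of the last matching line seen."""
    last = {}
    for line in lines:
        m = PREFIX_RE.match(line.strip())
        if m:
            last[(m["host"], int(m["pid"]))] = m["rest"]
    return last
```

A process whose last message is much earlier in the initialization sequence than its peers is a natural starting point for the checks listed above.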
Investigating Network Issues
For network-related problems (timeouts, connection failures):
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=NET ./my_app
Look for:
Which network interfaces and devices are selected
Connection establishment messages
Error messages from the network transport
Understanding Topology Detection
To see how NCCL detects and uses the system topology:
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=GRAPH ./my_app
This shows:
Detected GPUs and their interconnects
NVLink and PCIe topology
Network device locality
Debugging Performance Issues
For performance analysis, enable tuning information:
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=TUNING ./my_app
This shows:
Selected algorithms (Ring, Tree, etc.)
Protocol choices (Simple, LL, LL128)
Channel and thread configurations
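TUNING output can be long, so tallying the selections per collective and message size makes patterns easier to spot. This sketch relies on the line shape shown in the TUNING excerpt earlier in this section; the pattern is inferred from that excerpt and `tuning_summary` is a hypothetical helper.

```python
import re
from collections import Counter

# Inferred from the TUNING excerpt; other NCCL versions may format this differently.
TUNING_RE = re.compile(
    r"NCCL INFO (?P<coll>\w+): (?P<nbytes>\d+) Bytes -> "
    r"Algo (?P<algo>\w+) proto (?P<proto>\w+)"
)

def tuning_summary(lines):
    """Count (collective, bytes, algorithm, protocol) selections in log lines."""
    summary = Counter()
    for line in lines:
        m = TUNING_RE.search(line)
        if m:
            key = (m["coll"], int(m["nbytes"]), m["algo"], m["proto"])
            summary[key] += 1
    return summary
```

Comparing the summary between a fast and a slow run shows whether the algorithm or protocol choice changed for a given message size.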
Tracing NCCL API Calls
To see every NCCL function call:
NCCL_DEBUG=TRACE NCCL_DEBUG_SUBSYS=CALL ./my_app
Full Debugging Session
For comprehensive debugging, capture everything to a separate file per process:
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=ALL NCCL_DEBUG_FILE=/tmp/nccl_%h_%p.log ./my_app
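With one file per process, a first pass over the results is often to list which files contain warnings. The following is a minimal sketch; `collect_warnings` is a hypothetical helper, and the `/tmp/nccl_*.log` glob in the usage comment simply mirrors the file pattern used above.

```python
import glob

def collect_warnings(pattern):
    """Return {path: [WARN lines]} for every matching log file that has any."""
    found = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with open(path, errors="replace") as fh:
            hits = [ln.rstrip("\n") for ln in fh if " NCCL WARN " in ln]
        if hits:
            found[path] = hits
    return found

# Usage (assumes logs were written with the pattern shown above):
#   for path, lines in collect_warnings("/tmp/nccl_*.log").items():
#       print(path, len(lines))
```

Files with no warnings are omitted, so an empty result means no process logged a WARN message.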