Status and Utility Methods

Methods on Communicator for resource cleanup, error and status queries, and memory statistics, plus a module-level helper for rendering result codes.

close_all_resources

Communicator.close_all_resources() → None

Closes all resources owned by this communicator.

Called automatically during destroy() and abort(), but can be called manually. Performs best-effort cleanup, ignoring any errors that occur during resource deallocation. Idempotent: safe to call multiple times.
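Because the call is idempotent, it can be placed in a cleanup path without tracking whether destroy() or abort() has already run. The sketch below wraps a communicator in a context manager; closing_resources and _FakeCommunicator are hypothetical names introduced for illustration, and the fake only mimics the documented method so the example runs without NCCL installed.

```python
from contextlib import contextmanager


@contextmanager
def closing_resources(comm):
    """Yield `comm` and always call close_all_resources() on exit.

    Hypothetical helper, not part of the library. It relies only on the
    documented guarantee that close_all_resources() is idempotent.
    """
    try:
        yield comm
    finally:
        comm.close_all_resources()


class _FakeCommunicator:
    """Stand-in for a real Communicator (assumption, for illustration only)."""

    def __init__(self):
        self.close_calls = 0
        self.open_resources = 2

    def close_all_resources(self):
        # Best-effort cleanup: repeated calls simply find nothing left to free.
        self.close_calls += 1
        self.open_resources = 0


fake = _FakeCommunicator()
with closing_resources(fake):
    pass  # collective operations would go here

# A second, manual call is harmless.
fake.close_all_resources()
```

With a real communicator, the same wrapper would be applied to the object returned by whatever initialization routine the application uses.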

get_last_error

Communicator.get_last_error() → str

Returns the last error string for this communicator.

Raises:

NcclInvalid – If the communicator is not initialized.

get_async_error

Communicator.get_async_error() → nccl.bindings.nccl.Result

Queries the progress and potential errors of asynchronous NCCL operations.

Operations without a stream argument (e.g. finalize()) are complete when they return ncclSuccess. Operations with a stream argument (e.g. reduce()) return ncclSuccess when posted but may report errors through this method until completed. If any NCCL function returns ncclInProgress, users must query the communicator state until it becomes ncclSuccess before calling another NCCL function.

Before the state becomes ncclSuccess, do not issue CUDA kernels on streams used by NCCL. If an error occurs, destroy the communicator with abort(); nothing can be assumed about the completion or correctness of enqueued operations after an error.

Returns:

Current state of the communicator (ncclSuccess, ncclInProgress, or an error code).

Raises:

NcclInvalid – If the communicator is not initialized.

get_mem_stat

Communicator.get_mem_stat(stat: NcclCommMemStat) → int

Queries communicator memory statistics.

Parameters:

stat – The memory statistic to query.

Returns:

The memory statistic value (bytes, or 0/1 for GPU_MEM_SUSPENDED).

Raises:

NcclInvalid – If the communicator is not initialized.

NcclCommMemStat

class nccl.core.NcclCommMemStat(value)

Bases: IntEnum

Memory-statistic selector, mirroring ncclCommMemStat_t.

Used as the stat argument of Communicator.get_mem_stat() to identify which memory statistic to query. All values are returned in bytes except GPU_MEM_SUSPENDED, which is a 0/1 flag.

GPU_MEM_SUSPEND = 0

Communicator-allocated GPU memory that can be released by Communicator.suspend() (bytes).

GPU_MEM_SUSPENDED = 1

Whether communicator-allocated GPU memory is currently suspended (0 = active, 1 = suspended).

GPU_MEM_PERSIST = 2

Communicator-allocated GPU memory that cannot be suspended (bytes).

GPU_MEM_TOTAL = 3

Total communicator-allocated GPU memory tracked by NCCL (bytes).
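To show how the selector and get_mem_stat() fit together, the sketch below gathers every statistic into a dictionary. It is a hedged illustration: memory_report, MemStat, and _FakeCommunicator are hypothetical names, MemStat is a local mirror of the values listed above so the example runs without NCCL installed, and the byte counts returned by the fake are invented. With the real library, nccl.core.NcclCommMemStat would be passed as stat_enum.

```python
from enum import IntEnum


class MemStat(IntEnum):
    """Local mirror of the documented NcclCommMemStat values (stand-in)."""

    GPU_MEM_SUSPEND = 0
    GPU_MEM_SUSPENDED = 1
    GPU_MEM_PERSIST = 2
    GPU_MEM_TOTAL = 3


def memory_report(comm, stat_enum=MemStat):
    """Query every statistic in `stat_enum` and return a name -> value dict.

    Hypothetical helper. Byte counts stay integers; the 0/1
    GPU_MEM_SUSPENDED flag is converted to a bool.
    """
    report = {}
    for stat in stat_enum:
        value = comm.get_mem_stat(stat)
        if stat.name == "GPU_MEM_SUSPENDED":
            report[stat.name] = bool(value)
        else:
            report[stat.name] = int(value)
    return report


class _FakeCommunicator:
    """Stand-in returning made-up numbers (assumption, for illustration)."""

    _VALUES = {0: 4096, 1: 0, 2: 1024, 3: 5120}

    def get_mem_stat(self, stat):
        return self._VALUES[int(stat)]


report = memory_report(_FakeCommunicator())
```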

get_error_string

Module-level helper to render an NCCL result code as a human-readable string.

nccl.core.get_error_string(nccl_result: _nccl_bindings.Result | int) → str

Returns a human-readable error string for an NCCL result code.

Parameters:

nccl_result – NCCL result code.

Returns:

Human-readable error message corresponding to the result code.