Creation and Lifecycle Methods

Methods on Communicator for creation, splitting, growing, and teardown.

Construction

classmethod Communicator.init(nranks: int, rank: int, unique_id: UniqueId | Sequence[UniqueId], config: NCCLConfig | None = None) → Communicator

Initializes a new NCCL communicator.

Creates a communicator that connects multiple ranks. This is a collective operation: all ranks must call this method with the same nranks and unique_id, each passing its own distinct rank.

Parameters:
  • nranks – Total number of ranks in the communicator.

  • rank – This process's rank, between 0 and nranks - 1 inclusive.

  • unique_id – Unique identifier(s) shared by all ranks. A sequence may be passed to use ncclCommInitRankScalable().

  • config – NCCL configuration options. Defaults to None.

Returns:

A new Communicator instance.

Raises:

NcclInvalid – If unique_id has an invalid type.
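A minimal sketch of the collective init() call. It assumes mpi4py launches the processes and carries the UniqueId over the pickle path (lowercase bcast), and that each process has already made its own GPU the current CUDA device. The helper name init_from_mpi is hypothetical.

```python
def init_from_mpi(mpi_comm, config=None):
    """Create one NCCL communicator spanning all ranks of an MPI communicator.

    Assumes mpi_comm behaves like an mpi4py.MPI.Comm (Get_rank, Get_size,
    bcast) and that each process has already selected its CUDA device.
    """
    # Imported here so this module can be loaded without NCCL installed.
    from nccl.core import Communicator, get_unique_id

    rank = mpi_comm.Get_rank()
    nranks = mpi_comm.Get_size()

    # Only rank 0 asks NCCL for an ID; the pickle-based object broadcast
    # delivers that same UniqueId to every other rank.
    uid = get_unique_id() if rank == 0 else None
    uid = mpi_comm.bcast(uid, root=0)

    # Collective call: identical nranks and unique_id everywhere,
    # a distinct rank on each process.
    return Communicator.init(nranks, rank, uid, config)
```

Under mpirun, every process would call init_from_mpi(MPI.COMM_WORLD); only rank 0 generates an ID.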

classmethod Communicator.init_all(devices: int | Sequence[int] | None = None) → list[Communicator]

Initializes multiple NCCL communicators for single-process multi-GPU operations.

Creates a list of NCCL communicators, one per device, within a single process. This is optimized for single-machine scenarios in which one process controls all GPUs. Unlike init(), which requires multi-process coordination (e.g. via MPI), init_all() handles all coordination internally.

Each communicator is bound to its corresponding device, and its rank equals its index in the returned list. The current device context is preserved by the underlying NCCL API. The communicators are not destroyed automatically: call destroy() on each one.

Parameters:

devices – Specifies which devices to initialize. None (the default) initializes all visible CUDA devices. An int creates communicators for devices [0, 1, ..., devices - 1]. A sequence of ints uses those explicit device IDs. If the resulting device list is empty (devices=0, an empty sequence, or no visible devices), init_all() returns an empty list without calling into NCCL.

Returns:

List of initialized communicators, one per device. Rank i uses devices[i] (or device i when devices is an int).

Raises:

TypeError – If devices is not an int, sequence of ints, or None.
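The device-selection rules above can be modeled in plain Python. This is an illustrative sketch of the documented behavior, not the library's implementation; resolve_devices and its visible_count parameter (standing in for the number of visible CUDA devices) are hypothetical.

```python
from collections.abc import Sequence


def resolve_devices(devices, visible_count):
    """Model of the documented init_all() device rules (not the library code).

    Element i of the result is the device that rank i is bound to.
    """
    if devices is None:
        return list(range(visible_count))      # all visible devices
    if isinstance(devices, int):
        return list(range(devices))            # devices 0 .. devices - 1
    if (isinstance(devices, Sequence)
            and not isinstance(devices, (str, bytes))
            and all(isinstance(d, int) for d in devices)):
        return list(devices)                   # explicit IDs, in order
    raise TypeError("devices must be an int, a sequence of ints, or None")
```

For example, Communicator.init_all([2, 3]) would bind rank 0 to device 2 and rank 1 to device 3, matching resolve_devices([2, 3], 4). An empty result corresponds to init_all() returning an empty list.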

Communicator.initialize(nranks: int, rank: int, unique_id: UniqueId | Sequence[UniqueId], config: NCCLConfig | None = None) → None

Initializes this communicator in-place.

Instance-method counterpart of the init() classmethod. Allows creating a null communicator first (via Communicator()) and initializing it later. This is a collective operation; all ranks must call this method.

Parameters:
  • nranks – Total number of ranks in the communicator.

  • rank – This process's rank, between 0 and nranks - 1 inclusive.

  • unique_id – Unique identifier(s) shared by all ranks.

  • config – NCCL configuration options. Defaults to None.

Raises:

NcclInvalid – If unique_id has an invalid type or this communicator is already initialized.

Bootstrap identifier

A UniqueId is generated by one rank (typically rank 0) and broadcast to all participating ranks; all ranks then pass it to Communicator.init().

class nccl.core.UniqueId(_internal: _nccl_bindings.UniqueId | None = None)

Bases: object

NCCL unique identifier for communicator initialization.

A UniqueId is used to coordinate communicator initialization across multiple ranks. All ranks must use the same UniqueId to form a communicator. Typically one rank generates the UniqueId via get_unique_id() and broadcasts it to all other ranks. Three serialization paths are supported:

  • Bytes: bytes(uid) (or as_bytes) on the producer, from_bytes() on receivers. The resulting bytes can be transmitted through any byte-oriented channel, such as a TCP socket or a shared filesystem.

  • NumPy: as_ndarray returns an in-place view of the underlying buffer, suitable for NumPy-aware buffer transports such as mpi4py.MPI.Comm.Bcast (uppercase B).

  • Pickle: instances are picklable directly, so higher-level object broadcast helpers such as mpi4py.MPI.Comm.bcast (lowercase b) work out of the box.
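For the bytes path over TCP, the stream has no message boundaries, so the receiver needs to know how many bytes to read. This sketch uses a 4-byte length prefix (a choice made here, not something NCCL prescribes) and loops on recv() until the whole payload has arrived; the helper names are hypothetical.

```python
import socket

_LEN = 4  # size of the length prefix; an arbitrary choice for this sketch


def send_uid_bytes(sock: socket.socket, payload: bytes) -> None:
    """Send a serialized UniqueId, prefixed with its length."""
    sock.sendall(len(payload).to_bytes(_LEN, "big") + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so keep reading.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket before the full ID arrived")
        buf += chunk
    return bytes(buf)


def recv_uid_bytes(sock: socket.socket) -> bytes:
    """Receive one length-prefixed payload written by send_uid_bytes()."""
    size = int.from_bytes(_recv_exact(sock, _LEN), "big")
    return _recv_exact(sock, size)
```

The producer would call send_uid_bytes(conn, uid.as_bytes) and each receiver UniqueId.from_bytes(recv_uid_bytes(sock)). Sending the length keeps the receiver from hard-coding the ID size.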

static from_bytes(b: bytes | bytearray | memoryview) → UniqueId

Reconstructs a UniqueId from a bytes-like buffer.

Parameters:

b – Bytes representation of a UniqueId, typically obtained via the as_bytes property on the producing rank.

Returns:

Reconstructed UniqueId.

property as_ndarray: numpy.ndarray

NumPy array view of the unique ID data.

property as_bytes: bytes

Bytes representation of the unique ID, suitable for serialization or broadcast.

nccl.core.get_unique_id(empty: bool = False) → UniqueId

Generates a new NCCL unique identifier for communicator initialization.

Should be called by one rank (typically rank 0); the resulting UniqueId must then be broadcast (e.g. via MPI) to all other ranks.

Parameters:

empty – If True, return an empty UniqueId without calling NCCL. Useful on receiving ranks whose ID contents arrive later, for example through a buffer broadcast into as_ndarray or via UniqueId.from_bytes(). Defaults to False.

Returns:

A new UniqueId to be shared across ranks.
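The NumPy path pairs get_unique_id(empty=True) on receiving ranks with a buffer broadcast into as_ndarray. A sketch, assuming mpi4py (mpi_comm behaves like mpi4py.MPI.Comm) and that the as_ndarray view is writable so the broadcast can fill it in place; broadcast_uid_buffer is a hypothetical name.

```python
def broadcast_uid_buffer(mpi_comm, root=0):
    """Share a UniqueId with a buffer broadcast (uppercase Bcast).

    Assumes mpi_comm behaves like an mpi4py.MPI.Comm.
    """
    from nccl.core import get_unique_id

    if mpi_comm.Get_rank() == root:
        uid = get_unique_id()            # real ID, generated by NCCL
    else:
        uid = get_unique_id(empty=True)  # placeholder; NCCL is not called
    # as_ndarray views the ID's own buffer, so on receiving ranks the
    # broadcast overwrites the placeholder in place.
    mpi_comm.Bcast(uid.as_ndarray, root=root)
    return uid
```

Every rank calls broadcast_uid_buffer(MPI.COMM_WORLD) and passes the result to Communicator.init().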

Splitting and growing

Communicator.split(color: int | None = None, key: int = 0, config: NCCLConfig | None = None) → Communicator

Splits this communicator into sub-communicators based on color values.

Ranks that pass the same color value will be part of the same group. If color is None, the rank will not be part of any group and receives a null communicator (a Communicator instance with ptr=0). The key value determines rank ordering; smaller key means smaller rank in the new communicator. If keys are equal, the rank in the original communicator determines ordering.

This is a collective operation: all ranks in the communicator must call this method, even ranks that pass color=None. There must be no outstanding NCCL operations on the communicator to avoid deadlock.

Parameters:
  • color – Non-negative color value for grouping ranks. Pass None to exclude this rank from all groups. Defaults to None.

  • key – Ordering key within the color group. Defaults to 0.

  • config – Configuration for the new communicator. If None, inherits the parent’s configuration. Defaults to None.

Returns:

New sub-communicator, or a null communicator if color is None.

Raises:

NcclInvalid – If the communicator is not initialized.

See also

ncclCommSplit()
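How color and key determine membership and ordering can be checked with a small pure-Python model. It mirrors the rules described above without calling NCCL; split_layout is a hypothetical name.

```python
def split_layout(colors, keys):
    """Model of split(): colors[r] and keys[r] are what parent rank r passes.

    Returns {parent_rank: (color, new_rank)}, or None for color=None ranks.
    """
    layout = {}
    groups = {}
    for r, color in enumerate(colors):
        if color is None:
            layout[r] = None  # excluded: receives a null communicator
        else:
            groups.setdefault(color, []).append(r)
    for color, members in groups.items():
        # Smaller key -> smaller new rank; ties broken by parent rank.
        ordered = sorted(members, key=lambda r: (keys[r], r))
        for new_rank, r in enumerate(ordered):
            layout[r] = (color, new_rank)
    return layout
```

An even/odd split in which every rank calls comm.split(color=r % 2, key=r), with r the rank in the parent, corresponds to split_layout([r % 2 for r in range(4)], list(range(4))): parent ranks 0 and 2 become ranks 0 and 1 of color 0, and ranks 1 and 3 likewise form color 1.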

Communicator.shrink(exclude_ranks: Sequence[int] | None = None, config: NCCLConfig | None = None, flag: CommShrinkFlag = CommShrinkFlag.DEFAULT) → Communicator

Creates a new communicator by removing specified ranks from this one.

Ranks listed in exclude_ranks are excluded from the new communicator; the remaining ranks are renumbered to a contiguous [0, n) range.

This is a collective operation. All non-excluded ranks must call this method; excluded ranks must NOT call it. With DEFAULT, there must be no outstanding NCCL operations, to avoid deadlock; combine it with config.shrink_share=True to reuse the parent communicator's resources. With ABORT, outstanding operations are aborted automatically and no resources are shared with the parent.

Parameters:
  • exclude_ranks – Ranks to exclude from the new communicator. Defaults to None (no exclusions).

  • config – Configuration for the new communicator. If None, inherits the parent’s configuration. Defaults to None.

  • flag – Shrink behavior. Use DEFAULT for normal operation or ABORT after errors. Defaults to DEFAULT.

Returns:

New communicator without the excluded ranks.

Raises:

NcclInvalid – If the communicator is not initialized.

See also

ncclCommShrink()
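A similar model shows which ranks take part in shrink() and where the survivors land. It assumes survivors keep their relative order from the parent, which the contiguous [0, n) renumbering suggests but the text above does not state outright; shrink_layout is hypothetical.

```python
def shrink_layout(nranks, exclude_ranks=None):
    """Model of shrink() renumbering: {parent_rank: new_rank} for survivors.

    Assumes survivors keep their parent order. Ranks missing from the
    result are the excluded ones, which must not call shrink() at all.
    """
    excluded = set(exclude_ranks or ())
    survivors = [r for r in range(nranks) if r not in excluded]
    return {old: new for new, old in enumerate(survivors)}
```

A rank would check my_rank in shrink_layout(nranks, excluded) (my_rank being a hypothetical variable for its parent rank) before calling comm.shrink(exclude_ranks=excluded).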

Communicator.get_unique_id() → UniqueId

Returns a per-communicator unique ID for use with grow().

Generates a unique identifier bound to this communicator that can be shared with new ranks joining via grow(). This is distinct from the global get_unique_id() used for initial communicator creation. Only one existing rank (the grow root) should call this method.

A new UID cannot be generated while a previous one is still unconsumed. Each UID can be used only once, so wait for the grow operation that consumes it to complete before calling this method again.

Returns:

UniqueId for grow operations.

Raises:

NcclInvalid – If the communicator is not initialized.

Communicator.grow(nranks: int, unique_id: UniqueId | None = None, rank: int | None = None, config: NCCLConfig | None = None) → Communicator

Grows the communicator by adding new ranks.

Creates a new communicator that includes both existing ranks from this communicator and new ranks joining the group. There are three roles:

  • Existing root: the one existing rank that called get_unique_id().

  • Existing non-root: all other existing ranks.

  • New ranks: ranks joining via a null communicator (Communicator()).

This is a collective operation. All ranks (existing and new) must call this method. Usage by role:

  • Existing root: new_comm = existing_comm.grow(nranks, uid)

  • Existing non-root: new_comm = existing_comm.grow(nranks)

  • New rank: new_comm = Communicator().grow(nranks, uid, rank=assigned_rank)

The UID is consumed upon successful grow and cannot be reused.

Parameters:
  • nranks – Total number of ranks in the new communicator (existing plus new). All roles must pass the same value.

  • unique_id – Unique identifier from get_unique_id(). Existing root and new ranks must pass the UniqueId; existing non-root must pass None. Defaults to None.

  • rank – This rank’s ID in the new communicator. New ranks must pass their assigned rank, which must be >= the parent communicator size. Existing ranks must pass None. Defaults to None.

  • config – Configuration for the new communicator. Defaults to None.

Returns:

New Communicator containing all ranks.

Raises:

NcclInvalid – If a new rank is given an initialized communicator, or an existing rank is given a null communicator.
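Because the three roles pass different argument combinations, building grow()'s keyword arguments per role lets a rank catch a mismatch before entering the collective. This is a sketch: the role labels and grow_kwargs are hypothetical, and it encodes only the rules listed above plus the implied bound rank < nranks.

```python
def grow_kwargs(role, nranks, uid=None, rank=None, parent_size=None):
    """Keyword arguments for Communicator.grow(), following the documented roles.

    role is "root", "existing" or "new" (labels invented for this sketch).
    parent_size, if given, is the size of the communicator being grown.
    """
    if role == "root":
        if uid is None or rank is not None:
            raise ValueError("the grow root passes its UID and no rank")
        return {"nranks": nranks, "unique_id": uid}
    if role == "existing":
        if uid is not None or rank is not None:
            raise ValueError("existing non-root ranks pass neither a UID nor a rank")
        return {"nranks": nranks}
    if role == "new":
        if uid is None or rank is None:
            raise ValueError("new ranks pass both the UID and their assigned rank")
        lowest = 0 if parent_size is None else parent_size
        if not lowest <= rank < nranks:
            raise ValueError(f"new rank {rank} must lie in [{lowest}, {nranks})")
        return {"nranks": nranks, "unique_id": uid, "rank": rank}
    raise ValueError(f"unknown role {role!r}")
```

Growing a 4-rank communicator to 6: the root runs uid = comm.get_unique_id(), sends uid to the joining processes, then calls comm.grow(**grow_kwargs("root", 6, uid=uid)); other existing ranks call comm.grow(**grow_kwargs("existing", 6)); the joiner assigned rank 4 calls Communicator().grow(**grow_kwargs("new", 6, uid=uid, rank=4, parent_size=4)).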

Teardown

Communicator.destroy() → None

Destroys the communicator and frees local resources.

If finalize() has not been called explicitly, destroy() will call it internally. If finalize() is called explicitly, users must ensure the communicator state becomes ncclSuccess before calling destroy(). The communicator should not be accessed after destroy() returns.

All resources (registered buffers, windows, custom operators) owned by this communicator are automatically closed before destruction. This is an intra-node collective call: all ranks on the same node must call it to avoid hanging. The recommended pattern is finalize() followed by destroy().

Errors during cleanup are suppressed for safety.

Communicator.abort() → None

Aborts the communicator and frees resources, terminating in-flight operations.

Should be called when an unrecoverable error occurs. Unlike destroy(), this immediately aborts uncompleted operations. All active ranks must call this function in order to abort the NCCL communicator successfully.

All resources (registered buffers, windows, custom operators) owned by this communicator are automatically closed before aborting. Errors during cleanup are suppressed for safety. For more details, see the Fault Tolerance section in the NCCL documentation.

See also

ncclCommAbort()
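A common error-handling shape is to abort on the first unrecoverable failure and let the exception propagate. A minimal sketch with a hypothetical run_or_abort helper; comm is any object with the documented abort() method. Since abort() only completes when all active ranks call it, the other ranks need their own path to the same call.

```python
def run_or_abort(comm, work):
    """Run work(comm); on failure abort the communicator and re-raise.

    abort() completes only when every active rank calls it, so the other
    ranks must reach abort() too (for example, after their own error).
    """
    try:
        return work(comm)
    except Exception:
        comm.abort()  # terminates in-flight operations and frees resources
        raise
```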

Communicator.finalize() → None

Finalizes the communicator, flushing uncompleted operations and network resources.

Typically called before destroy() to ensure all operations complete. This is a collective operation that must be called by all ranks.

For nonblocking communicators, finalize() is itself nonblocking: a successful call sets the communicator state to ncclInProgress to indicate that finalization is under way, and the state transitions to ncclSuccess once all NCCL operations complete. Users can query the state with get_async_error().
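The recommended finalize()-then-destroy() sequence, including the wait a nonblocking finalize() needs, can be sketched as follows. How get_async_error() reports ncclInProgress in Python is not specified here, so the sketch takes a caller-supplied in_progress predicate; shutdown and its parameters are hypothetical.

```python
import time


def shutdown(comm, in_progress, poll_interval=0.001, timeout=None):
    """finalize(), wait until the state leaves ncclInProgress, then destroy().

    in_progress(comm) -> bool is supplied by the caller; for a nonblocking
    communicator it would wrap comm.get_async_error(). Blocking
    communicators can pass lambda c: False.
    """
    comm.finalize()
    deadline = None if timeout is None else time.monotonic() + timeout
    while in_progress(comm):
        if deadline is not None and time.monotonic() >= deadline:
            # destroy() must wait for ncclSuccess after an explicit finalize().
            raise TimeoutError("finalize() did not complete in time")
        time.sleep(poll_interval)
    comm.destroy()
```

On timeout the sketch raises without calling destroy(); abort() is the fallback in that case.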

Pause and resume

Communicator.revoke(flags: int = 0) → None

Revokes the communicator.

Stops all in-flight operations and marks the communicator state as ncclInProgress. The state transitions to ncclSuccess when the communicator becomes quiescent, after which management operations (destroy(), split(), shrink()) can proceed safely.

Calling finalize() after revoke() is invalid. Resource sharing via split-share / shrink-share is disabled while revoked.

Parameters:

flags – Reserved for future use. Currently must be 0.

Raises:

NcclInvalid – If the communicator is not initialized.
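A survivor-side recovery sequence built from the calls described above: revoke, wait for the state to return to ncclSuccess, then shrink away the failed ranks. revoke_and_shrink and its in_progress predicate are hypothetical, as is the assumption that every survivor already knows which ranks failed.

```python
import time


def revoke_and_shrink(comm, failed_ranks, in_progress, poll_interval=0.001):
    """Revoke, wait for quiescence, then shrink out the failed ranks.

    in_progress(comm) -> bool is caller-supplied (e.g. wrapping
    get_async_error()). Failed ranks must not call this function.
    """
    comm.revoke()                 # stop all in-flight operations
    while in_progress(comm):      # state: ncclInProgress -> ncclSuccess
        time.sleep(poll_interval)
    # Quiescent: management calls such as shrink() are now safe.
    # Note that finalize() must not be called on a revoked communicator.
    return comm.shrink(exclude_ranks=sorted(failed_ranks))
```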

Communicator.suspend(flags: CommSuspendFlag = CommSuspendFlag.MEM) → None

Suspends communicator operations to free resources.

The communicator cannot be used for communication while suspended. Call resume() to restore it.

Parameters:

flags – Suspend flags controlling which resources to release. MEM releases dynamic GPU memory allocations. Defaults to CommSuspendFlag.MEM.

Raises:

NcclInvalid – If the communicator is not initialized.

Communicator.resume() → None

Resumes all previously suspended communicator resources.

Restores a communicator that was suspended with suspend() so that it can be used for communication again.

Raises:

NcclInvalid – If the communicator is not initialized.
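Every suspend() needs a matching resume(), so a context manager can keep the pair together even if the code in between raises. A sketch with a hypothetical suspended helper; leaving flags as None relies on suspend()'s documented default of CommSuspendFlag.MEM.

```python
from contextlib import contextmanager


@contextmanager
def suspended(comm, flags=None):
    """Suspend comm for the duration of a with-block, then resume it."""
    if flags is None:
        comm.suspend()          # documented default: CommSuspendFlag.MEM
    else:
        comm.suspend(flags)
    try:
        yield comm              # no communication on comm inside the block
    finally:
        comm.resume()           # runs even if the block raised
```

For example, with suspended(comm): run_cpu_only_phase() (a hypothetical function) would free the communicator's dynamic GPU memory while that phase runs.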

Flag enums

CommShrinkFlag

class nccl.core.CommShrinkFlag(value)

Bases: IntEnum

Behavior flag for Communicator.shrink().

DEFAULT = 0

Shrink the parent communicator normally; outstanding NCCL operations must already be quiesced.

ABORT = 1

First terminate ongoing parent operations, then shrink. No resources are shared with the parent.

CommSuspendFlag

class nccl.core.CommSuspendFlag(value)

Bases: IntFlag

Behavior flag for Communicator.suspend().

MEM = 1

Suspend memory by releasing dynamic GPU memory allocations held by the communicator.