nemo_automodel.components.moe.uccl_ep.buffer

UCCLBuffer: a DeepEP-compatible Buffer backed by UCCL-EP.

This module re-exports the canonical Buffer implementation under the UCCLBuffer alias expected by nemo_automodel, with automatic intranode detection.

Module Contents

Classes

UCCLBuffer

Buffer subclass that auto-detects intranode mode.

Data

API

class nemo_automodel.components.moe.uccl_ep.buffer.UCCLBuffer(
group,
num_nvl_bytes: int = 0,
num_rdma_bytes: int = 0,
low_latency_mode: bool = False,
num_qps_per_rank: int = 24,
allow_nvlink_for_low_latency_mode: bool = True,
allow_mnnvl: bool = False,
explicitly_destroy: bool = False,
is_intranode: bool = False,
)

Bases: nemo_automodel.components.moe.uccl_ep._buffer.Buffer

Buffer subclass that auto-detects intranode mode.

When all expert-parallel (EP) ranks fit on a single node (group_size <= LOCAL_WORLD_SIZE), RDMA is disabled and only NVLink is used. This avoids RDMA memory region (MR) registration failures on single-node setups.
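
The single-node check described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names fits_on_single_node and effective_rdma_bytes are hypothetical, and the sketch assumes LOCAL_WORLD_SIZE is read from the environment variable of that name (as set by launchers such as torchrun), falling back to 1 when it is unset.

```python
import os
from typing import Mapping, Optional


def fits_on_single_node(group_size: int,
                        env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when every rank in the EP group fits on one node.

    Hypothetical helper: assumes LOCAL_WORLD_SIZE comes from the environment
    and defaults to 1 when unset (an assumption, not confirmed by the source).
    """
    env = os.environ if env is None else env
    local_world_size = int(env.get("LOCAL_WORLD_SIZE", "1"))
    return group_size <= local_world_size


def effective_rdma_bytes(group_size: int,
                         num_rdma_bytes: int,
                         env: Optional[Mapping[str, str]] = None) -> int:
    """Illustrate "RDMA is disabled" by zeroing the RDMA buffer size.

    How the real class disables RDMA is not stated in the source; zeroing
    the buffer size is only one plausible way to model it.
    """
    return 0 if fits_on_single_node(group_size, env) else num_rdma_bytes


if __name__ == "__main__":
    one_node = {"LOCAL_WORLD_SIZE": "8"}
    print(fits_on_single_node(8, one_node))    # group fits on the node
    print(fits_on_single_node(16, one_node))   # group spans several nodes
```

Passing the environment as an argument keeps the check easy to exercise without launching a distributed job.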

Initialization

Initialize the communication buffer.

Parameters:
  • group – the communication group.

  • num_nvl_bytes – the buffer size for intranode NVLink communication.

  • num_rdma_bytes – the buffer size for RDMA communication, which is used for internode traffic and, in low-latency mode, for intranode traffic as well.

  • low_latency_mode – whether to enable low-latency mode.

  • num_qps_per_rank – the number of queue pairs (QPs) for RDMA. Low-latency mode requires this number to equal the number of local experts.

  • allow_nvlink_for_low_latency_mode – whether to allow NVLink traffic in low-latency mode. Note that this is somewhat incompatible with hook-based overlapping. Warning: PCIe connections may cause errors due to memory-ordering issues, so make sure all connections go through NVLink.

  • allow_mnnvl – whether to allow MNNVL (multi-node NVLink).

  • explicitly_destroy – if True, you must call destroy() explicitly to release resources; otherwise, the destructor releases them. Note: releasing resources in the destructor may cause Python’s exception handling to hang.
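
The explicitly_destroy pattern can be sketched with a small context manager that guarantees destroy() runs even when an exception is raised. This is an illustration only: StandInBuffer is a hypothetical stand-in so the sketch runs without GPUs or a process group, and the destroying helper is not part of this module. The only thing it assumes about the real class is the destroy() method mentioned above.

```python
from contextlib import contextmanager


class StandInBuffer:
    """Hypothetical stand-in for UCCLBuffer; it only models destroy()."""

    def __init__(self, explicitly_destroy: bool = False):
        self.explicitly_destroy = explicitly_destroy
        self.destroyed = False

    def destroy(self) -> None:
        # The real buffer would release its communication resources here.
        self.destroyed = True


@contextmanager
def destroying(buffer):
    """Yield the buffer and always call destroy() on exit (hypothetical helper)."""
    try:
        yield buffer
    finally:
        buffer.destroy()


def run_with_cleanup() -> bool:
    """Simulate a failure mid-use and report whether cleanup still happened."""
    buf = StandInBuffer(explicitly_destroy=True)
    try:
        with destroying(buf):
            raise RuntimeError("simulated failure during communication")
    except RuntimeError:
        pass  # the error propagated normally; cleanup already ran
    return buf.destroyed


if __name__ == "__main__":
    print(run_with_cleanup())
```

Releasing resources in a finally block, rather than relying on the destructor, keeps cleanup off the exception-unwinding path that the note above warns about.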

nemo_automodel.components.moe.uccl_ep.buffer.__all__

['UCCLBuffer', 'Buffer', 'EventOverlap', 'EventHandle']