core.nccl_allocator#
Module Contents#
Classes#
- nccl_mem: An NCCL memory allocator, which inherits the APEX nccl_allocator implementation.
- MultiGroupMemPoolAllocator: A custom allocator class that registers a single memory pool with multiple communication groups.
Functions#
- get_func_args: Get the argument names of a function.
- create_nccl_mem_pool: Create a memory pool using the NCCL allocator.
- init: Initialize the NCCL allocator.
Data#
- _allocator
API#
- core.nccl_allocator._allocator#
None
- core.nccl_allocator._build_nccl_allocator()#
- core.nccl_allocator.get_func_args(func)#
Get the argument names of a function.
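A hypothetical usage sketch; the exact return format (a list of parameter names) is an assumption, since this page only states that the function returns argument names.

import megatron.core.nccl_allocator as nccl_allocator

def scale(tensor, factor=1.0):
    return tensor * factor

# Assumed to return the parameter names of `scale`, e.g. ["tensor", "factor"].
names = nccl_allocator.get_func_args(scale)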
- core.nccl_allocator.create_nccl_mem_pool(symmetric=None)#
Create a memory pool using the NCCL allocator.
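A short pool-creation sketch. Whether init() must run before create_nccl_mem_pool() is not stated on this page, so the sketch calls it first to be safe; the meaning of the symmetric flag is assumed from the parameter name.

import megatron.core.nccl_allocator as nccl_allocator

nccl_allocator.init()
pool = nccl_allocator.create_nccl_mem_pool()                    # default registration mode
sym_pool = nccl_allocator.create_nccl_mem_pool(symmetric=True)  # explicitly request symmetric registration (assumed semantics)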
- core.nccl_allocator.init() → None#
Initialize the NCCL allocator.
PyTorch tracks memory registration at the pool level, not per allocation. If a pool already contains allocations from a previous context, attempting to register it again will re-register all existing allocations and may trigger NCCL errors. To avoid this, the pool is explicitly deregistered on entry and re-registered on exit for each context use.
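A hedged sketch of the behavior described above, assuming torch.distributed has already been initialized with the NCCL backend: the same pool is entered twice, and the deregister-on-entry / re-register-on-exit handling keeps the second context use from re-registering the first allocation.

import torch
import megatron.core.nccl_allocator as nccl_allocator

nccl_allocator.init()
pool = nccl_allocator.create_nccl_mem_pool()
group = torch.distributed.new_group(backend="nccl")

# First context use: allocations made here live in `pool` and are registered.
with nccl_allocator.nccl_mem(pool, group=group):
    a = torch.zeros(1024, dtype=torch.float32, device="cuda")

# Second context use of the same pool: the pool is deregistered on entry and
# re-registered on exit, so the pre-existing allocation `a` is not registered twice.
with nccl_allocator.nccl_mem(pool, group=group):
    b = torch.zeros(1024, dtype=torch.float32, device="cuda")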
- class core.nccl_allocator.nccl_mem(pool, enabled=True, device=None, group=None, symmetric=True)#
An NCCL memory allocator, which inherits the APEX nccl_allocator implementation.
Initialization
- __enter__()#
- __exit__(*args)#
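A minimal usage sketch based on the signature above; the behavior of enabled=False (falling back to the default CUDA allocator) is an assumption, and the group setup is illustrative only.

import torch
import megatron.core.nccl_allocator as nccl_allocator

nccl_allocator.init()
pool = nccl_allocator.create_nccl_mem_pool()
group = torch.distributed.new_group(ranks=[0, 1, 2, 3], backend="nccl")

# Buffers allocated inside the context come from `pool` and are registered
# with `group`; symmetric=True matches the default in the signature.
with nccl_allocator.nccl_mem(pool, group=group, symmetric=True):
    send_buf = torch.zeros(4096, dtype=torch.float32, device="cuda")

# With enabled=False the context is assumed to be a no-op, so this allocation
# goes through the default allocator instead of `pool`.
with nccl_allocator.nccl_mem(pool, enabled=False):
    scratch = torch.empty(1024, device="cuda")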
- class core.nccl_allocator.MultiGroupMemPoolAllocator(pool, groups, symmetric=True)#
A custom allocator class that registers a single memory pool with multiple communication groups.
Use cases:
[FSDP + EP] With FSDP and expert parallelism (EP), the expert layers (expert-dp) and non-expert layers (dp) use different communicator groups. The same memory pool has to be registered with both groups.
[Hybrid FSDP/DP] With hybrid FSDP/DP, there are an inter-dp group and an intra-dp group. The same memory pool has to be registered with both groups.
[Hybrid FSDP/DP + EP] With hybrid FSDP/DP plus EP, there are inter-dp, intra-dp, and expert-dp groups. The same memory pool has to be registered with all three groups.
Example
# Assumes torch.distributed.init_process_group(backend="nccl") has already been called.
import torch
import megatron.core.nccl_allocator as nccl_allocator
from megatron.core.nccl_allocator import MultiGroupMemPoolAllocator

nccl_allocator.init()
pool = nccl_allocator.create_nccl_mem_pool()
group_1 = torch.distributed.new_group(ranks=[0, 1, 2, 3, 4, 5, 6, 7], backend="nccl")
group_2 = torch.distributed.new_group(ranks=[0, 2, 4, 6], backend="nccl")
with MultiGroupMemPoolAllocator(pool, [group_1, group_2]):
    a = torch.zeros(1024, dtype=torch.float32, device="cuda")
    b = torch.zeros(1024, dtype=torch.float32, device="cuda")
Initialization
- __enter__()#
- __exit__(*args)#