core.nccl_allocator#

Module Contents#

Classes#

nccl_mem

An NCCL memory allocator that inherits from the APEX nccl_allocator implementation.

MultiGroupMemPoolAllocator

A custom allocator class that registers a single memory pool with multiple communication groups.

Functions#

_build_nccl_allocator

get_func_args

Get the argument names of a function.

create_nccl_mem_pool

Create a memory pool using the NCCL allocator.

init

Initialize the NCCL allocator.

Data#

API#

core.nccl_allocator._allocator#

None

core.nccl_allocator._build_nccl_allocator()#
core.nccl_allocator.get_func_args(func)#

Get the argument names of a function.
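A minimal sketch of what this helper presumably does, using the standard inspect module; the exact return type is an assumption here, not confirmed by the module:

import inspect

def get_func_args(func):
    # Assumed behavior: return the parameter names of `func` in declaration order.
    return list(inspect.signature(func).parameters.keys())

get_func_args(lambda pool, group: None)  # ['pool', 'group']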

core.nccl_allocator.create_nccl_mem_pool(symmetric=None)#

Create a memory pool using the NCCL allocator.
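A minimal usage sketch, assuming torch.distributed has already been initialized with the NCCL backend; init() is called first, matching the example further below:

import megatron.core.nccl_allocator as nccl_allocator

nccl_allocator.init()  # install the NCCL allocator before creating pools
pool = nccl_allocator.create_nccl_mem_pool()  # pool whose allocations can be registered with NCCL communicators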

core.nccl_allocator.init() → None#

Initialize the NCCL allocator.

PyTorch tracks memory registration at the pool level, not per allocation. If a pool already contains allocations from a previous context, attempting to register it again will re-register all existing allocations and may trigger NCCL errors. To avoid this, the pool is explicitly deregistered on entry and re-registered on exit for each context use.
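As a hedged illustration of this behavior, the sketch below reuses one pool across two allocation contexts (via the nccl_mem context manager documented next); because the pool is deregistered on entry and re-registered on exit, the second use does not re-register the allocation made in the first:

import torch
import megatron.core.nccl_allocator as nccl_allocator

# Assumes torch.distributed is already initialized with the NCCL backend.
nccl_allocator.init()
pool = nccl_allocator.create_nccl_mem_pool()
group = torch.distributed.new_group(backend="nccl")

with nccl_allocator.nccl_mem(pool, group=group):
    a = torch.zeros(1024, dtype=torch.float32, device="cuda")

# Second use of the same pool: the pool already holds `a`, but the per-use
# deregister/register cycle avoids registering `a` twice.
with nccl_allocator.nccl_mem(pool, group=group):
    b = torch.zeros(1024, dtype=torch.float32, device="cuda")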

class core.nccl_allocator.nccl_mem(pool, enabled=True, device=None, group=None, symmetric=True)#

An NCCL memory allocator that inherits from the APEX nccl_allocator implementation.

Initialization

__enter__()#
__exit__(*args)#
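
A minimal usage sketch with the constructor arguments written out; the group and tensor shown are illustrative placeholders:

import torch
import megatron.core.nccl_allocator as nccl_allocator

# Assumes torch.distributed is already initialized with the NCCL backend.
nccl_allocator.init()
pool = nccl_allocator.create_nccl_mem_pool()
dp_group = torch.distributed.new_group(backend="nccl")

# Tensors allocated inside the context come from `pool` and are registered
# with `dp_group` for NCCL communication.
with nccl_allocator.nccl_mem(pool, enabled=True, device=None, group=dp_group, symmetric=True):
    buf = torch.zeros(4096, dtype=torch.bfloat16, device="cuda")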
class core.nccl_allocator.MultiGroupMemPoolAllocator(pool, groups, symmetric=True)#

A custom allocator class that registers a single memory pool with multiple communication groups.

Use cases:

  • [FSDP + EP] With FSDP and expert parallelism, expert layers (expert-dp) and non-expert layers (dp) use different communicator groups, so the same memory pool has to be registered with both groups.

  • [Hybrid FSDP/DP] With hybrid FSDP/DP, there are an inter-dp group and an intra-dp group, and the same memory pool has to be registered with both groups.

  • [Hybrid FSDP/DP + EP] With hybrid FSDP/DP + EP, there are inter-dp, intra-dp, and expert-dp groups, and the same memory pool has to be registered with all of them.

Example

import torch
import megatron.core.nccl_allocator as nccl_allocator

# torch.distributed must already be initialized with the NCCL backend.
nccl_allocator.init()
pool = nccl_allocator.create_nccl_mem_pool()
group_1 = torch.distributed.new_group(ranks=[0, 1, 2, 3, 4, 5, 6, 7], backend="nccl")
group_2 = torch.distributed.new_group(ranks=[0, 2, 4, 6], backend="nccl")
# Allocations made inside the context come from `pool`, which is registered
# with both groups.
with nccl_allocator.MultiGroupMemPoolAllocator(pool, [group_1, group_2]):
    a = torch.zeros(1024, dtype=torch.float32, device="cuda")
    b = torch.zeros(1024, dtype=torch.float32, device="cuda")

Initialization

__enter__()#
__exit__(*args)#