core.nccl_allocator#

Module Contents#

Classes#

nccl_mem

An NCCL memory allocator that inherits from the APEX nccl_allocator implementation.

MultiGroupMemPoolAllocator

A custom allocator class that registers a single memory pool with multiple communication groups.

MemPoolAllocatorWithoutRegistration

An allocator class that allocates memory without registering it with any communication group. Users are expected to register the memory with the communication groups manually.

Functions#

_build_nccl_allocator

get_func_args

Get the argument names of a function.

create_nccl_mem_pool

Create a memory pool using the NCCL allocator.

init

Initialize the NCCL allocator.

register_mem_pool

Register a memory pool with a group. The symmetric flag is reserved for future use.

deregister_mem_pool

Deregister a memory pool from a group.

Data#

API#

core.nccl_allocator._allocator#

None

core.nccl_allocator._build_nccl_allocator()#
core.nccl_allocator.get_func_args(func)#

Get the argument names of a function.
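A minimal sketch of what such a helper typically does, assuming it is built on the standard-library inspect module (the actual implementation may differ):

```python
import inspect

def get_func_args(func):
    # Return the parameter names of ``func``, in declaration order.
    return list(inspect.signature(func).parameters)

def example(pool, group, symmetric=True):
    pass

print(get_func_args(example))  # ['pool', 'group', 'symmetric']
```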

core.nccl_allocator.create_nccl_mem_pool(symmetric=None)#

Create a memory pool using the NCCL allocator.

core.nccl_allocator.init() → None#

Initialize the NCCL allocator.

PyTorch tracks memory registration at the pool level, not per allocation. If a pool already contains allocations from a previous context, attempting to register it again will re-register all existing allocations and may trigger NCCL errors. To avoid this, the pool is explicitly deregistered on entry and re-registered on exit for each context use.
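The deregister-on-entry / re-register-on-exit pattern described above can be sketched as follows. This is illustrative only: the module's real register_mem_pool/deregister_mem_pool talk to NCCL, so here they are replaced by stand-ins that record calls in a log, making the sketch self-contained.

```python
calls = []

# Stand-ins for core.nccl_allocator.register_mem_pool /
# deregister_mem_pool; the real functions register the pool's
# allocations with an NCCL communicator.
def register_mem_pool(pool, group, symmetric=True):
    calls.append(("register", pool, group))

def deregister_mem_pool(pool, group):
    calls.append(("deregister", pool, group))

class PoolContextSketch:
    """Illustrative only: deregister the pool on entry, so stale
    registrations from a previous context are cleared, then register
    the pool's allocations on exit."""

    def __init__(self, pool, group):
        self.pool, self.group = pool, group

    def __enter__(self):
        deregister_mem_pool(self.pool, self.group)
        return self.pool

    def __exit__(self, *exc):
        register_mem_pool(self.pool, self.group)

with PoolContextSketch("pool", "group"):
    pass  # allocations from the pool would happen here

print(calls)  # [('deregister', 'pool', 'group'), ('register', 'pool', 'group')]
```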

core.nccl_allocator.register_mem_pool(pool, group, symmetric=True)#

Register a memory pool with a group. The symmetric flag is reserved for future use.

core.nccl_allocator.deregister_mem_pool(pool, group)#

Deregister a memory pool from a group.

class core.nccl_allocator.nccl_mem(pool, enabled=True, device=None, group=None, symmetric=True)#

An NCCL memory allocator that inherits from the APEX nccl_allocator implementation.

Initialization

__enter__()#
__exit__(*args)#
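A hedged usage sketch of the nccl_mem context manager. It requires CUDA, the NCCL backend, and an initialized torch.distributed process group, so the body is wrapped in a function (and imports deferred) rather than run directly; the tensor size is illustrative.

```python
def nccl_mem_usage():
    # Runnable only on a CUDA machine inside an initialized
    # torch.distributed job; imports are deferred for that reason.
    import torch
    import megatron.core.nccl_allocator as nccl_allocator

    nccl_allocator.init()
    pool = nccl_allocator.create_nccl_mem_pool()
    group = torch.distributed.new_group(backend="nccl")

    # Tensors allocated inside the context come from ``pool`` and are
    # registered with ``group`` by the context manager.
    with nccl_allocator.nccl_mem(pool, group=group):
        buf = torch.zeros(1 << 20, dtype=torch.float32, device="cuda")
    return buf
```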
class core.nccl_allocator.MultiGroupMemPoolAllocator(pool, groups, symmetric=True)#

A custom allocator class that registers a single memory pool with multiple communication groups.

Use cases:

  • [FSDP+EP] With FSDP plus expert parallelism, the expert layers (expert-dp) and non-expert layers (dp) use different communicator groups. The same memory pool must be registered with both groups.

  • [Hybrid FSDP/DP] With hybrid FSDP/DP, there are an inter-dp group and an intra-dp group. The same memory pool must be registered with both groups.

  • [Hybrid FSDP/DP + EP] With hybrid FSDP/DP plus EP, there are inter-dp, intra-dp, and expert-dp groups. The same memory pool must be registered with all three groups.

Example

import torch

import megatron.core.nccl_allocator as nccl_allocator

nccl_allocator.init()
pool = nccl_allocator.create_nccl_mem_pool()
group_1 = torch.distributed.new_group(ranks=[0, 1, 2, 3, 4, 5, 6, 7], backend="nccl")
group_2 = torch.distributed.new_group(ranks=[0, 2, 4, 6], backend="nccl")
with nccl_allocator.MultiGroupMemPoolAllocator(pool, [group_1, group_2]):
    a = torch.zeros(1024, dtype=torch.float32, device="cuda")
    b = torch.zeros(1024, dtype=torch.float32, device="cuda")

Initialization

__enter__()#
__exit__(*args)#
class core.nccl_allocator.MemPoolAllocatorWithoutRegistration(pool)#

An allocator class that allocates memory without registering it with any communication group. Users are expected to register the memory with the communication groups manually.

Initialization

__enter__()#
__exit__(*args)#
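A hedged sketch of the manual-registration flow this class is meant for: allocate from the pool without any registration, then register the pool with each group yourself. Like the class's siblings, it needs CUDA, NCCL, and an initialized torch.distributed job, so the body is wrapped in a function with deferred imports.

```python
def manual_registration_flow():
    # Runnable only on a CUDA machine inside an initialized
    # torch.distributed job; imports are deferred for that reason.
    import torch
    import megatron.core.nccl_allocator as nccl_allocator

    nccl_allocator.init()
    pool = nccl_allocator.create_nccl_mem_pool()
    group = torch.distributed.new_group(backend="nccl")

    # Allocate from the pool; nothing is registered with any group yet.
    with nccl_allocator.MemPoolAllocatorWithoutRegistration(pool):
        buf = torch.zeros(1024, dtype=torch.float32, device="cuda")

    # The caller registers (and later deregisters) the pool with each
    # group it will be used with.
    nccl_allocator.register_mem_pool(pool, group)
    # ... collectives using ``buf`` over ``group`` ...
    nccl_allocator.deregister_mem_pool(pool, group)
    return buf
```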