bridge.models.megatron_mimo.megatron_mimo_ddp

DDP wrapping utilities for MegatronMIMO models.

Called from the training layer after MegatronMIMOProvider.provide().

Note: This module only supports DDP wrapping. FSDP is not yet implemented.

Module Contents

Functions

wrap_megatron_mimo_model_distributed

Wrap the submodules of a MegatronMIMO model with DDP.

API

bridge.models.megatron_mimo.megatron_mimo_ddp.wrap_megatron_mimo_model_distributed(
    megatron_mimo_model: megatron.core.models.mimo.MimoModel,
    ddp_config: megatron.core.distributed.DistributedDataParallelConfig,
    megatron_mimo_parallelism_config: megatron.bridge.models.megatron_mimo.megatron_mimo_config.MegatronMIMOParallelismConfig,
    grids: Dict[str, megatron.core.hyper_comm_grid.HyperCommGrid],
    pg_collections: Dict[str, Optional[megatron.core.process_groups_config.ProcessGroupCollection]],
) → megatron.core.models.mimo.MimoModel

Wrap the submodules of a MegatronMIMO model with DDP.

Modifies megatron_mimo_model in place and returns it.
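
The in-place contract described above can be illustrated with a minimal sketch. ToyModel, ToyWrapper, and wrap_in_place are hypothetical stand-ins invented for this illustration, as are the submodule names; they are not part of Megatron Bridge or Megatron Core and say nothing about how the real function wraps submodules. The sketch only shows what "modifies in place and returns it" means for the caller.

```python
from typing import Dict


class ToyWrapper:
    """Hypothetical stand-in for a DDP wrapper; it only stores the wrapped module."""

    def __init__(self, module: object) -> None:
        self.module = module


class ToyModel:
    """Hypothetical stand-in for a model that holds named submodules."""

    def __init__(self, submodules: Dict[str, object]) -> None:
        self.submodules = submodules


def wrap_in_place(model: ToyModel) -> ToyModel:
    """Replace each submodule with a wrapped version, then return the same object."""
    # Iterate over a snapshot so the dict can be updated safely inside the loop.
    for name, module in list(model.submodules.items()):
        model.submodules[name] = ToyWrapper(module)
    return model


# Submodule names here are illustrative only.
model = ToyModel({"encoder": object(), "decoder": object()})
returned = wrap_in_place(model)

# The return value is the very object that was passed in.
print(returned is model)
# Every submodule reachable from the original reference is now wrapped.
print(all(isinstance(m, ToyWrapper) for m in model.submodules.values()))
```

Under this contract, any reference held to the model before the call sees the wrapped submodules afterwards, so using the return value is a convenience rather than a requirement.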

Parameters:
  • megatron_mimo_model – The MimoModel to wrap.

  • ddp_config – DDP configuration from Bridge.

  • megatron_mimo_parallelism_config – MegatronMIMO parallelism configuration.

  • grids – Mapping from module name to HyperCommGrid.

  • pg_collections – Mapping from module name to ProcessGroupCollection. Per the type signature, a value may be None.

Returns:

The same megatron_mimo_model object, with its submodules wrapped.
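
Because grids and pg_collections are both keyed by module name, a caller may want to confirm that the two mappings agree before wrapping. The sketch below is one way to do that. find_key_mismatches is a hypothetical helper, the module names are made up, and plain object() values stand in for HyperCommGrid and ProcessGroupCollection instances. Whether the library itself requires matching keys is an assumption, not something this page states.

```python
from typing import Dict, List, Optional


def find_key_mismatches(
    grids: Dict[str, object],
    pg_collections: Dict[str, Optional[object]],
) -> List[str]:
    """Return the sorted module names present in only one of the two mappings."""
    # The symmetric difference keeps names that appear on exactly one side.
    return sorted(set(grids) ^ set(pg_collections))


# Placeholder values stand in for HyperCommGrid and ProcessGroupCollection objects.
grids = {"language": object(), "vision": object()}
# A None value is permitted by the Optional[...] type in the signature.
pg_collections = {"language": object(), "vision": None}

mismatches = find_key_mismatches(grids, pg_collections)
if mismatches:
    raise ValueError(f"Module names differ between mappings: {mismatches}")
```

A check like this would run in the training layer just before calling wrap_megatron_mimo_model_distributed, so a misnamed module fails early with a readable message.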