core.distributed.fsdp.mcore_fsdp_adapter#

Module Contents#

Classes#

FullyShardedDataParallel

Fully Sharded Data Parallel (FSDP) wrapper for the Megatron model.

Functions#

_get_hsdp_tp_mesh

_get_dp_tp_mesh

_check_mesh_ranks_and_group_ranks_are_consistent

_get_rng_state_dict

_load_rng_state_dict

Data#

logger

API#

core.distributed.fsdp.mcore_fsdp_adapter.logger#

'getLogger(...)'

class core.distributed.fsdp.mcore_fsdp_adapter.FullyShardedDataParallel(
config: megatron.core.transformer.transformer_config.TransformerConfig,
ddp_config: megatron.core.distributed.distributed_data_parallel_config.DistributedDataParallelConfig,
module: torch.nn.Module,
fsdp_unit_modules: Optional[List[torch.nn.Module]] = None,
disable_bucketing: bool = False,
device: Optional[torch.device] = None,
pg_collection: Optional[megatron.core.process_groups_config.ProcessGroupCollection] = None,
)#

Bases: megatron.core.distributed.data_parallel_base._BaseDataParallel

Fully Sharded Data Parallel (FSDP) wrapper for the Megatron model.

Initialization
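The wrapper is constructed around an existing Megatron module. Below is a minimal, hypothetical sketch: it assumes `torch.distributed` and Megatron's model-parallel state are already initialized, the config values are illustrative only, and `build_module()` stands in for real model construction.

```python
# Minimal usage sketch (not taken from the module itself). Assumes torch.distributed
# and Megatron's model-parallel state are already initialized; build_module() is a
# placeholder for however the underlying torch.nn.Module is actually created.
import torch

from megatron.core.distributed.distributed_data_parallel_config import (
    DistributedDataParallelConfig,
)
from megatron.core.distributed.fsdp.mcore_fsdp_adapter import FullyShardedDataParallel
from megatron.core.transformer.transformer_config import TransformerConfig

config = TransformerConfig(
    num_layers=2,
    hidden_size=1024,
    num_attention_heads=8,
)  # illustrative values only
ddp_config = DistributedDataParallelConfig()  # defaults shown; tune for your setup

module = build_module(config)  # placeholder for real model construction

model = FullyShardedDataParallel(
    config=config,
    ddp_config=ddp_config,
    module=module,
    device=torch.device("cuda", torch.cuda.current_device()),
)
```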

load_state_dict(state_dict, strict=True)#

Load the state dictionary into the module.
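A minimal usage sketch, assuming a plain PyTorch checkpoint file; the path is a placeholder and `model` refers to the wrapper constructed above.

```python
# Hypothetical sketch: restore weights into the wrapped model.
# "checkpoint.pt" is a placeholder path.
import torch

state_dict = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(state_dict, strict=True)
```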

_fix_tensor_parallel_attributes(module)#
_init_dist_index(pg_collection)#

Initialize the distributed index for the module.

stop_communication()#

Stop communication for the module.

sync_rng_states_across_tp_group()#

Synchronize the tensor parallel random number generator states.
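The method's implementation is not reproduced here. The sketch below only illustrates one common pattern for this kind of synchronization, broadcasting the CUDA RNG state from the first rank of a tensor-parallel group; `tp_group` is assumed to be an already-created process group.

```python
# Illustrative pattern (an assumption, not the adapter's actual code): make every
# rank in a tensor-parallel group adopt the CUDA RNG state of the group's first rank.
import torch
import torch.distributed as dist

def sync_cuda_rng_state(tp_group: dist.ProcessGroup) -> None:
    # The group's first rank contributes its RNG state; all other ranks receive it.
    src = dist.get_global_rank(tp_group, 0)
    obj = [torch.cuda.get_rng_state()]
    dist.broadcast_object_list(obj, src=src, group=tp_group)
    torch.cuda.set_rng_state(obj[0])
```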

core.distributed.fsdp.mcore_fsdp_adapter._get_hsdp_tp_mesh(outer_fsdp_dp_group, dp_cp_group, tp_group)#
core.distributed.fsdp.mcore_fsdp_adapter._get_dp_tp_mesh(dp_cp_group, tp_group, ep_size=1)#
core.distributed.fsdp.mcore_fsdp_adapter._check_mesh_ranks_and_group_ranks_are_consistent(
mesh_ranks,
group_ranks,
)#
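These mesh helpers are private. As a rough illustration only, the sketch below shows how a comparable 2-D data-parallel / tensor-parallel mesh can be built with PyTorch's public DeviceMesh API; `dp_size` and `tp_size` are placeholders, and none of this is the module's actual implementation.

```python
# Illustrative sketch (an assumption, not this module's code): build a 2-D (dp, tp)
# device mesh with PyTorch's public API. Requires torch.distributed to be initialized,
# and dp_size * tp_size must equal the world size.
from torch.distributed.device_mesh import init_device_mesh

dp_size, tp_size = 4, 2  # placeholder sizes
mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))

# Each named dimension exposes a process group; its ranks should line up with the
# existing Megatron groups, which is the kind of consistency the check above verifies.
dp_group = mesh.get_group("dp")
tp_group = mesh.get_group("tp")
```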
core.distributed.fsdp.mcore_fsdp_adapter._get_rng_state_dict()#
core.distributed.fsdp.mcore_fsdp_adapter._load_rng_state_dict(rng_state_dict)#
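The sketch below only illustrates the general pattern of capturing and restoring RNG state that the two RNG helpers above suggest; the keys and generators covered are assumptions, not the module's actual state-dict layout.

```python
# Conceptual sketch of saving and restoring RNG state; the dictionary keys and the
# set of generators covered are assumptions, not the module's implementation.
import random
import torch

def get_rng_state_dict() -> dict:
    state = {
        "python": random.getstate(),
        "torch_cpu": torch.get_rng_state(),
    }
    if torch.cuda.is_available():
        state["torch_cuda"] = torch.cuda.get_rng_state()
    return state

def load_rng_state_dict(state: dict) -> None:
    random.setstate(state["python"])
    torch.set_rng_state(state["torch_cpu"])
    if "torch_cuda" in state and torch.cuda.is_available():
        torch.cuda.set_rng_state(state["torch_cuda"])
```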