nemo_automodel.components.distributed.megatron_fsdp
Module Contents
Classes
Functions
Data
API
Manager for parallelizing models using MegatronFSDP with tensor-parallel (TP), data-parallel (DP), and context-parallel (CP) sharding.
This manager applies parallelization to the model using a prescribed TP sharding plan. It supports mixed precision and various FSDP options.
The device mesh must be created externally and passed in.
Parameters:
Configuration for MegatronFSDP distributed training.
Device mesh for distributed operations.
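Because the device mesh is built by the caller, it helps to work out its shape before constructing it. The sketch below is a minimal, illustrative helper that splits a world size into DP, CP, and TP group sizes. The function name `mesh_shape` and the dimension labels `"dp"`, `"cp"`, `"tp"` are assumptions made for this example; the names this manager actually expects on the mesh are not stated in this page.

```python
from __future__ import annotations


def mesh_shape(world_size: int, tp_size: int = 1, cp_size: int = 1) -> dict[str, int]:
    """Split `world_size` ranks into DP x CP x TP groups.

    Hypothetical helper for illustration only; it is not part of
    nemo_automodel.
    """
    if world_size < 1 or tp_size < 1 or cp_size < 1:
        raise ValueError("all sizes must be positive integers")

    # Ranks consumed by one model replica (tensor and context parallel groups).
    ranks_per_replica = tp_size * cp_size
    if world_size % ranks_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tp_size * cp_size = {ranks_per_replica}"
        )

    # Whatever is left over becomes the data-parallel dimension.
    return {
        "dp": world_size // ranks_per_replica,
        "cp": cp_size,
        "tp": tp_size,
    }


shape = mesh_shape(world_size=16, tp_size=2, cp_size=2)
print(shape)  # {'dp': 4, 'cp': 2, 'tp': 2}
```

In a real job, a shape like this would typically be handed to PyTorch's `torch.distributed.device_mesh.init_device_mesh` (with matching `mesh_dim_names`) under a launched process group, and the resulting mesh passed to the manager.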
Parallelizes the given model using MegatronFSDP and TP sharding strategies.
Parameters:
The model to be parallelized.
The optimizer for the model. If None, the caller must call model.finish_grad_sync() before optimizer.step(), and must call model.install_optimized_model_weights() and model.zero_grad_buffer() after optimizer.zero_grad().
Returns:
(parallelized_model, optimizer)
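When no optimizer is passed, the manual calls described for the optimizer parameter have to be placed around the optimizer step by hand. The sketch below illustrates one call order that satisfies that description. `RecordingModel` and `RecordingOptimizer` are placeholder classes invented for this example so that it runs without a GPU or a distributed job; only the method names come from the parameter description.

```python
class RecordingModel:
    """Placeholder for the parallelized model; it only logs hook calls."""

    def __init__(self, log):
        self.log = log

    def finish_grad_sync(self):
        self.log.append("model.finish_grad_sync")

    def install_optimized_model_weights(self):
        self.log.append("model.install_optimized_model_weights")

    def zero_grad_buffer(self):
        self.log.append("model.zero_grad_buffer")


class RecordingOptimizer:
    """Placeholder for an optimizer that was not passed to the manager."""

    def __init__(self, log):
        self.log = log

    def step(self):
        self.log.append("optimizer.step")

    def zero_grad(self):
        self.log.append("optimizer.zero_grad")


def train_step(model, optimizer):
    # The forward and backward passes would run here in a real training loop.
    model.finish_grad_sync()                 # before optimizer.step()
    optimizer.step()
    optimizer.zero_grad()
    model.install_optimized_model_weights()  # after optimizer.zero_grad()
    model.zero_grad_buffer()                 # after optimizer.zero_grad()


calls = []
train_step(RecordingModel(calls), RecordingOptimizer(calls))
print(calls)
```

Passing the optimizer to the parallelize call instead avoids this bookkeeping, since the manual calls are only required when the optimizer is None.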
Shard the optimizer with Megatron-FSDP when the strategy requires it.
Returns the optimizer unchanged unless distributed_config is a MegatronFSDPConfig running in a distributed (world size > 1) job.
Parameters:
The (already sharded) model part the optimizer belongs to.
The optimizer to (optionally) shard.
Distributed strategy config; only triggers sharding when it is a MegatronFSDPConfig.
Guard for optimizers that are incompatible with Megatron-FSDP sharding (e.g. Dion); an assertion fails if sharding would otherwise apply.
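The conditions described above can be read as a small decision rule: sharding applies only for a Megatron-FSDP config in a multi-rank job, and the guard fires only in that case. The sketch below is a minimal illustration of that rule, not the library's implementation. The placeholder config classes, the function name `needs_optimizer_sharding`, and the `optimizer_is_incompatible` flag are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class PlaceholderMegatronFSDPConfig:
    """Stand-in for the real MegatronFSDPConfig (fields omitted)."""


@dataclass
class PlaceholderOtherConfig:
    """Stand-in for any other distributed strategy config."""


def needs_optimizer_sharding(
    distributed_config,
    world_size: int,
    optimizer_is_incompatible: bool = False,
) -> bool:
    """Return True when the optimizer would be sharded, False when it is returned unchanged."""
    applies = (
        isinstance(distributed_config, PlaceholderMegatronFSDPConfig)
        and world_size > 1
    )
    # The guard only matters when sharding would otherwise apply.
    if applies and optimizer_is_incompatible:
        raise AssertionError(
            "this optimizer is incompatible with Megatron-FSDP sharding"
        )
    return applies


print(needs_optimizer_sharding(PlaceholderMegatronFSDPConfig(), world_size=8))  # True
print(needs_optimizer_sharding(PlaceholderMegatronFSDPConfig(), world_size=1))  # False
print(needs_optimizer_sharding(PlaceholderOtherConfig(), world_size=8))         # False
```

Note that an incompatible optimizer passes through untouched in the single-rank and non-Megatron-FSDP cases, because no sharding is attempted there.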