nemo_automodel.components.distributed.fsdp2#

Module Contents#

Classes#

FSDP2Manager

Manager for parallelizing models using FSDP2 with TP, DP, CP sharding.

Data#

API#

nemo_automodel.components.distributed.fsdp2.logger#

'getLogger(...)'

class nemo_automodel.components.distributed.fsdp2.FSDP2Manager(
config: nemo_automodel.components.distributed.config.FSDP2Config,
device_mesh: torch.distributed.device_mesh.DeviceMesh,
moe_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
)#

Manager for parallelizing models using FSDP2 with TP, DP, CP sharding.

This manager applies parallelization to the model using a prescribed TP sharding plan. It supports mixed precision and CPU offloading options.

The device mesh must be created externally and passed in.

Parameters:
  • config (FSDP2Config) – Configuration for FSDP2 distributed training.

  • device_mesh (DeviceMesh) – Device mesh for distributed operations.

  • moe_mesh (Optional[DeviceMesh]) – Optional device mesh for expert parallelism.

Example

from nemo_automodel.components.distributed.config import FSDP2Config

config = FSDP2Config(sequence_parallel=True, activation_checkpointing=True)

# device_mesh created externally via create_device_mesh()
manager = FSDP2Manager(config, device_mesh=device_mesh, moe_mesh=moe_mesh)
model = manager.parallelize(model)
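The `device_mesh` in the example must already exist before the manager is constructed. Below is a minimal sketch of building one with PyTorch's generic `init_device_mesh`, used here as a stand-in for the `create_device_mesh()` helper referenced above (whose signature is not documented on this page); the mesh shape and dimension names are assumptions for an 8-GPU node.

```python
from torch.distributed.device_mesh import init_device_mesh

# Assumed to run in a process launched via torchrun with 8 GPUs.
# Illustrative layout only: 2-way data parallelism x 4-way tensor parallelism.
device_mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))
```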

Initialization

parallelize(model)#

Parallelizes the given model using FSDP2 and TP sharding strategies.

Parameters:
  • model (nn.Module) – The model to be parallelized.

Returns:
  The parallelized model.
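Continuing the earlier example (same `manager`), a minimal sketch of calling `parallelize`; the tiny module below is only a stand-in, since the supported model types are not listed on this page.

```python
import torch.nn as nn

# Stand-in module; in practice this would be the full model to shard.
model = nn.TransformerEncoderLayer(d_model=512, nhead=8)

# Apply the FSDP2/TP sharding plan; the returned module replaces the original.
model = manager.parallelize(model)
```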