nemo_automodel.components.distributed.ddp
Module Contents
Classes
Data
API
Manager for distributed training using PyTorch’s DDP.
This manager wraps models with DistributedDataParallel for data-parallel distributed training.
Parameters:
config: Configuration for DDP distributed training.
activation_checkpointing
broadcast_buffers
bucket_cap_mb
find_unused_parameters
gradient_as_bucket_view
static_graph
Initialize device configuration for DDP.
Sets the rank, world_size, and device based on the process group backend.
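As a rough illustration of the device setup described above, the sketch below derives the rank, world size, and device from the backend name. It is not the module's actual implementation. `DeviceInfo` and `resolve_device` are hypothetical names, and reading `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` from the environment (as set by launchers such as `torchrun`) is an assumption made so the example runs without PyTorch.

```python
import os
from dataclasses import dataclass


@dataclass
class DeviceInfo:
    """Hypothetical container for the values the setup step records."""
    rank: int
    world_size: int
    device: str  # for example "cuda:0" or "cpu"


def resolve_device(backend: str, env=None) -> DeviceInfo:
    """Hypothetical helper: pick rank, world size, and device from the backend."""
    env = os.environ if env is None else env
    # Assumption: the launcher exports these variables; default to a single process.
    rank = int(env.get("RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    if backend == "nccl":
        # NCCL communicates between CUDA devices; use one GPU per process,
        # selected by the process's local rank on its node.
        local_rank = int(env.get("LOCAL_RANK", 0))
        device = f"cuda:{local_rank}"
    else:
        # Assumption: any other backend (for example "gloo") runs on CPU.
        device = "cpu"
    return DeviceInfo(rank, world_size, device)
```

With `backend="nccl"` each process is mapped to the GPU matching its local rank; with any other backend the sketch falls back to CPU.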
Wraps the given model with DistributedDataParallel (DDP).
Moves the model to the initialized device before wrapping. For CUDA devices, the device id is passed to DDP as device_ids; for CPU, no device ids are provided.
Parameters:
model: The PyTorch model to be wrapped.
Returns:
torch.nn.parallel.DistributedDataParallel: The DDP-wrapped model.
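The wrapping behavior described above can be sketched as follows. This is a minimal example under stated assumptions, not the manager's real code: `build_ddp_kwargs` and `wrap_with_ddp` are hypothetical helpers, and the device is assumed to be given as a string such as `"cuda:0"` or `"cpu"`. The keyword options forwarded to `DistributedDataParallel` (for example `broadcast_buffers`, `bucket_cap_mb`, `find_unused_parameters`, `gradient_as_bucket_view`, `static_graph`) are existing arguments of the PyTorch class.

```python
def build_ddp_kwargs(device: str, **options) -> dict:
    """Hypothetical helper: assemble keyword arguments for DistributedDataParallel."""
    kwargs = dict(options)
    if device.startswith("cuda"):
        # For CUDA, pass the device index as device_ids; a bare "cuda" means index 0 here.
        index = int(device.split(":")[1]) if ":" in device else 0
        kwargs["device_ids"] = [index]
    # For CPU, device_ids is left out entirely.
    return kwargs


def wrap_with_ddp(model, device: str, **options):
    """Hypothetical helper: move the model to the device, then wrap it in DDP.

    Assumes torch.distributed.init_process_group() has already been called.
    """
    # Imported lazily so build_ddp_kwargs can be used without PyTorch installed.
    import torch
    from torch.nn.parallel import DistributedDataParallel

    model = model.to(torch.device(device))
    return DistributedDataParallel(model, **build_ddp_kwargs(device, **options))
```

A call such as `wrap_with_ddp(model, "cuda:0", static_graph=True)` would pass `device_ids=[0]` together with the given option, while `wrap_with_ddp(model, "cpu")` would pass no device ids.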