nemo_rl.models.automodel.setup
Setup utilities for automodel-based training in NeMo RL.
Module Contents
Functions
| validate_and_prepare_config | Validate configuration and prepare runtime settings. |
| setup_reference_model_state | Set up reference model state dict by creating a CPU copy of the model’s state dict. |
| setup_distributed | Set up distributed training environment and create FSDP2Manager. |
| setup_model_and_optimizer | Set up model, parallelization, and optimizer. |
Data
| STRING_TO_DTYPE | |
API
- nemo_rl.models.automodel.setup.STRING_TO_DTYPE#
None
- nemo_rl.models.automodel.setup.validate_and_prepare_config(
- config: nemo_rl.models.policy.PolicyConfig,
- processor: Optional[transformers.AutoProcessor],
- rank: int,
- )
Validate configuration and prepare runtime settings.
This function validates the policy configuration, sets environment variables, determines model configuration, and returns runtime settings as a named tuple.
- Parameters:
config – Policy configuration dictionary
processor – Optional processor for multimodal models
rank – Current process rank
- Returns:
RuntimeConfig named tuple containing validated configuration values
- Raises:
ValueError – If configuration is invalid
RuntimeError – If incompatible settings are detected
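The validate-then-return pattern described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: `ExampleRuntimeConfig`, its fields, `SUPPORTED_PRECISIONS`, `example_validate`, and the `model_name` and `precision` keys are all hypothetical stand-ins, since the real `RuntimeConfig` fields are not listed on this page.

```python
from typing import NamedTuple


# Hypothetical stand-in for RuntimeConfig; the real fields are not documented here.
class ExampleRuntimeConfig(NamedTuple):
    model_name: str
    precision: str
    is_main_rank: bool


# Hypothetical set of accepted precision strings.
SUPPORTED_PRECISIONS = {"float32", "bfloat16", "float16"}


def example_validate(config: dict, rank: int) -> ExampleRuntimeConfig:
    """Check a plain-dict config and return immutable runtime settings."""
    if "model_name" not in config:
        raise ValueError("config must define 'model_name'")
    precision = config.get("precision", "float32")
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return ExampleRuntimeConfig(
        model_name=config["model_name"],
        precision=precision,
        is_main_rank=(rank == 0),
    )


runtime = example_validate({"model_name": "my-model", "precision": "bfloat16"}, rank=0)
```

Returning a named tuple keeps the validated values read-only, so later setup steps can rely on them without re-checking the raw config.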
- nemo_rl.models.automodel.setup.setup_reference_model_state(
- model: torch.nn.Module,
- )
Set up reference model state dict by creating a CPU copy of the model’s state dict.
This creates a reference copy of the model weights on CPU with pinned memory for efficient CPU-GPU transfers. The reference model is typically used to compute reference log probabilities during RL training.
- Parameters:
model – The model to create a reference copy from
- Returns:
Dictionary mapping parameter names to CPU tensors with pinned memory
- Example:
    model = setup_model(...)
    reference_model_state_dict = setup_reference_model_state(model)
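To make the idea of a pinned CPU reference copy concrete, here is a minimal sketch in plain PyTorch. It assumes an ordinary, unsharded `torch.nn.Module`; the helper name `cpu_reference_copy` is hypothetical, and the actual function may handle sharded parameters differently. Pinning is skipped when no CUDA device is present, because page-locked memory depends on the accelerator runtime.

```python
import torch


def cpu_reference_copy(model: torch.nn.Module) -> dict[str, torch.Tensor]:
    """Copy every state-dict tensor to CPU, pinning memory when CUDA is present."""
    use_pinned = torch.cuda.is_available()  # pinning needs an accelerator runtime
    reference = {}
    for name, tensor in model.state_dict().items():
        # copy=True guarantees new storage even if the tensor already lives on CPU.
        cpu_tensor = tensor.detach().to("cpu", copy=True)
        reference[name] = cpu_tensor.pin_memory() if use_pinned else cpu_tensor
    return reference


model = torch.nn.Linear(4, 2)
reference_state = cpu_reference_copy(model)

# Later updates to the live model leave the reference copy untouched.
with torch.no_grad():
    model.weight.add_(1.0)
```

Because the copy owns its own storage, the policy can keep training while the reference weights stay frozen for computing reference log probabilities.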
- nemo_rl.models.automodel.setup.setup_distributed(
- config: nemo_rl.models.policy.PolicyConfig,
- runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
- )
Set up distributed training environment and create FSDP2Manager.
Initializes torch.distributed process group and creates an FSDP2Manager with the appropriate parallelization and precision settings.
- Parameters:
config – Policy configuration dictionary
runtime_config – RuntimeConfig named tuple from validate_and_prepare_config
- Returns:
FSDP2Manager instance with all distributed configuration
- Note:
The returned FSDP2Manager contains all distributed attributes:
dp_size, tp_size, cp_size, ep_size: parallelization sizes
dp_mesh, tp_mesh, cp_mesh, device_mesh: device meshes
moe_mesh: MoE mesh if expert parallelism is used
dp_replicate_size, dp_shard_size, ep_shard_size: sharding sizes
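The relationship between the parallelization sizes listed above can be illustrated with a small arithmetic sketch. It assumes the common layout in which the world size is the product of the data-, tensor-, and context-parallel sizes; how `FSDP2Manager` actually derives or validates these values, and how `ep_size` fits in, is not stated on this page. The helper `infer_dp_size` is hypothetical.

```python
def infer_dp_size(world_size: int, tp_size: int = 1, cp_size: int = 1) -> int:
    """Return the data-parallel size left over after TP and CP take their share."""
    model_parallel = tp_size * cp_size
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tp_size*cp_size={model_parallel}"
        )
    return world_size // model_parallel


# 16 ranks with 2-way tensor parallelism and 2-way context parallelism.
dp_size = infer_dp_size(world_size=16, tp_size=2, cp_size=2)
```

Under the same assumption, `dp_replicate_size * dp_shard_size` would equal `dp_size` when data parallelism is split into replicated and sharded groups.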
- nemo_rl.models.automodel.setup.setup_model_and_optimizer(
- config: nemo_rl.models.policy.PolicyConfig,
- tokenizer: transformers.AutoTokenizer,
- runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
- distributed_manager: nemo_automodel.components.distributed.fsdp2.FSDP2Manager,
- checkpoint_manager: Any,
- is_vlm: bool = False,
- init_optimizer: bool = True,
- weights_path: Optional[str] = None,
- optimizer_path: Optional[str] = None,
- )
Set up model, parallelization, and optimizer.
Creates the model from config, applies parallelization strategies (FSDP2, TP, CP), loads base weights, and optionally initializes optimizer and scheduler.
- Parameters:
config – Policy configuration dictionary
tokenizer – Tokenizer for the model
runtime_config – RuntimeConfig named tuple from validate_and_prepare_config
distributed_manager – FSDP2Manager from setup_distributed
checkpoint_manager – Checkpoint manager for loading/saving weights
is_vlm – Whether this is a vision-language model
init_optimizer – Whether to initialize optimizer
weights_path – Optional path to checkpoint weights to load
optimizer_path – Optional path to optimizer state to load
- Returns:
ModelAndOptimizerState containing model, optimizer, scheduler, and metadata
- Note:
The function handles special cases for:
MoE models (uses custom parallelization)
LoRA (applies adapter layers)
Context parallel validation
Tied word embeddings
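The optional optimizer step controlled by `init_optimizer` can be sketched in plain PyTorch. This is an illustration under stated assumptions, not the library's code: `ExampleTrainState` is a hypothetical stand-in for `ModelAndOptimizerState`, `example_setup` is a hypothetical helper, and the choice of AdamW with a constant `LambdaLR` schedule is arbitrary. The sketch also assumes that skipping initialization leaves the optimizer and scheduler as `None`.

```python
from typing import NamedTuple, Optional

import torch


# Hypothetical stand-in for ModelAndOptimizerState.
class ExampleTrainState(NamedTuple):
    model: torch.nn.Module
    optimizer: Optional[torch.optim.Optimizer]
    scheduler: Optional[torch.optim.lr_scheduler.LambdaLR]


def example_setup(
    model: torch.nn.Module, init_optimizer: bool = True, lr: float = 1e-3
) -> ExampleTrainState:
    """Bundle a model with an optimizer and scheduler, or with neither."""
    if not init_optimizer:
        return ExampleTrainState(model, None, None)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    # Constant schedule: the multiplier stays at 1.0 for every step.
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda step: 1.0)
    return ExampleTrainState(model, optimizer, scheduler)


train_state = example_setup(torch.nn.Linear(4, 2))
frozen_state = example_setup(torch.nn.Linear(4, 2), init_optimizer=False)
```

Skipping optimizer creation avoids allocating optimizer state for a model that will only be used for forward passes.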