nemo_rl.models.automodel.setup#

Setup utilities for automodel-based training in NeMo RL.

Module Contents#

Functions#

validate_and_prepare_config

Validate configuration and prepare runtime settings.

setup_reference_model_state

Set up reference model state dict by creating a CPU copy of the model’s state dict.

setup_distributed

Set up distributed training environment and create FSDP2Manager.

setup_model_and_optimizer

Set up model, parallelization, and optimizer.

Data#

API#

nemo_rl.models.automodel.setup.STRING_TO_DTYPE#

None
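The concrete mapping is not rendered here; presumably it translates precision strings from the policy config into torch dtypes, along these illustrative lines (the entries below are an assumption, not the actual definition):

import torch

# Illustrative sketch only; the real STRING_TO_DTYPE may contain
# additional or different entries.
STRING_TO_DTYPE = {
    "float32": torch.float32,
    "float16": torch.float16,
    "bfloat16": torch.bfloat16,
}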

nemo_rl.models.automodel.setup.validate_and_prepare_config(
config: nemo_rl.models.policy.PolicyConfig,
processor: Optional[transformers.AutoProcessor],
rank: int,
) → nemo_rl.models.automodel.config.RuntimeConfig#

Validate configuration and prepare runtime settings.

This function validates the policy configuration, sets environment variables, determines the model configuration, and returns the runtime settings as a named tuple.

Parameters:
  • config – Policy configuration dictionary

  • processor – Optional processor for multimodal models

  • rank – Current process rank

Returns:

RuntimeConfig named tuple containing validated configuration values

Raises:
  • ValueError – If configuration is invalid

  • RuntimeError – If incompatible settings are detected
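A minimal usage sketch, assuming a policy_config built elsewhere (for example, loaded from an experiment YAML); the processor argument is only needed for multimodal models:

import torch.distributed as dist

from nemo_rl.models.automodel.setup import validate_and_prepare_config

rank = dist.get_rank() if dist.is_initialized() else 0
runtime_config = validate_and_prepare_config(
    config=policy_config,  # assumed PolicyConfig dict, built elsewhere
    processor=None,        # text-only model: no AutoProcessor needed
    rank=rank,
)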

nemo_rl.models.automodel.setup.setup_reference_model_state(
model: torch.nn.Module,
) → dict[str, torch.Tensor]#

Set up reference model state dict by creating a CPU copy of the model’s state dict.

This creates a reference copy of the model weights on CPU with pinned memory for efficient CPU-GPU transfers. The reference model is typically used to compute reference log probabilities during RL training.

Parameters:

model – The model to create a reference copy from

Returns:

Dictionary mapping parameter names to CPU tensors with pinned memory

.. rubric:: Example

model = setup_model(…)
reference_model_state_dict = setup_reference_model_state(model)
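Conceptually, the pinned CPU copy can be built as in the following minimal sketch; the actual implementation may differ (for example, in how it materializes FSDP2-sharded parameters):

import torch
import torch.nn as nn

def cpu_pinned_state_dict(model: nn.Module) -> dict[str, torch.Tensor]:
    # Copy every tensor to CPU and pin its memory so later transfers
    # back to the GPU can use non_blocking=True and overlap with compute.
    return {
        name: tensor.detach().to("cpu").pin_memory()
        for name, tensor in model.state_dict().items()
    }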

nemo_rl.models.automodel.setup.setup_distributed(
config: nemo_rl.models.policy.PolicyConfig,
runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
) → nemo_automodel.components.distributed.fsdp2.FSDP2Manager#

Set up distributed training environment and create FSDP2Manager.

Initializes the torch.distributed process group and creates an FSDP2Manager with the appropriate parallelization and precision settings.

Parameters:
  • config – Policy configuration dictionary

  • runtime_config – RuntimeConfig named tuple from validate_and_prepare_config

Returns:

FSDP2Manager instance with all distributed configuration

.. note::

The returned FSDP2Manager contains all distributed attributes:

  • dp_size, tp_size, cp_size, ep_size: parallelization sizes

  • dp_mesh, tp_mesh, cp_mesh, device_mesh: device meshes

  • moe_mesh: MoE mesh if expert parallelism is used

  • dp_replicate_size, dp_shard_size, ep_shard_size: sharding sizes
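A usage sketch that chains the two setup steps; policy_config is assumed to be a PolicyConfig constructed elsewhere:

from nemo_rl.models.automodel.setup import (
    setup_distributed,
    validate_and_prepare_config,
)

runtime_config = validate_and_prepare_config(policy_config, processor=None, rank=0)
manager = setup_distributed(policy_config, runtime_config)

# The manager exposes the parallelization layout listed above.
print(manager.dp_size, manager.tp_size, manager.cp_size)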

nemo_rl.models.automodel.setup.setup_model_and_optimizer(
config: nemo_rl.models.policy.PolicyConfig,
tokenizer: transformers.AutoTokenizer,
runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
distributed_manager: nemo_automodel.components.distributed.fsdp2.FSDP2Manager,
checkpoint_manager: Any,
is_vlm: bool = False,
init_optimizer: bool = True,
weights_path: Optional[str] = None,
optimizer_path: Optional[str] = None,
) → nemo_rl.models.automodel.config.ModelAndOptimizerState#

Set up model, parallelization, and optimizer.

Creates the model from the config, applies parallelization strategies (FSDP2, TP, CP), loads the base weights, and optionally initializes the optimizer and scheduler.

Parameters:
  • config – Policy configuration dictionary

  • tokenizer – Tokenizer for the model

  • runtime_config – RuntimeConfig named tuple from validate_and_prepare_config

  • distributed_manager – FSDP2Manager from setup_distributed

  • checkpoint_manager – Checkpoint manager for loading/saving weights

  • is_vlm – Whether this is a vision-language model

  • init_optimizer – Whether to initialize optimizer

  • weights_path – Optional path to checkpoint weights to load

  • optimizer_path – Optional path to optimizer state to load

Returns:

ModelAndOptimizerState containing model, optimizer, scheduler, and metadata

.. note::

The function handles special cases for:

  • MoE models (uses custom parallelization)

  • LoRA (applies adapter layers)

  • Context parallel validation

  • Tied word embeddings
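Putting the pieces together, an end-to-end sketch; policy_config, tokenizer, and checkpoint_manager are assumed to be constructed elsewhere, and the attribute names on the returned state are illustrative:

from nemo_rl.models.automodel.setup import (
    setup_distributed,
    setup_model_and_optimizer,
    validate_and_prepare_config,
)

runtime_config = validate_and_prepare_config(policy_config, processor=None, rank=0)
manager = setup_distributed(policy_config, runtime_config)

state = setup_model_and_optimizer(
    config=policy_config,
    tokenizer=tokenizer,
    runtime_config=runtime_config,
    distributed_manager=manager,
    checkpoint_manager=checkpoint_manager,
    init_optimizer=True,  # also build the optimizer and scheduler
    weights_path=None,    # or a checkpoint path to resume from
)
# Attribute names below are assumptions about ModelAndOptimizerState.
model, optimizer = state.model, state.optimizer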