nemo_rl.models.automodel.setup#

Setup utilities for automodel-based training in NeMo RL.

Module Contents#

Functions#

validate_and_prepare_config

Validate configuration and prepare runtime settings.

setup_reference_model_state

Set up reference model state dict by creating a CPU copy of the model’s state dict.

setup_distributed

Set up distributed training environment and create FSDP2Manager.

setup_model_and_optimizer

Set up model, parallelization, and optimizer.

Data#

API#

nemo_rl.models.automodel.setup.STRING_TO_DTYPE#

None
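The concrete mapping is not rendered here; presumably it translates precision strings from the policy config into torch dtypes, along these illustrative lines (the entries below are an assumption, not the actual definition):

import torch

# Illustrative sketch only; the real STRING_TO_DTYPE may contain
# additional or different entries.
STRING_TO_DTYPE = {
    "float32": torch.float32,
    "float16": torch.float16,
    "bfloat16": torch.bfloat16,
}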

nemo_rl.models.automodel.setup.validate_and_prepare_config(
config: nemo_rl.models.policy.PolicyConfig,
processor: Optional[transformers.AutoProcessor],
rank: int,
) → nemo_rl.models.automodel.config.RuntimeConfig#

Validate configuration and prepare runtime settings.

This function validates the policy configuration, sets environment variables, determines the model configuration, and returns the runtime settings as a named tuple.

Parameters:
  • config – Policy configuration dictionary

  • processor – Optional processor for multimodal models

  • rank – Current process rank

Returns:

RuntimeConfig named tuple containing validated configuration values

Raises:
  • ValueError – If configuration is invalid

  • RuntimeError – If incompatible settings are detected
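A minimal usage sketch, assuming a policy_config built elsewhere (for example, loaded from an experiment YAML); the processor argument is only needed for multimodal models:

import torch.distributed as dist

from nemo_rl.models.automodel.setup import validate_and_prepare_config

rank = dist.get_rank() if dist.is_initialized() else 0
runtime_config = validate_and_prepare_config(
    config=policy_config,  # assumed PolicyConfig dict, built elsewhere
    processor=None,        # text-only model: no AutoProcessor needed
    rank=rank,
)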

nemo_rl.models.automodel.setup.setup_reference_model_state(
model: torch.nn.Module,
) → dict[str, torch.Tensor]#

Set up reference model state dict by creating a CPU copy of the model’s state dict.

This creates a reference copy of the model weights on CPU with pinned memory for efficient CPU-GPU transfers. The reference model is typically used to compute reference log probabilities during RL training.

Parameters:

model – The model to create a reference copy from

Returns:

Dictionary mapping parameter names to CPU tensors with pinned memory

.. rubric:: Example

model = setup_model(…)
reference_model_state_dict = setup_reference_model_state(model)
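Conceptually, the pinned CPU copy can be built as in the following minimal sketch; the actual implementation may differ (for example, in how it materializes FSDP2-sharded parameters):

import torch
import torch.nn as nn

def cpu_pinned_state_dict(model: nn.Module) -> dict[str, torch.Tensor]:
    # Copy every tensor to CPU and pin its memory so later transfers
    # back to the GPU can use non_blocking=True and overlap with compute.
    return {
        name: tensor.detach().to("cpu").pin_memory()
        for name, tensor in model.state_dict().items()
    }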

nemo_rl.models.automodel.setup.setup_distributed(
config: nemo_rl.models.policy.PolicyConfig,
runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
) → nemo_automodel.components.distributed.fsdp2.FSDP2Manager#

Set up distributed training environment and create FSDP2Manager.

Initializes the torch.distributed process group and creates an FSDP2Manager with the appropriate parallelization and precision settings.

Parameters:
  • config – Policy configuration dictionary

  • runtime_config – RuntimeConfig named tuple from validate_and_prepare_config

Returns:

FSDP2Manager instance with all distributed configuration

.. note::

The returned FSDP2Manager contains all distributed attributes:

  • dp_size, tp_size, cp_size, ep_size: parallelization sizes

  • dp_mesh, tp_mesh, cp_mesh, device_mesh: device meshes

  • moe_mesh: MoE mesh if expert parallelism is used

  • dp_replicate_size, dp_shard_size, ep_shard_size: sharding sizes
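A usage sketch that chains the two setup steps; policy_config is assumed to be a PolicyConfig constructed elsewhere:

from nemo_rl.models.automodel.setup import (
    setup_distributed,
    validate_and_prepare_config,
)

runtime_config = validate_and_prepare_config(policy_config, processor=None, rank=0)
manager = setup_distributed(policy_config, runtime_config)

# The manager exposes the parallelization layout listed above.
print(manager.dp_size, manager.tp_size, manager.cp_size)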

nemo_rl.models.automodel.setup.setup_model_and_optimizer(
config: nemo_rl.models.policy.PolicyConfig,
tokenizer: transformers.AutoTokenizer,
runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
distributed_manager: nemo_automodel.components.distributed.fsdp2.FSDP2Manager,
checkpoint_manager: Any,
is_vlm: bool = False,
init_optimizer: bool = True,
weights_path: Optional[str] = None,
optimizer_path: Optional[str] = None,
) → nemo_rl.models.automodel.config.ModelAndOptimizerState#

Set up model, parallelization, and optimizer.

Creates the model from the config, applies parallelization strategies (FSDP2, TP, CP), loads the base weights, and optionally initializes the optimizer and scheduler.

Parameters:
  • config – Policy configuration dictionary

  • tokenizer – Tokenizer for the model

  • runtime_config – RuntimeConfig named tuple from validate_and_prepare_config

  • distributed_manager – FSDP2Manager from setup_distributed

  • checkpoint_manager – Checkpoint manager for loading/saving weights

  • is_vlm – Whether this is a vision-language model

  • init_optimizer – Whether to initialize optimizer

  • weights_path – Optional path to checkpoint weights to load

  • optimizer_path – Optional path to optimizer state to load

Returns:

ModelAndOptimizerState containing model, optimizer, scheduler, and metadata

.. note::

The function handles special cases for:

  • MoE models (uses custom parallelization)

  • LoRA (applies adapter layers)

  • Context parallel validation

  • Tied word embeddings
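Putting the pieces together, an end-to-end sketch; policy_config, tokenizer, and checkpoint_manager are assumed to be constructed elsewhere, and the attribute names on the returned state are illustrative:

from nemo_rl.models.automodel.setup import (
    setup_distributed,
    setup_model_and_optimizer,
    validate_and_prepare_config,
)

runtime_config = validate_and_prepare_config(policy_config, processor=None, rank=0)
manager = setup_distributed(policy_config, runtime_config)

state = setup_model_and_optimizer(
    config=policy_config,
    tokenizer=tokenizer,
    runtime_config=runtime_config,
    distributed_manager=manager,
    checkpoint_manager=checkpoint_manager,
    init_optimizer=True,  # also build the optimizer and scheduler
    weights_path=None,    # or a checkpoint path to resume from
)
# Attribute names below are assumptions about ModelAndOptimizerState.
model, optimizer = state.model, state.optimizer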