nemo_rl.models.automodel.setup#

Setup utilities for automodel-based training in NeMo RL.

Module Contents#

Functions#

_maybe_set_force_hf

Validate and maybe auto-set force_hf based on adapter compatibility.

get_tokenizer

Get tokenizer using NeMoAutoTokenizer for automodel workers.

validate_and_prepare_config

Validate configuration and prepare runtime settings.

setup_reference_model_state

Set up reference model state dict by creating a CPU copy of the model’s state dict.

setup_distributed

Set up distributed training environment and create device meshes.

setup_model_and_optimizer

Set up model, parallelization, and optimizer.

Data#

API#

nemo_rl.models.automodel.setup.STRING_TO_DTYPE#

None

nemo_rl.models.automodel.setup._maybe_set_force_hf(automodel_kwargs: dict, model_config) None#

Validate and maybe auto-set force_hf based on adapter compatibility.

Custom model implementations (e.g. Qwen2, Llama) use state_dict_adapters to convert between native and HF weight formats. NeMo RL’s weight syncing requires the adapter to implement convert_single_tensor_to_hf. Some adapters (like CombinedProjectionStateDictAdapter) don’t implement this yet.

This function checks the adapter BEFORE model loading to avoid wasting time:

  • force_hf=True: no check needed, HF model won’t have an adapter.

  • force_hf not set + adapter incompatible: auto-set force_hf=True with a warning.

  • force_hf=False + adapter incompatible: raise an error telling the user to set force_hf=True or file an issue with NeMo-Automodel.

See: https://github.com/NVIDIA-NeMo/RL/issues/2072
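The three cases above can be sketched as a small validation helper. This is an illustrative stand-in, not the real implementation: the `adapter_compatible` flag abstracts away the actual adapter lookup and the check for convert_single_tensor_to_hf.

```python
import warnings


def maybe_set_force_hf(automodel_kwargs: dict, adapter_compatible: bool) -> None:
    """Hypothetical sketch of the force_hf decision logic described above."""
    force_hf = automodel_kwargs.get("force_hf")
    if force_hf is True:
        return  # HF model has no adapter; nothing to check
    if adapter_compatible:
        return  # adapter implements convert_single_tensor_to_hf
    if force_hf is None:
        # Not set + incompatible adapter: auto-enable with a warning.
        warnings.warn(
            "state_dict adapter lacks convert_single_tensor_to_hf; "
            "auto-setting force_hf=True"
        )
        automodel_kwargs["force_hf"] = True
    else:
        # force_hf=False explicitly requested, but the adapter is incompatible.
        raise ValueError(
            "force_hf=False requires an adapter implementing "
            "convert_single_tensor_to_hf; set force_hf=True or file an issue "
            "with NeMo-Automodel"
        )
```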

nemo_rl.models.automodel.setup.get_tokenizer(
tokenizer_config: nemo_rl.models.policy.TokenizerConfig,
get_processor: bool = False,
) Union[transformers.PreTrainedTokenizerBase, transformers.AutoProcessor]#

Get tokenizer using NeMoAutoTokenizer for automodel workers.

Uses NeMoAutoTokenizer, which provides custom tokenizer dispatch per model type and falls back to NeMoAutoTokenizerWithBosEosEnforced for default handling.

Parameters:
  • tokenizer_config – A dictionary containing tokenizer configuration.

    Required keys:

    - name: The name or path of the pretrained tokenizer.

    Optional keys:

    - chat_template: The chat template to use. Can be:
      - None: uses a passthrough template that just returns message content.
      - “default”: uses the tokenizer’s default template.
      - A file path ending in “.jinja”: loads the template from the file.
      - A custom jinja2 template string.
      If not specified, the tokenizer’s default template will be used.
    - chat_template_kwargs: Arguments passed to tokenizer.apply_chat_template().

  • get_processor – Whether to return a processor (via AutoProcessor) instead of a tokenizer.

Returns:

The configured tokenizer or processor instance.
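A config dictionary matching the keys described above might look like the following; the model name and chat_template_kwargs values are illustrative, not defaults.

```python
tokenizer_config = {
    # Required: name or path of the pretrained tokenizer (illustrative value).
    "name": "meta-llama/Llama-3.1-8B-Instruct",
    # Optional: None, "default", a *.jinja path, or a jinja2 template string.
    "chat_template": "default",
    # Optional: forwarded to tokenizer.apply_chat_template().
    "chat_template_kwargs": {"add_generation_prompt": True},
}

# tokenizer = get_tokenizer(tokenizer_config)  # requires nemo_rl + transformers
```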

nemo_rl.models.automodel.setup.validate_and_prepare_config(
config: nemo_rl.models.policy.PolicyConfig,
processor: Optional[transformers.AutoProcessor],
rank: int,
) nemo_rl.models.automodel.config.RuntimeConfig#

Validate configuration and prepare runtime settings.

This function validates the policy configuration, sets environment variables, determines model configuration, and returns runtime settings as a named tuple.

Parameters:
  • config – Policy configuration dictionary

  • processor – Optional processor for multimodal models

  • rank – Current process rank

Returns:

RuntimeConfig named tuple containing validated configuration values

Raises:
  • ValueError – If configuration is invalid

  • RuntimeError – If incompatible settings are detected

nemo_rl.models.automodel.setup.setup_reference_model_state(
model: torch.nn.Module,
) dict[str, torch.Tensor]#

Set up reference model state dict by creating a CPU copy of the model’s state dict.

This creates a reference copy of the model weights on CPU with pinned memory for efficient CPU-GPU transfers. The reference model is typically used to compute reference log probabilities during RL training.

Parameters:

model – The model to create a reference copy from

Returns:

Dictionary mapping parameter names to CPU tensors with pinned memory

.. rubric:: Example

model = setup_model(…)
reference_model_state_dict = setup_reference_model_state(model)
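A minimal sketch of the pinned-memory CPU copy described above, assuming a standard torch state dict. Pinning is skipped when CUDA is unavailable (pin_memory() needs a CUDA context); the real implementation may differ in detail.

```python
import torch


def cpu_reference_state(model: torch.nn.Module) -> dict[str, torch.Tensor]:
    """Copy every parameter/buffer to CPU, pinning memory when CUDA is available."""
    ref = {}
    for name, tensor in model.state_dict().items():
        t = tensor.detach().to("cpu").clone()
        if torch.cuda.is_available():
            # Pinned (page-locked) host memory enables fast async H2D transfers.
            t = t.pin_memory()
        ref[name] = t
    return ref
```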

nemo_rl.models.automodel.setup.setup_distributed(
config: nemo_rl.models.policy.PolicyConfig,
runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
) nemo_rl.models.automodel.config.DistributedContext#

Set up distributed training environment and create device meshes.

Initializes torch.distributed process group and creates FSDP2Config, MoEParallelizerConfig, and device meshes for distributed training.

Parameters:
  • config – Policy configuration dictionary

  • runtime_config – RuntimeConfig named tuple from validate_and_prepare_config

Returns:

DistributedContext containing device meshes and distributed configuration

nemo_rl.models.automodel.setup.setup_model_and_optimizer(
config: nemo_rl.models.policy.PolicyConfig,
tokenizer: transformers.AutoTokenizer,
runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
distributed_context: nemo_rl.models.automodel.config.DistributedContext,
checkpoint_manager: Any,
is_vlm: bool = False,
init_optimizer: bool = True,
weights_path: Optional[str] = None,
optimizer_path: Optional[str] = None,
) nemo_rl.models.automodel.config.ModelAndOptimizerState#

Set up model, parallelization, and optimizer.

Creates the model via from_pretrained() which handles meta device init, parallelization (FSDP2/TP/CP/EP), LoRA, and base weight loading internally.

Parameters:
  • config – Policy configuration dictionary

  • tokenizer – Tokenizer for the model

  • runtime_config – RuntimeConfig named tuple from validate_and_prepare_config

  • distributed_context – DistributedContext from setup_distributed

  • checkpoint_manager – Checkpoint manager for loading/saving weights

  • is_vlm – Whether this is a vision-language model

  • init_optimizer – Whether to initialize optimizer

  • weights_path – Optional path to checkpoint weights to load

  • optimizer_path – Optional path to optimizer state to load

Returns:

ModelAndOptimizerState containing model, optimizer, scheduler, and metadata
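Putting the pieces together, the expected call order for these setup functions is roughly as follows. This is an illustrative, non-runnable sketch: the config contents, the checkpoint_manager construction, and the attribute access on the returned ModelAndOptimizerState are placeholders.

```python
tokenizer = get_tokenizer(config["tokenizer"])
runtime_config = validate_and_prepare_config(config, processor=None, rank=rank)
distributed_context = setup_distributed(config, runtime_config)
state = setup_model_and_optimizer(
    config, tokenizer, runtime_config, distributed_context,
    checkpoint_manager=checkpoint_manager,  # placeholder
)
reference_model_state = setup_reference_model_state(state.model)  # assumed attribute
```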