nemo_rl.models.automodel.setup#

Setup utilities for automodel-based training in NeMo RL.

Module Contents#

Functions#

_maybe_set_force_hf

Validate and maybe auto-set force_hf based on adapter compatibility.

get_tokenizer

Get tokenizer using NeMoAutoTokenizer for automodel workers.

validate_and_prepare_config

Validate configuration and prepare runtime settings.

setup_reference_model_state

Set up reference model state dict by creating a CPU copy of the model’s state dict.

setup_distributed

Set up distributed training environment and create device meshes.

setup_model_and_optimizer

Set up model, parallelization, and optimizer.

Data#

API#

nemo_rl.models.automodel.setup.STRING_TO_DTYPE#

None

nemo_rl.models.automodel.setup._maybe_set_force_hf(automodel_kwargs: dict, model_config) None#

Validate and maybe auto-set force_hf based on adapter compatibility.

Custom model implementations (e.g. Qwen2, Llama) use state_dict_adapters to convert between native and HF weight formats. NeMo RL’s weight syncing requires the adapter to implement convert_single_tensor_to_hf. Some adapters (like CombinedProjectionStateDictAdapter) don’t implement this yet.

This function checks the adapter BEFORE model loading to avoid wasting time:

  • force_hf=True: no check needed, HF model won’t have an adapter.

  • force_hf not set + adapter incompatible: auto-set force_hf=True with a warning.

  • force_hf=False + adapter incompatible: raise an error telling the user to set force_hf=True or file an issue with NeMo-Automodel.

See: https://github.com/NVIDIA-NeMo/RL/issues/2072
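The three cases above can be sketched as a small validation helper. This is an illustrative stand-in, not the real implementation: the `adapter_compatible` flag abstracts away the actual adapter lookup and the check for convert_single_tensor_to_hf.

```python
import warnings


def maybe_set_force_hf(automodel_kwargs: dict, adapter_compatible: bool) -> None:
    """Hypothetical sketch of the force_hf decision logic described above."""
    force_hf = automodel_kwargs.get("force_hf")
    if force_hf is True:
        return  # HF model has no adapter; nothing to check
    if adapter_compatible:
        return  # adapter implements convert_single_tensor_to_hf
    if force_hf is None:
        # Not set + incompatible adapter: auto-enable with a warning.
        warnings.warn(
            "state_dict adapter lacks convert_single_tensor_to_hf; "
            "auto-setting force_hf=True"
        )
        automodel_kwargs["force_hf"] = True
    else:
        # force_hf=False explicitly requested, but the adapter is incompatible.
        raise ValueError(
            "force_hf=False requires an adapter implementing "
            "convert_single_tensor_to_hf; set force_hf=True or file an issue "
            "with NeMo-Automodel"
        )
```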

nemo_rl.models.automodel.setup.get_tokenizer(
tokenizer_config: nemo_rl.models.policy.TokenizerConfig,
get_processor: bool = False,
) Union[transformers.PreTrainedTokenizerBase, transformers.AutoProcessor]#

Get tokenizer using NeMoAutoTokenizer for automodel workers.

Uses NeMoAutoTokenizer, which provides custom tokenizer dispatch per model type and falls back to NeMoAutoTokenizerWithBosEosEnforced for default handling.

Parameters:
  • tokenizer_config – A dictionary containing tokenizer configuration.

    Required keys:

    - name: The name or path of the pretrained tokenizer.

    Optional keys:

    - chat_template: The chat template to use. Can be:
      - None: uses a passthrough template that just returns message content.
      - “default”: uses the tokenizer’s default template.
      - A file path ending in “.jinja”: loads the template from the file.
      - A custom jinja2 template string.
      If not specified, the tokenizer’s default template will be used.
    - chat_template_kwargs: Arguments passed to tokenizer.apply_chat_template().

  • get_processor – Whether to return a processor (via AutoProcessor) instead of a tokenizer.

Returns:

The configured tokenizer or processor instance.
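A config dictionary matching the keys described above might look like the following; the model name and chat_template_kwargs values are illustrative, not defaults.

```python
tokenizer_config = {
    # Required: name or path of the pretrained tokenizer (illustrative value).
    "name": "meta-llama/Llama-3.1-8B-Instruct",
    # Optional: None, "default", a *.jinja path, or a jinja2 template string.
    "chat_template": "default",
    # Optional: forwarded to tokenizer.apply_chat_template().
    "chat_template_kwargs": {"add_generation_prompt": True},
}

# tokenizer = get_tokenizer(tokenizer_config)  # requires nemo_rl + transformers
```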

nemo_rl.models.automodel.setup.validate_and_prepare_config(
config: nemo_rl.models.policy.PolicyConfig,
processor: Optional[transformers.AutoProcessor],
rank: int,
) nemo_rl.models.automodel.config.RuntimeConfig#

Validate configuration and prepare runtime settings.

This function validates the policy configuration, sets environment variables, determines model configuration, and returns runtime settings as a named tuple.

Parameters:
  • config – Policy configuration dictionary

  • processor – Optional processor for multimodal models

  • rank – Current process rank

Returns:

RuntimeConfig named tuple containing validated configuration values

Raises:
  • ValueError – If configuration is invalid

  • RuntimeError – If incompatible settings are detected

nemo_rl.models.automodel.setup.setup_reference_model_state(
model: torch.nn.Module,
) dict[str, torch.Tensor]#

Set up reference model state dict by creating a CPU copy of the model’s state dict.

This creates a reference copy of the model weights on CPU with pinned memory for efficient CPU-GPU transfers. The reference model is typically used to compute reference log probabilities during RL training.

Parameters:

model – The model to create a reference copy from

Returns:

Dictionary mapping parameter names to CPU tensors with pinned memory

.. rubric:: Example

model = setup_model(…)
reference_model_state_dict = setup_reference_model_state(model)
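A minimal sketch of the pinned-memory CPU copy described above, assuming a standard torch state dict. Pinning is skipped when CUDA is unavailable (pin_memory() needs a CUDA context); the real implementation may differ in detail.

```python
import torch


def cpu_reference_state(model: torch.nn.Module) -> dict[str, torch.Tensor]:
    """Copy every parameter/buffer to CPU, pinning memory when CUDA is available."""
    ref = {}
    for name, tensor in model.state_dict().items():
        t = tensor.detach().to("cpu").clone()
        if torch.cuda.is_available():
            # Pinned (page-locked) host memory enables fast async H2D transfers.
            t = t.pin_memory()
        ref[name] = t
    return ref
```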

nemo_rl.models.automodel.setup.setup_distributed(
config: nemo_rl.models.policy.PolicyConfig,
runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
) nemo_rl.models.automodel.config.DistributedContext#

Set up distributed training environment and create device meshes.

Initializes torch.distributed process group and creates FSDP2Config, MoEParallelizerConfig, and device meshes for distributed training.

Parameters:
  • config – Policy configuration dictionary

  • runtime_config – RuntimeConfig named tuple from validate_and_prepare_config

Returns:

DistributedContext containing device meshes and distributed configuration

nemo_rl.models.automodel.setup.setup_model_and_optimizer(
config: nemo_rl.models.policy.PolicyConfig,
tokenizer: transformers.AutoTokenizer,
runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
distributed_context: nemo_rl.models.automodel.config.DistributedContext,
checkpoint_manager: Any,
is_vlm: bool = False,
init_optimizer: bool = True,
weights_path: Optional[str] = None,
optimizer_path: Optional[str] = None,
) nemo_rl.models.automodel.config.ModelAndOptimizerState#

Set up model, parallelization, and optimizer.

Creates the model via from_pretrained() which handles meta device init, parallelization (FSDP2/TP/CP/EP), LoRA, and base weight loading internally.

Parameters:
  • config – Policy configuration dictionary

  • tokenizer – Tokenizer for the model

  • runtime_config – RuntimeConfig named tuple from validate_and_prepare_config

  • distributed_context – DistributedContext from setup_distributed

  • checkpoint_manager – Checkpoint manager for loading/saving weights

  • is_vlm – Whether this is a vision-language model

  • init_optimizer – Whether to initialize optimizer

  • weights_path – Optional path to checkpoint weights to load

  • optimizer_path – Optional path to optimizer state to load

Returns:

ModelAndOptimizerState containing model, optimizer, scheduler, and metadata
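Putting the pieces together, the expected call order for these setup functions is roughly as follows. This is an illustrative, non-runnable sketch: the config contents, the checkpoint_manager construction, and the attribute access on the returned ModelAndOptimizerState are placeholders.

```python
tokenizer = get_tokenizer(config["tokenizer"])
runtime_config = validate_and_prepare_config(config, processor=None, rank=rank)
distributed_context = setup_distributed(config, runtime_config)
state = setup_model_and_optimizer(
    config, tokenizer, runtime_config, distributed_context,
    checkpoint_manager=checkpoint_manager,  # placeholder
)
reference_model_state = setup_reference_model_state(state.model)  # assumed attribute
```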