nemo_rl.models.automodel.setup#
Setup utilities for automodel-based training in NeMo RL.
Module Contents#
Functions#
_maybe_set_force_hf | Validate and maybe auto-set force_hf based on adapter compatibility.
get_tokenizer | Get tokenizer using NeMoAutoTokenizer for automodel workers.
validate_and_prepare_config | Validate configuration and prepare runtime settings.
setup_reference_model_state | Set up reference model state dict by creating a CPU copy of the model’s state dict.
setup_distributed | Set up distributed training environment and create device meshes.
setup_model_and_optimizer | Set up model, parallelization, and optimizer.
Data#
API#
- nemo_rl.models.automodel.setup.STRING_TO_DTYPE#
None
- nemo_rl.models.automodel.setup._maybe_set_force_hf(automodel_kwargs: dict, model_config) None#
Validate and maybe auto-set force_hf based on adapter compatibility.
Custom model implementations (e.g. Qwen2, Llama) use state_dict_adapters to convert between native and HF weight formats. NeMo RL’s weight syncing requires the adapter to implement convert_single_tensor_to_hf. Some adapters (like CombinedProjectionStateDictAdapter) don’t implement this yet. This function checks the adapter BEFORE model loading to avoid wasting time:
- force_hf=True: no check needed; the HF model won’t have an adapter.
- force_hf not set + adapter incompatible: auto-set force_hf=True with a warning.
- force_hf=False + adapter incompatible: raise an error telling the user to set force_hf=True or file an issue to NeMo-Automodel.
See: https://github.com/NVIDIA-NeMo/RL/issues/2072
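The decision logic above can be sketched as follows. This is a minimal illustration, not the actual NeMo RL implementation: the `adapter_cls` argument, the `automodel_kwargs` key names, and the messages are assumptions based solely on the description above.

```python
import warnings


def maybe_set_force_hf(automodel_kwargs: dict, adapter_cls) -> None:
    """Sketch of the force_hf compatibility check described above."""
    force_hf = automodel_kwargs.get("force_hf")  # True, False, or unset (None)
    if force_hf is True:
        return  # HF model path: no state-dict adapter involved, nothing to check

    # Weight syncing needs per-tensor conversion support on the adapter.
    compatible = adapter_cls is None or hasattr(adapter_cls, "convert_single_tensor_to_hf")
    if compatible:
        return

    if force_hf is None:
        # Not set explicitly: fall back to the HF path with a warning.
        warnings.warn(
            f"{adapter_cls.__name__} does not implement "
            "convert_single_tensor_to_hf; auto-setting force_hf=True."
        )
        automodel_kwargs["force_hf"] = True
    else:
        # User explicitly asked for force_hf=False but the adapter cannot
        # support weight syncing: fail fast before loading the model.
        raise ValueError(
            "State dict adapter is incompatible with weight syncing; "
            "set force_hf=True or file an issue to NeMo-Automodel."
        )
```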
- nemo_rl.models.automodel.setup.get_tokenizer(
- tokenizer_config: nemo_rl.models.policy.TokenizerConfig,
- get_processor: bool = False,
Get tokenizer using NeMoAutoTokenizer for automodel workers.
Uses NeMoAutoTokenizer which provides custom tokenizer dispatch per model type and falls back to NeMoAutoTokenizerWithBosEosEnforced for default handling.
- Parameters:
tokenizer_config – A dictionary containing tokenizer configuration. Required keys:
- name: The name or path of the pretrained tokenizer.
Optional keys:
- chat_template: The chat template to use. Can be:
  - None: Uses a passthrough template that just returns message content.
  - “default”: Uses the tokenizer’s default template.
  - A file path ending in “.jinja”: Loads the template from a file.
  - A custom jinja2 template string.
  If not specified, the tokenizer’s default template will be used.
- chat_template_kwargs: Arguments passed to tokenizer.apply_chat_template().
get_processor – Whether to return a processor (via AutoProcessor) instead of a tokenizer.
- Returns:
The configured tokenizer or processor instance.
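The chat_template dispatch rules listed under tokenizer_config can be sketched as below. This is a hedged illustration: `resolve_chat_template` and the passthrough template string are hypothetical names for this sketch, not the actual NeMo RL code.

```python
import os

# Illustrative passthrough template: emits each message's content verbatim.
PASSTHROUGH_TEMPLATE = (
    "{% for message in messages %}{{ message['content'] }}{% endfor %}"
)


def resolve_chat_template(chat_template):
    """Sketch of the chat_template resolution rules documented above.

    Returns a template string, or None to mean "use the tokenizer's default".
    """
    if chat_template is None:
        return PASSTHROUGH_TEMPLATE       # passthrough: message content only
    if chat_template == "default":
        return None                       # defer to the tokenizer's built-in template
    if chat_template.endswith(".jinja"):
        with open(chat_template) as f:    # load the template from a file
            return f.read()
    return chat_template                  # treat as a custom jinja2 template string
```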
- nemo_rl.models.automodel.setup.validate_and_prepare_config(
- config: nemo_rl.models.policy.PolicyConfig,
- processor: Optional[transformers.AutoProcessor],
- rank: int,
Validate configuration and prepare runtime settings.
This function validates the policy configuration, sets environment variables, determines model configuration, and returns runtime settings as a named tuple.
- Parameters:
config – Policy configuration dictionary
processor – Optional processor for multimodal models
rank – Current process rank
- Returns:
RuntimeConfig named tuple containing validated configuration values
- Raises:
ValueError – If configuration is invalid
RuntimeError – If incompatible settings are detected
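The validate-then-package pattern this function follows can be sketched with stdlib tools. The field names, validation rules, and environment variable below are purely illustrative assumptions; the real RuntimeConfig carries NeMo RL-specific fields.

```python
import os
from typing import NamedTuple, Optional


class RuntimeConfig(NamedTuple):
    # Illustrative fields only; the real RuntimeConfig differs.
    model_name: str
    dtype: str
    is_multimodal: bool


def validate_and_prepare_config(
    config: dict, processor: Optional[object], rank: int
) -> RuntimeConfig:
    """Sketch: validate config, set env vars, return settings as a named tuple."""
    if "model_name" not in config:
        raise ValueError("config must specify model_name")
    dtype = config.get("dtype", "bfloat16")
    if dtype not in ("float32", "bfloat16", "float16"):
        raise ValueError(f"unsupported dtype: {dtype}")
    # Example of the kind of environment variable such a function might set.
    os.environ.setdefault("LOCAL_RANK", str(rank))
    return RuntimeConfig(
        model_name=config["model_name"],
        dtype=dtype,
        is_multimodal=processor is not None,
    )
```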
- nemo_rl.models.automodel.setup.setup_reference_model_state(
- model: torch.nn.Module,
Set up reference model state dict by creating a CPU copy of the model’s state dict.
This creates a reference copy of the model weights on CPU with pinned memory for efficient CPU-GPU transfers. The reference model is typically used to compute reference log probabilities during RL training.
- Parameters:
model – The model to create a reference copy from
- Returns:
Dictionary mapping parameter names to CPU tensors with pinned memory
- Example:
model = setup_model(…)
reference_model_state_dict = setup_reference_model_state(model)
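The CPU copy with pinned memory can be sketched as below. This is a minimal sketch, not the actual implementation: pinning is guarded on CUDA availability here, and the function name is illustrative.

```python
import torch
import torch.nn as nn


def make_reference_state(model: nn.Module) -> dict:
    """Sketch: copy each parameter/buffer to CPU, pinning memory when possible."""
    reference = {}
    for name, tensor in model.state_dict().items():
        cpu_copy = tensor.detach().to("cpu", copy=True)
        if torch.cuda.is_available():
            # Pinned (page-locked) memory enables faster async CPU->GPU transfers.
            cpu_copy = cpu_copy.pin_memory()
        reference[name] = cpu_copy
    return reference
```

Because each tensor is copied rather than aliased, later optimizer steps on the live model leave the reference weights untouched, which is what reference log-probability computation relies on.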
- nemo_rl.models.automodel.setup.setup_distributed(
- config: nemo_rl.models.policy.PolicyConfig,
- runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
Set up distributed training environment and create device meshes.
Initializes torch.distributed process group and creates FSDP2Config, MoEParallelizerConfig, and device meshes for distributed training.
- Parameters:
config – Policy configuration dictionary
runtime_config – RuntimeConfig named tuple from validate_and_prepare_config
- Returns:
DistributedContext containing device meshes and distributed configuration
- nemo_rl.models.automodel.setup.setup_model_and_optimizer(
- config: nemo_rl.models.policy.PolicyConfig,
- tokenizer: transformers.AutoTokenizer,
- runtime_config: nemo_rl.models.automodel.config.RuntimeConfig,
- distributed_context: nemo_rl.models.automodel.config.DistributedContext,
- checkpoint_manager: Any,
- is_vlm: bool = False,
- init_optimizer: bool = True,
- weights_path: Optional[str] = None,
- optimizer_path: Optional[str] = None,
Set up model, parallelization, and optimizer.
Creates the model via from_pretrained() which handles meta device init, parallelization (FSDP2/TP/CP/EP), LoRA, and base weight loading internally.
- Parameters:
config – Policy configuration dictionary
tokenizer – Tokenizer for the model
runtime_config – RuntimeConfig named tuple from validate_and_prepare_config
distributed_context – DistributedContext from setup_distributed
checkpoint_manager – Checkpoint manager for loading/saving weights
is_vlm – Whether this is a vision-language model
init_optimizer – Whether to initialize optimizer
weights_path – Optional path to checkpoint weights to load
optimizer_path – Optional path to optimizer state to load
- Returns:
ModelAndOptimizerState containing model, optimizer, scheduler, and metadata