nemo_rl.data.utils
Module Contents
Functions
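get_train_dataset_name | Return the training dataset_name from a data config.
load_dataloader_state | Restore a dataloader’s state from a checkpoint, with dataset-swap guard.
setup_response_data | Setup data with environments.
setup_preference_data | Setup preference data.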
API
- nemo_rl.data.utils.get_train_dataset_name(
- data_config: nemo_rl.data.DataConfig,
Return the training dataset_name from a data config.
The shape of data_config["train"] is not consistent across algorithms at the point where checkpoint save/load happens:
- setup_response_data (used by GRPO/Distillation) and setup_data in run_sft.py normalize a single-dataset dict into [dict].
- setup_preference_data (used by DPO/RM) leaves it as a dict.
This helper tolerates both shapes and returns None when the dataset name cannot be determined (e.g. legacy checkpoints with no name written, multi-dataset training, or malformed configs).
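The shape handling described above can be sketched as follows. This is an illustrative reimplementation, not the library's source: the function name train_dataset_name_sketch is hypothetical, and it assumes the name is stored under a "dataset_name" key inside the train entry.

```python
from typing import Any, Optional


def train_dataset_name_sketch(data_config: dict[str, Any]) -> Optional[str]:
    """Illustrative only: accept both dict and [dict] shapes of data_config["train"]."""
    train = data_config.get("train") if isinstance(data_config, dict) else None

    # Response/SFT setup normalizes a single dataset into a one-element list.
    if isinstance(train, list):
        if len(train) != 1:
            return None  # empty list or multi-dataset training: no single name
        train = train[0]

    # Preference setup leaves the entry as a plain dict.
    if not isinstance(train, dict):
        return None  # missing or malformed config

    name = train.get("dataset_name")  # assumed key, see lead-in
    return name if isinstance(name, str) else None
```

Returning None instead of raising lets a caller treat "name unknown" as a distinct case rather than as a failure.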
- nemo_rl.data.utils.load_dataloader_state(
- dataloader: Any,
- checkpoint_path: str,
- data_config: nemo_rl.data.DataConfig,
- suffix: str = '',
Restore a dataloader’s state from a checkpoint, with dataset-swap guard.
Loads {checkpoint_path}/train_dataloader{suffix}.pt and, when a config.yaml is also present in the checkpoint dir (always written by CheckpointManager.init_tmp_checkpoint), compares the saved dataset_name to the current run’s dataset_name. On mismatch, the dataloader state restore is skipped so the new dataset starts from index 0. Without this guard, StatefulDataLoader would inherit samples_yielded from the old run, silently skipping samples and (when the new dataset is shorter than that count) crashing with StopIteration during iter().
No on-disk format change is needed: train_dataloader{suffix}.pt keeps its existing raw-state_dict shape, and the saved dataset_name is read out of the sibling config.yaml, so every existing checkpoint is automatically compatible.
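The guard logic can be illustrated with a minimal in-memory sketch. It does not read any files or use the real StatefulDataLoader: ToyLoader and restore_with_swap_guard are hypothetical stand-ins. It also assumes that an unknown name on either side (for example a legacy checkpoint) does not block the restore, which the text above does not state explicitly.

```python
from typing import Any, Optional


class ToyLoader:
    """Stand-in for a stateful dataloader; it only tracks samples_yielded."""

    def __init__(self) -> None:
        self.samples_yielded = 0

    def load_state_dict(self, state: dict[str, Any]) -> None:
        self.samples_yielded = state["samples_yielded"]


def restore_with_swap_guard(
    loader: ToyLoader,
    saved_state: dict[str, Any],
    saved_name: Optional[str],
    current_name: Optional[str],
) -> bool:
    """Restore loader state unless both names are known and differ."""
    # Assumption: a None name on either side is treated as "cannot compare".
    names_known = saved_name is not None and current_name is not None
    if names_known and saved_name != current_name:
        return False  # dataset swapped: leave the fresh loader at index 0
    loader.load_state_dict(saved_state)
    return True
```

With a saved name of "old" and a current name of "new", the call returns False and the loader's samples_yielded stays at 0, so iteration over the new dataset begins at its first sample.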
- nemo_rl.data.utils.setup_response_data(
- tokenizer: transformers.AutoProcessor | transformers.AutoTokenizer,
- data_config: nemo_rl.data.DataConfig,
- env_configs: Optional[dict[str, Any]] = None,
- is_vlm: bool = False,
Setup data with environments.
This function sets up the data and environments for the training and validation datasets.
- Parameters:
tokenizer – Tokenizer or processor.
data_config – Data config.
env_configs – Environment configs. If None, no environments will be created. This is used for:
- Algorithms like SFT, which do not need environments.
- Environments like NeMo-Gym, which need to handle environment creation outside of this function.
is_vlm – Whether to use VLM training or not.
- Returns:
If env_configs is not None: a tuple of (train dataset, validation dataset, task to environment, task to validation environment). If env_configs is None: a tuple of (train dataset, validation dataset).
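Because the result has two or four elements depending on env_configs, calling code may want to normalize it before unpacking. The helper below is a hypothetical convenience, not part of nemo_rl; it only assumes the two tuple shapes described above.

```python
from typing import Any, Optional


def unpack_setup_result(
    result: tuple[Any, ...],
) -> tuple[Any, Any, Optional[Any], Optional[Any]]:
    """Normalize a 2- or 4-element setup result into four values."""
    if len(result) == 4:
        # env_configs was provided: environments are included.
        train_ds, val_ds, task_to_env, task_to_val_env = result
        return train_ds, val_ds, task_to_env, task_to_val_env
    if len(result) == 2:
        # env_configs was None: no environments were created.
        train_ds, val_ds = result
        return train_ds, val_ds, None, None
    raise ValueError(f"expected 2 or 4 elements, got {len(result)}")
```

A caller can then always unpack four names and check the environment mappings for None.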
- nemo_rl.data.utils.setup_preference_data(
- tokenizer: transformers.AutoTokenizer,
- data_config: nemo_rl.data.DataConfig,
Setup preference data.
This function sets up the preference data for the training and validation datasets.
- Parameters:
tokenizer – Tokenizer.
data_config – Data config for preference dataset.
- Returns:
A tuple of (train dataset, validation dataset).