nemo_rl.data.utils

Module Contents

Functions

get_train_dataset_name

Return the training dataset_name from a data config.

load_dataloader_state

Restore a dataloader’s state from a checkpoint, with a dataset-swap guard.

setup_response_data

Set up data with environments.

setup_preference_data

Set up preference data.

API

nemo_rl.data.utils.get_train_dataset_name(
data_config: nemo_rl.data.DataConfig,
) -> Optional[str]

Return the training dataset_name from a data config.

The shape of data_config["train"] is not consistent across algorithms at the point where checkpoint save/load happens:

  • setup_response_data (used by GRPO/Distillation) and setup_data in run_sft.py normalize a single-dataset dict into [dict].

  • setup_preference_data (used by DPO/RM) leaves it as a dict.

This helper tolerates both shapes and returns None when the dataset name cannot be determined (e.g. legacy checkpoints with no name written, multi-dataset training, or malformed configs).
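The shape handling described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `sketch_train_dataset_name` is hypothetical, and it assumes the name is stored under a `"dataset_name"` key inside the `"train"` entry.

```python
from typing import Any, Optional


def sketch_train_dataset_name(data_config: dict[str, Any]) -> Optional[str]:
    """Illustrative only: mirrors the documented behavior, not the real code."""
    train = data_config.get("train") if isinstance(data_config, dict) else None

    # Shape produced after normalization (GRPO/Distillation/SFT): a list of dicts.
    if isinstance(train, list):
        if len(train) != 1:
            # Empty list or multi-dataset training: no single name to report.
            return None
        train = train[0]

    # Shape left untouched by the preference setup (DPO/RM): a plain dict.
    if not isinstance(train, dict):
        return None  # malformed config

    # Assumption: the name lives under the "dataset_name" key.
    name = train.get("dataset_name")
    return name if isinstance(name, str) else None
```

Both `{"train": {"dataset_name": "a"}}` and `{"train": [{"dataset_name": "a"}]}` yield `"a"` in this sketch, while a two-element list or a missing key yields `None`.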

nemo_rl.data.utils.load_dataloader_state(
dataloader: Any,
checkpoint_path: str,
data_config: nemo_rl.data.DataConfig,
suffix: str = '',
) -> None

Restore a dataloader’s state from a checkpoint, with a dataset-swap guard.

Loads {checkpoint_path}/train_dataloader{suffix}.pt. When a config.yaml is also present in the checkpoint directory (always written by CheckpointManager.init_tmp_checkpoint), the saved dataset_name is compared to the current run’s dataset_name. On a mismatch, the dataloader state restore is skipped so that the new dataset starts from index 0. Otherwise, StatefulDataLoader would inherit samples_yielded from the old run, silently skipping samples and, when the new dataset is shorter than that count, crashing with StopIteration during iter().

No on-disk format change is needed: train_dataloader{suffix}.pt keeps its existing raw-state_dict shape, and the saved dataset_name is read out of the sibling config.yaml so every existing checkpoint is automatically compatible.
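The guard described above reduces to a path lookup plus a name comparison. The sketch below shows only that decision logic, under stated assumptions: the helper names `dataloader_state_path` and `should_restore` are hypothetical, and the choice to restore when either name is unknown (`None`) is an assumption made here so that legacy checkpoints keep working; the source only specifies that a mismatch skips the restore.

```python
import os
from typing import Optional


def dataloader_state_path(checkpoint_path: str, suffix: str = "") -> str:
    """Build the documented state file path: {checkpoint_path}/train_dataloader{suffix}.pt."""
    return os.path.join(checkpoint_path, f"train_dataloader{suffix}.pt")


def should_restore(saved_name: Optional[str], current_name: Optional[str]) -> bool:
    """Decide whether the saved dataloader state should be restored.

    Assumption (not confirmed by the source): if either name is unknown,
    a swap cannot be proven, so the state is restored as before.
    """
    if saved_name is None or current_name is None:
        return True
    # A mismatch means the dataset was swapped: skip the restore so the
    # new dataset starts from index 0.
    return saved_name == current_name
```

With these helpers, `should_restore("math", "code")` is `False`, so a caller would leave the dataloader in its fresh state instead of loading the saved one.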

nemo_rl.data.utils.setup_response_data(
tokenizer: transformers.AutoProcessor | transformers.AutoTokenizer,
data_config: nemo_rl.data.DataConfig,
env_configs: Optional[dict[str, Any]] = None,
is_vlm: bool = False,
) -> Union[tuple[Union[nemo_rl.data.datasets.AllTaskProcessedDataset, dict[str, nemo_rl.data.datasets.AllTaskProcessedDataset]], Optional[nemo_rl.data.datasets.AllTaskProcessedDataset]], tuple[Union[nemo_rl.data.datasets.AllTaskProcessedDataset, dict[str, nemo_rl.data.datasets.AllTaskProcessedDataset]], Optional[nemo_rl.data.datasets.AllTaskProcessedDataset], dict[str, nemo_rl.environments.interfaces.EnvironmentInterface], dict[str, nemo_rl.environments.interfaces.EnvironmentInterface]]]

Set up data with environments.

Sets up the data and, optionally, the environments for the training and validation datasets.

Parameters:
  • tokenizer – Tokenizer or processor.

  • data_config – Data config.

  • env_configs – Environment configs. If None, no environments are created. This is used for:

    • Algorithms like SFT which do not need environments.

    • Environments like NeMo-Gym which need to handle the environment creation outside of this function.

  • is_vlm – Whether to use VLM training.

Returns:

If env_configs is not None: a tuple of (train dataset, validation dataset, task to environment, task to validation environment). If env_configs is None: a tuple of (train dataset, validation dataset).
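Because the tuple length depends on whether env_configs was passed, callers that support both modes may want to normalize the result. The following is a minimal sketch of one way to do that; `ResponseData` and `unpack_response_data` are hypothetical names that are not part of nemo_rl.

```python
from typing import Any, NamedTuple, Optional


class ResponseData(NamedTuple):
    """Hypothetical container giving both return shapes the same fields."""

    train: Any
    val: Any
    task_to_env: Optional[dict]
    val_task_to_env: Optional[dict]


def unpack_response_data(result: tuple) -> ResponseData:
    """Normalize the 2-tuple (no environments) and 4-tuple (with environments)."""
    if len(result) == 2:
        # env_configs was None: only datasets were returned.
        train, val = result
        return ResponseData(train, val, None, None)
    if len(result) == 4:
        # env_configs was given: datasets plus both environment mappings.
        return ResponseData(*result)
    raise ValueError(f"expected a tuple of length 2 or 4, got {len(result)}")
```

A caller would wrap the call as `unpack_response_data(setup_response_data(...))` and then check whether `task_to_env` is `None` rather than branching on the tuple length.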

nemo_rl.data.utils.setup_preference_data(
tokenizer: transformers.AutoTokenizer,
data_config: nemo_rl.data.DataConfig,
) -> tuple[nemo_rl.data.datasets.AllTaskProcessedDataset, dict[str, nemo_rl.data.datasets.AllTaskProcessedDataset]]

Set up preference data.

Sets up the preference data for the training and validation datasets.

Parameters:
  • tokenizer – Tokenizer.

  • data_config – Data config for preference dataset.

Returns:

A tuple of (train dataset, validation datasets). Per the signature, the second element is a dict of validation datasets keyed by string.