nemo_rl.data.utils

Module Contents

Functions

setup_response_data

Set up data with environments.

setup_preference_data

Set up preference data.

API

nemo_rl.data.utils.setup_response_data(
tokenizer: transformers.AutoProcessor | transformers.AutoTokenizer,
data_config: nemo_rl.data.DataConfig,
env_configs: Optional[dict[str, Any]] = None,
is_vlm: bool = False,
) → Union[tuple[Union[nemo_rl.data.datasets.AllTaskProcessedDataset, dict[str, nemo_rl.data.datasets.AllTaskProcessedDataset]], Optional[nemo_rl.data.datasets.AllTaskProcessedDataset]], tuple[Union[nemo_rl.data.datasets.AllTaskProcessedDataset, dict[str, nemo_rl.data.datasets.AllTaskProcessedDataset]], Optional[nemo_rl.data.datasets.AllTaskProcessedDataset], dict[str, nemo_rl.environments.interfaces.EnvironmentInterface], dict[str, nemo_rl.environments.interfaces.EnvironmentInterface]]]

Set up data with environments.

Sets up the training and validation datasets and, when env_configs is provided, the environments for them.

Parameters:
  • tokenizer – Tokenizer or processor.

  • data_config – Data config.

  • env_configs

    Environment configs. If None, no environments will be created. This is used for:

    • Algorithms like SFT which do not need environments.

    • Environments like NeMo-Gym which need to handle the environment creation outside of this function.

  • is_vlm – Whether to use VLM training or not.

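Returns:

If env_configs is not None: a tuple of (train dataset, validation dataset, task to environment, task to validation environment). If env_configs is None: a tuple of (train dataset, validation dataset). Per the signature, the train dataset is either a single AllTaskProcessedDataset or a dict of them keyed by string, and the validation dataset may be None.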
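
Because the length of the returned tuple depends on whether env_configs is given, callers may want to normalize the result before using it. The following is a minimal sketch, not part of nemo_rl: ResponseData and unpack_response_data are hypothetical names introduced here, and plain strings stand in for real datasets and environments.

```python
from typing import Any, NamedTuple, Optional


class ResponseData(NamedTuple):
    """Hypothetical container for the values returned by setup_response_data."""

    train: Any
    val: Optional[Any]
    task_to_env: dict[str, Any]
    task_to_val_env: dict[str, Any]


def unpack_response_data(result: tuple) -> ResponseData:
    """Normalize the 2-tuple (no environments) or 4-tuple form into one shape."""
    if len(result) == 2:
        train, val = result
        # No environments were created, so use empty mappings.
        return ResponseData(train, val, {}, {})
    if len(result) == 4:
        return ResponseData(*result)
    raise ValueError(f"expected 2 or 4 items, got {len(result)}")


# Stand-in values; in real use, pass the tuple returned by setup_response_data.
without_envs = unpack_response_data(("train_ds", None))
with_envs = unpack_response_data(
    ("train_ds", "val_ds", {"math": "env"}, {"math": "val_env"})
)
```

With this shape, downstream code can read fields such as with_envs.task_to_env by name without branching on whether environments were requested.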

nemo_rl.data.utils.setup_preference_data(
tokenizer: transformers.AutoTokenizer,
data_config: nemo_rl.data.DataConfig,
) → tuple[nemo_rl.data.datasets.AllTaskProcessedDataset, dict[str, nemo_rl.data.datasets.AllTaskProcessedDataset]]

Set up preference data.

Sets up the preference data for the training and validation datasets.

Parameters:
  • tokenizer – Tokenizer.

  • data_config – Data config for preference dataset.

Returns:

A tuple of (train dataset, validation dataset). Per the signature, the validation dataset is a dict of AllTaskProcessedDataset keyed by string.
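
Since the signature types the validation value as a dict keyed by string, a caller might process it entry by entry. The following is a minimal sketch with a hypothetical helper, validation_sizes. Lists stand in for datasets, and the sketch assumes only that each dataset supports len(), which is not confirmed here for AllTaskProcessedDataset.

```python
from typing import Mapping, Sized


def validation_sizes(val_datasets: Mapping[str, Sized]) -> dict[str, int]:
    """Return the number of examples in each named validation dataset."""
    return {name: len(ds) for name, ds in val_datasets.items()}


# Stand-in for:
#   train_ds, val_datasets = setup_preference_data(tokenizer, data_config)
# The keys "default" and "extra" are illustrative, not names used by nemo_rl.
train_ds, val_datasets = ([0, 1, 2, 3], {"default": [0, 1], "extra": [0]})
sizes = validation_sizes(val_datasets)
```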