nemo_rl.data.datasets.response_datasets.tulu3#
Module Contents#
Classes#
Simple wrapper around the Tulu3 SFT mixture dataset with train split. |
API#
- class nemo_rl.data.datasets.response_datasets.tulu3.Tulu3SftMixtureDataset(
- split_validation_size: float = 0.05,
- seed: int = 42,
- max_samples: int | None = None,
- **kwargs,
Bases:
nemo_rl.data.datasets.raw_dataset.RawDatasetSimple wrapper around the Tulu3 SFT mixture dataset with train split.
- Parameters:
split_validation_size – Size of the validation data, default is 0.05
seed – Seed for train/validation split when split_validation_size > 0, default is 42
max_samples – Optional maximum number of samples to use from the dataset
Initialization
- format_data(data: dict[str, Any]) dict[str, Any]#