`nemo_rl.data.datasets.response_datasets.oai_format_dataset`#

Module Contents#

Classes#

`PreservingDataset`	A dataset wrapper that preserves original dict structure without None-filling.
`OpenAIFormatDataset`	This class is used to load an SFT dataset in the OpenAI format.

API#

class nemo_rl.data.datasets.response_datasets.oai_format_dataset.PreservingDataset(data: list[dict[str, Any]])#

A dataset wrapper that preserves original dict structure without None-filling.

Unlike HuggingFace’s Dataset class which enforces schema uniformity across all samples (filling missing keys with None), this class maintains the exact structure of each sample. This is critical for heterogeneous data like tool calls where different samples may have different argument structures.

Initialization

Initialize the dataset with a list of dictionaries.

Parameters:: data – List of dictionary samples, each can have different keys

__len__() → int#

__getitem__( idx: Union[int, slice, list], ) → Union[dict[str, Any], list[dict[str, Any]]]#: Support integer indexing, slicing, and list indexing.

__iter__()#

map(

function: Callable,

*args,

**kwargs,

) → nemo_rl.data.datasets.response_datasets.oai_format_dataset.PreservingDataset#

Apply a function to each sample in the dataset.

Parameters:

function – Function to apply to each sample
with_indices – If True, pass index as second argument to function

Returns:

New PreservingDataset with transformed samples

class nemo_rl.data.datasets.response_datasets.oai_format_dataset.OpenAIFormatDataset( train_ds_path: str, val_ds_path: str, chat_key: str = 'messages', system_key: str | None = None, system_prompt: str | None = None, tool_key: str | None = 'tools', use_preserving_dataset: bool = False, )#

Bases: nemo_rl.data.datasets.raw_dataset.RawDataset

This class is used to load an SFT dataset in the OpenAI format.

The dataset should be in the following format: { “messages”: [ {“role”: “system”, “content”: “You are a helpful assistant.”}, {“role”: “user”, “content”: “What is the capital of France?”}, {“role”: “assistant”, “content”: “The capital of France is Paris.”} ] }