nemo_rl.data.hf_datasets.oai_format_dataset#

Module Contents#

Classes#

OpenAIFormatDataset

This class is used to load an SFT dataset in the OpenAI format.

API#

class nemo_rl.data.hf_datasets.oai_format_dataset.OpenAIFormatDataset(
train_ds_path: str,
val_ds_path: str,
chat_key: str = 'messages',
system_key: str | None = None,
system_prompt: str | None = None,
)[source]#

This class is used to load an SFT dataset in the OpenAI format.

The dataset should be in the following format: { “messages”: [ {“role”: “system”, “content”: “You are a helpful assistant.”}, {“role”: “user”, “content”: “What is the capital of France?”}, {“role”: “assistant”, “content”: “The capital of France is Paris.”} ] } system_key and system_prompt are optional. If provided, it will be added to the beginning of the dataset. chat_key should be the key of the messages list. Multi-turn conversations are supported. The last message in the conversation must be from the assistant.

Initialization

add_messages_key(
example: dict[str, Any],
) dict[str, list[dict[str, Any]]][source]#