nemo_rl.data.hf_datasets.oai_format_dataset
#
Module Contents#
Classes#
This class is used to load an SFT dataset in the OpenAI format. |
API#
- class nemo_rl.data.hf_datasets.oai_format_dataset.OpenAIFormatDataset(
- train_ds_path: str,
- val_ds_path: str,
- chat_key: str = 'messages',
- system_key: str | None = None,
- system_prompt: str | None = None,
This class is used to load an SFT dataset in the OpenAI format.
The dataset should be in the following format: { “messages”: [ {“role”: “system”, “content”: “You are a helpful assistant.”}, {“role”: “user”, “content”: “What is the capital of France?”}, {“role”: “assistant”, “content”: “The capital of France is Paris.”} ] } system_key and system_prompt are optional. If provided, it will be added to the beginning of the dataset. chat_key should be the key of the messages list. Multi-turn conversations are supported. The last message in the conversation must be from the assistant.
Initialization