nemo_automodel.components.datasets.llm.hellaswag
Module Contents
Classes
API
A dataset wrapper for the HellaSwag benchmark, tailored for single-turn supervised fine-tuning (SFT).
This class loads the HellaSwag dataset and preprocesses it with a tokenizer and a custom preprocessing pipeline for language model fine-tuning. Each example consists of a context and several candidate endings; the task is to choose the most plausible continuation.
Get a processed example by index.
Parameters:
Index of the example.
Returns:
A tokenized and preprocessed example.
Get the number of examples in the dataset.
Returns:
Length of the processed dataset.
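The indexing and length behavior described above can be sketched as a thin wrapper over already-processed data. This is a minimal illustration, not the library's implementation: the class name `ProcessedExamples` and the `input_ids` and `labels` field names are assumptions made for the example.

```python
from typing import Any, Sequence


class ProcessedExamples:
    """Hypothetical stand-in for a dataset wrapper; not the library's class."""

    def __init__(self, processed: Sequence[Any]) -> None:
        # Assumption: preprocessing has already produced an indexable sequence.
        self._processed = processed

    def __getitem__(self, index: int) -> Any:
        # Return the preprocessed example stored at the given index.
        return self._processed[index]

    def __len__(self) -> int:
        # Report the size of the processed data, not of the raw source.
        return len(self._processed)


# Illustrative records; the field names are assumed, not taken from the library.
examples = ProcessedExamples([
    {"input_ids": [1, 2, 3], "labels": [-100, 2, 3]},
    {"input_ids": [4, 5], "labels": [-100, 5]},
])
```

Delegating both methods to the processed data means that any filtering done during preprocessing is reflected in the reported length.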
Extracts the context part of each example.
Parameters:
A dictionary of batched example data with a “ctx” key.
Returns:
list[str]: List of context strings.
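As a rough sketch of the context extraction described above, assuming the input is a batched dictionary whose “ctx” entry holds one string per example. The function name `extract_contexts` is hypothetical and chosen only for this illustration.

```python
def extract_contexts(examples: dict) -> list[str]:
    # Hypothetical helper: return the "ctx" column of a batched dictionary.
    return list(examples["ctx"])


# Illustrative batch with two examples.
batch = {
    "ctx": [
        "A man is sitting on a roof. He",
        "A woman is standing in a kitchen. She",
    ]
}
contexts = extract_contexts(batch)
```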
Extracts the correct ending based on the label.
Parameters:
A dictionary with “endings” (a list of strings) and “label” (the index of the correct ending).
Returns:
list[str]: The gold target strings, selected from the endings by label index.
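The label-based selection can be sketched as follows, assuming a batched dictionary in which “endings” holds one list of candidate strings per example and “label” holds the matching index. The function name `extract_gold_endings` is hypothetical. The sketch converts each label with `int()` so that it works whether labels are stored as integers or as numeric strings.

```python
def extract_gold_endings(examples: dict) -> list[str]:
    # Hypothetical helper: pick, for each example, the ending at its label index.
    return [
        endings[int(label)]  # int() tolerates labels stored as strings
        for endings, label in zip(examples["endings"], examples["label"])
    ]


# Illustrative batch with two examples and string-valued labels.
batch = {
    "endings": [
        ["starts pulling up roofing.", "is ripping tiles off.", "waves."],
        ["opens a book.", "chops vegetables.", "climbs a ladder."],
    ],
    "label": ["0", "1"],
}
targets = extract_gold_endings(batch)
```

In a single-turn SFT setup, each context and its gold ending would typically form the prompt and the completion, respectively.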