nemo_automodel.components.datasets.llm.hellaswag

Module Contents

Classes

Name	Description
`HellaSwag`	A dataset wrapper for the HellaSwag benchmark, tailored for single-turn supervised fine-tuning (SFT).

API

class nemo_automodel.components.datasets.llm.hellaswag.HellaSwag(
    path_or_dataset,
    tokenizer,
    split = 'train',
    num_samples_limit = None,
    pad_to_max_length = True
)

A dataset wrapper for the HellaSwag benchmark, tailored for single-turn supervised fine-tuning (SFT).

This class loads the HellaSwag dataset and preprocesses it with a tokenizer and a custom preprocessing pipeline for language model fine-tuning. Each example consists of a context and several candidate endings; the task is to choose the most plausible continuation.

dataset = processor.process(raw_datasets, self)

nemo_automodel.components.datasets.llm.hellaswag.HellaSwag.__getitem__(
    index
)

Get a processed example by index.

Parameters:

index (int): Index of the example.

Returns:

A tokenized and preprocessed example.

nemo_automodel.components.datasets.llm.hellaswag.HellaSwag.__len__()

Get the number of examples in the dataset.

Returns:

Length of the processed dataset.
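
The two accessors above read as thin delegations to the processed `dataset` attribute. The following is a minimal standalone sketch of that pattern, not the library's implementation: `ProcessedWrapper` is a hypothetical stand-in class, and the `input_ids` / `labels` keys in the sample records are assumptions chosen only for illustration.

```python
class ProcessedWrapper:
    """Hypothetical stand-in that mimics the documented accessors."""

    def __init__(self, processed):
        # In HellaSwag this would hold the output of the preprocessing step.
        self.dataset = processed

    def __getitem__(self, index):
        # Return one already-processed example.
        return self.dataset[index]

    def __len__(self):
        # Number of processed examples.
        return len(self.dataset)


# Assumed record layout, for illustration only.
wrapper = ProcessedWrapper([
    {"input_ids": [1, 2, 3], "labels": [-100, 2, 3]},
    {"input_ids": [4, 5], "labels": [-100, 5]},
])
```

Implementing both `__len__` and `__getitem__` is what lets an object like this be used as a map-style dataset by common data loaders.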

nemo_automodel.components.datasets.llm.hellaswag.HellaSwag.get_context(
    examples
)

Extracts the context part of each example.

Parameters:

examples (dict): A dictionary containing example data with a "ctx" key.

Returns:

list[str]: List of context strings.
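
As an illustration of the behaviour described above, here is a standalone sketch written as a free function rather than a method. It assumes `examples` is a batched dictionary in which "ctx" maps to a list of strings; the sample sentences are made up.

```python
def get_context(examples):
    # Assumption: batched input, so "ctx" holds one string per example.
    return list(examples["ctx"])


batch = {
    "ctx": [
        "A man is sitting on a roof. He",
        "A woman picks up a guitar. She",
    ]
}
contexts = get_context(batch)
```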

nemo_automodel.components.datasets.llm.hellaswag.HellaSwag.get_target(
    examples
)

Extracts the correct ending based on the label.

Parameters:

examples (dict): A dictionary with "endings" (list of strings) and "label" (index of the correct ending).

Returns:

list[str]: The gold target strings based on the label index.
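
The label-based lookup described above can be sketched as a standalone function. This is an assumption-laden illustration, not the library's code: it assumes batched input and converts each label with `int()` so that labels stored as strings (for example "1") also work. The sample endings are made up.

```python
def get_target(examples):
    # Pair each list of candidate endings with its label and pick the gold one.
    # int() accepts both integer labels and numeric string labels.
    return [
        endings[int(label)]
        for endings, label in zip(examples["endings"], examples["label"])
    ]


batch = {
    "endings": [
        ["starts pulling up roofing.", "is wrapping a pair of skis."],
        ["throws it into the lake.", "plays a chord."],
    ],
    "label": ["0", "1"],
}
targets = get_target(batch)
```

Concatenating each context with its gold ending gives the prompt and completion pair that a single-turn SFT pipeline would tokenize.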