nemo_curator.stages.text.models.model

Module Contents

Classes

ModelStage
Base class for Hugging Face model inference.

API

class nemo_curator.stages.text.models.model.ModelStage(
model_identifier: str,
cache_dir: str | None = None,
hf_token: str | None = None,
model_inference_batch_size: int = 256,
has_seq_order: bool = True,
padding_side: typing.Literal['left', 'right'] = 'right',
max_seq_length: int | None = None,
unpack_inference_batch: bool = False,
autocast: bool = True
)

Bases: ProcessingStage[DocumentBatch, DocumentBatch]

Base class for Hugging Face model inference.

Parameters:

model_identifier
str

The identifier of the Hugging Face model.

cache_dir
str | None

The Hugging Face cache directory. Defaults to None.

hf_token
str | None

Hugging Face token for downloading the model, if needed. Defaults to None.

model_inference_batch_size
int

The batch size for model inference. Defaults to 256.

has_seq_order
bool

Whether to sort the input data by input token length. Sorting is encouraged because it improves inference performance. Defaults to True.

padding_side
Literal['left', 'right']

The side on which to pad the input tokens. Defaults to 'right'.

max_seq_length
int | None

If provided, input tokens are clipped to this length before the forward pass. Defaults to None.

unpack_inference_batch
bool

Whether to unpack the inference batch into the model call with **kwargs. Defaults to False.

autocast
bool

Whether to run the forward pass with autocast. When True, minor accuracy is traded for faster inference (see the sketch after this list). Defaults to True.
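A minimal sketch of what autocast=True implies for the forward pass, assuming a standard PyTorch model on CUDA; the exact dtype and device handling inside ModelStage may differ:

```python
import torch

# Minimal sketch of mixed-precision inference with autocast; the exact
# dtype and device handling inside ModelStage may differ.
model = torch.nn.Linear(16, 4).cuda().eval()
inputs = torch.randn(8, 16, device="cuda")

with torch.no_grad(), torch.autocast(device_type="cuda"):
    outputs = model(inputs)  # eligible ops run in reduced precision
```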

name

resources = Resources(cpus=1, gpus=1)
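In practice, ModelStage is subclassed with model-specific hooks. The following is a hedged sketch only: MyClassifierStage and the model identifier are hypothetical, and which of the hooks listed below must be overridden is an assumption, not something this reference states.

```python
import torch
from transformers import AutoModelForSequenceClassification

from nemo_curator.stages.text.models.model import ModelStage


class MyClassifierStage(ModelStage):
    """Hedged sketch of a ModelStage subclass for a text classifier."""

    def __init__(self):
        super().__init__(
            model_identifier="my-org/my-classifier",  # hypothetical model id
            model_inference_batch_size=128,
            autocast=True,
        )

    def setup(self, _=None) -> None:
        # Load the model once per worker and move it to the GPU.
        self.model = (
            AutoModelForSequenceClassification
            .from_pretrained("my-org/my-classifier")
            .cuda()
            .eval()
        )

    def process_model_output(self, outputs, model_input_batch=None):
        # Turn raw logits into the numpy arrays that collect_outputs gathers;
        # treats `outputs` as a logits tensor, per the documented signature.
        preds = torch.argmax(outputs, dim=-1)
        return {"pred": preds.cpu().numpy()}
```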
nemo_curator.stages.text.models.model.ModelStage._model_forward(
model_input_batch: dict[str, torch.Tensor]
) -> torch.Tensor

nemo_curator.stages.text.models.model.ModelStage.collect_outputs(
processed_outputs: list[dict[str, numpy.ndarray]]
) -> dict[str, numpy.ndarray]

nemo_curator.stages.text.models.model.ModelStage.create_output_dataframe(
df_cpu: pandas.DataFrame,
collected_output: dict[str, numpy.ndarray]
) -> pandas.DataFrame

nemo_curator.stages.text.models.model.ModelStage.inputs() -> tuple[list[str], list[str]]

nemo_curator.stages.text.models.model.ModelStage.outputs() -> tuple[list[str], list[str]]

nemo_curator.stages.text.models.model.ModelStage.process(
batch: nemo_curator.tasks.DocumentBatch
) -> nemo_curator.tasks.DocumentBatch

nemo_curator.stages.text.models.model.ModelStage.process_model_output(
outputs: torch.Tensor,
model_input_batch: dict[str, torch.Tensor] | None = None
) -> dict[str, numpy.ndarray] | torch.Tensor

nemo_curator.stages.text.models.model.ModelStage.setup(
_: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

nemo_curator.stages.text.models.model.ModelStage.setup_on_node(
_node_info: nemo_curator.backends.base.NodeInfo | None = None,
_worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

nemo_curator.stages.text.models.model.ModelStage.teardown() -> None

nemo_curator.stages.text.models.model.ModelStage.yield_next_batch(
df: pandas.DataFrame
) -> collections.abc.Generator[dict[str, torch.Tensor]]

Yields model inputs one batch at a time. Only the current batch is moved to the GPU, which reduces memory overhead.

Parameters:

df
pandas.DataFrame

The Pandas DataFrame (with input_ids and attention_mask columns) to process.
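The per-batch GPU transfer this docstring describes can be illustrated with a standalone generator. This is a hedged sketch of the pattern, not the actual implementation; the real method additionally applies the class-level sorting and max_seq_length clipping options, and the column names and default batch size mirror the documented ones.

```python
from collections.abc import Generator

import pandas as pd
import torch


def yield_next_batch_sketch(
    df: pd.DataFrame, batch_size: int = 256
) -> Generator[dict[str, torch.Tensor], None, None]:
    # Build tensors on the CPU and move only the current batch to the GPU,
    # so a single batch occupies GPU memory at a time. Assumes pre-tokenized,
    # uniformly padded input_ids/attention_mask columns.
    for start in range(0, len(df), batch_size):
        chunk = df.iloc[start : start + batch_size]
        yield {
            "input_ids": torch.tensor(chunk["input_ids"].tolist(), device="cuda"),
            "attention_mask": torch.tensor(chunk["attention_mask"].tolist(), device="cuda"),
        }
```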