nemo_curator.stages.text.models.utils

Module Contents

Functions

| Name | Description |
| --- | --- |
| clip_tokens | Clip the tokens to the smallest size possible. |
| format_name_with_suffix | - |

Data

ATTENTION_MASK_FIELD

INPUT_ID_FIELD

SEQ_ORDER_FIELD

TOKEN_LENGTH_FIELD

API

nemo_curator.stages.text.models.utils.clip_tokens(
token_o: dict,
padding_side: typing.Literal['left', 'right'] = 'right'
) -> dict[str, torch.Tensor]

Clip the tokens to the smallest size possible.

Parameters:

token_o
dict

The dictionary containing the input tokens (input_ids, attention_mask).

padding_side
Literal['left', 'right'], defaults to 'right'

The side on which the input tokens are padded.

Returns: dict[str, torch.Tensor]

The clipped tokens (input_ids, attention_mask).
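
As an illustration of what clipping a padded batch might involve, the following is a minimal sketch and not the library's implementation. It assumes that "smallest size possible" means the length of the longest unpadded sequence in the batch, and it uses plain Python lists rather than torch tensors so that it runs without dependencies. The helper name `clip_padded_batch` is hypothetical.

```python
def clip_padded_batch(batch, padding_side="right"):
    """Trim padding columns that every row of a padded batch shares.

    Hypothetical helper for illustration only; it mirrors the documented
    inputs (input_ids, attention_mask) but works on nested lists.
    """
    mask = batch["attention_mask"]
    # Assumption: the target width is the longest real (unpadded) sequence.
    clip_len = max(sum(row) for row in mask)

    def cut(row):
        if padding_side == "right":
            # Padding sits at the end, so keep the leading columns.
            return row[:clip_len]
        # Padding sits at the start, so keep the trailing columns.
        return row[len(row) - clip_len:]

    return {key: [cut(row) for row in rows] for key, rows in batch.items()}


right_padded = {
    "input_ids": [[101, 7, 8, 0, 0], [101, 9, 0, 0, 0]],
    "attention_mask": [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]],
}
left_padded = {
    "input_ids": [[0, 0, 101, 7, 8], [0, 0, 0, 101, 9]],
    "attention_mask": [[0, 0, 1, 1, 1], [0, 0, 0, 1, 1]],
}

clipped_right = clip_padded_batch(right_padded)
clipped_left = clip_padded_batch(left_padded, padding_side="left")
```

In both cases the batch width drops from five columns to three, and the shorter row keeps one padding position so that all rows stay the same length.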

nemo_curator.stages.text.models.utils.format_name_with_suffix(
model_identifier: str,
suffix: str = '_classifier'
) -> str

nemo_curator.stages.text.models.utils.ATTENTION_MASK_FIELD = 'attention_mask'

nemo_curator.stages.text.models.utils.INPUT_ID_FIELD = 'input_ids'

nemo_curator.stages.text.models.utils.SEQ_ORDER_FIELD = '_curator_seq_order'

nemo_curator.stages.text.models.utils.TOKEN_LENGTH_FIELD = '_curator_token_length'
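
This page does not describe how these constants are used. One plausible reading of the names is that they serve as field (column) names on tokenized records, with the token-length field used for sorting and the sequence-order field used to restore the original order afterwards. The sketch below illustrates that assumed pattern with plain dictionaries. The constants are redefined locally with the values listed above, and the helper `annotate` is hypothetical.

```python
# Values copied from the constants listed above; redefined locally so the
# sketch runs without the library installed.
INPUT_ID_FIELD = "input_ids"
ATTENTION_MASK_FIELD = "attention_mask"
SEQ_ORDER_FIELD = "_curator_seq_order"
TOKEN_LENGTH_FIELD = "_curator_token_length"


def annotate(rows):
    """Attach original position and unpadded token count to each record.

    Hypothetical helper: the real pipeline may compute these differently.
    """
    annotated = []
    for position, row in enumerate(rows):
        record = dict(row)
        record[SEQ_ORDER_FIELD] = position
        # Assumption: token length is the number of non-padding positions.
        record[TOKEN_LENGTH_FIELD] = sum(row[ATTENTION_MASK_FIELD])
        annotated.append(record)
    return annotated


rows = [
    {INPUT_ID_FIELD: [101, 7, 0], ATTENTION_MASK_FIELD: [1, 1, 0]},
    {INPUT_ID_FIELD: [101, 7, 8], ATTENTION_MASK_FIELD: [1, 1, 1]},
    {INPUT_ID_FIELD: [101, 0, 0], ATTENTION_MASK_FIELD: [1, 0, 0]},
]

records = annotate(rows)
# Sort longest first so that similar-length sequences end up adjacent.
by_length = sorted(records, key=lambda r: r[TOKEN_LENGTH_FIELD], reverse=True)
# Undo the sort by using the stored original positions.
restored = sorted(by_length, key=lambda r: r[SEQ_ORDER_FIELD])
```

Grouping records of similar length before batching tends to reduce the amount of padding per batch, which is why a stored order field would be useful for putting results back in their input order.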