nemo_automodel._transformers.auto_tokenizer#

Module Contents#

Classes#

NeMoAutoTokenizer

Auto tokenizer class that dispatches to appropriate tokenizer implementations.

Functions#

_get_model_type

Determine the model type from the config.

Data#

API#

nemo_automodel._transformers.auto_tokenizer.logger#

'getLogger(…)'

nemo_automodel._transformers.auto_tokenizer._get_model_type(
pretrained_model_name_or_path: str,
trust_remote_code: bool = False,
) -> Optional[str]#

Determine the model type from the config.

Parameters:
  • pretrained_model_name_or_path – Model identifier or path

  • trust_remote_code – Whether to trust remote code

Returns:

The model_type string, or None if it cannot be determined
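The lookup described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual implementation: the real function goes through `transformers.AutoConfig` (and honors `trust_remote_code`), whereas this sketch only reads `model_type` from a local `config.json`.

```python
import json
import os
from typing import Optional


def get_model_type(pretrained_model_name_or_path: str) -> Optional[str]:
    """Sketch: read ``model_type`` from a local config.json, or return None."""
    config_path = os.path.join(pretrained_model_name_or_path, "config.json")
    if not os.path.isfile(config_path):
        # Config missing or path is a hub identifier we cannot resolve locally.
        return None
    with open(config_path) as f:
        config = json.load(f)
    return config.get("model_type")
```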

class nemo_automodel._transformers.auto_tokenizer.NeMoAutoTokenizer#

Bases: transformers.AutoTokenizer

Auto tokenizer class that dispatches to appropriate tokenizer implementations.

Similar to HuggingFace’s AutoTokenizer, but with a custom registry for specialized tokenizer implementations.

The dispatch logic is:

  1. If a custom tokenizer is registered for the model type, use it

  2. Otherwise, fall back to NeMoAutoTokenizerWithBosEosEnforced
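The two-step dispatch above amounts to a dictionary lookup with a fallback. A minimal sketch, using hypothetical stand-in classes in place of the real tokenizer implementations:

```python
from typing import Callable, Dict, Type, Union


class NeMoAutoTokenizerWithBosEosEnforced:
    """Stand-in for the fallback used when no custom tokenizer is registered."""


class MistralCommonBackend:
    """Stand-in for a specialized tokenizer registered for one model type."""


# Registry mapping model_type -> tokenizer class (or factory function).
_registry: Dict[str, Union[Type, Callable]] = {"mistral": MistralCommonBackend}


def resolve_tokenizer_cls(model_type: str) -> Union[Type, Callable]:
    # Step 1: use the registered tokenizer for this model type, if any.
    # Step 2: otherwise fall back to the BOS/EOS-enforcing default.
    return _registry.get(model_type, NeMoAutoTokenizerWithBosEosEnforced)
```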

.. rubric:: Example

```python
# Will use MistralCommonBackend if available for Mistral models
tokenizer = NeMoAutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Force using HF AutoTokenizer with BOS/EOS enforcement
tokenizer = NeMoAutoTokenizer.from_pretrained("gpt2", force_default=True)
```

Initialization

_registry#

None

classmethod register(
model_type: str,
tokenizer_cls: Union[Type, Callable],
) -> None#

Register a custom tokenizer for a specific model type.

Parameters:
  • model_type – The model type string (e.g., “mistral”, “llama”)

  • tokenizer_cls – The tokenizer class or factory function

classmethod from_pretrained(
pretrained_model_name_or_path: str,
*args,
force_default: bool = False,
force_hf: bool = False,
trust_remote_code: bool = False,
**kwargs,
)#

Load a tokenizer from a pretrained model.

Parameters:
  • pretrained_model_name_or_path – Model identifier or path

  • force_default – If True, always use NeMoAutoTokenizerWithBosEosEnforced

  • force_hf – If True, return the raw HF AutoTokenizer without any wrapping

  • trust_remote_code – Whether to trust remote code when loading config

  • **kwargs – Additional arguments passed to the tokenizer’s from_pretrained

Returns:

A tokenizer instance appropriate for the model type
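How the `force_default` and `force_hf` flags interact with the registry is sketched below with hypothetical stand-in classes. The precedence shown (`force_hf`, then `force_default`, then the registry lookup) is an assumption, not something the documentation states explicitly.

```python
from typing import Optional


class RawHFAutoTokenizer:
    """Stand-in for the unwrapped HF AutoTokenizer."""


class BosEosEnforcedTokenizer:
    """Stand-in for NeMoAutoTokenizerWithBosEosEnforced."""


class RegisteredMistralTokenizer:
    """Stand-in for a registry hit (e.g. a Mistral-specific backend)."""


_registry = {"mistral": RegisteredMistralTokenizer}


def select_tokenizer_cls(
    model_type: Optional[str],
    force_default: bool = False,
    force_hf: bool = False,
):
    if force_hf:
        # Return the raw HF AutoTokenizer without any wrapping.
        return RawHFAutoTokenizer
    if force_default:
        # Always use the BOS/EOS-enforcing default.
        return BosEosEnforcedTokenizer
    if model_type in _registry:
        # A custom tokenizer is registered for this model type.
        return _registry[model_type]
    return BosEosEnforcedTokenizer
```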

nemo_automodel._transformers.auto_tokenizer.__all__#

['NeMoAutoTokenizer', 'NeMoAutoTokenizerWithBosEosEnforced', 'TokenizerRegistry']