core.tokenizers.text.models.default_tokenizer

Module Contents

Classes

DefaultTokenizerText

Base class for Megatron default tokenizer.

API

class core.tokenizers.text.models.default_tokenizer.DefaultTokenizerText(path: str = None, config: dict = None, **kwargs)

Bases: megatron.core.tokenizers.text.text_tokenizer.MegatronTokenizerText

Base class for Megatron default tokenizer.

Initialization

Parameters:
  • path (str) – path to the tokenizer model.

  • config (dict) – tokenizer parameters:

      • library (str) – tokenizer library.

      • class_name (str) – name of tokenizer class.

      • class_path (str) – path to tokenizer class.

      • model_type (str) – type of the model to be used with tokenizer.

      • chat_template (str) – tokenizer chat template.
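
As a rough illustration of the config argument described above, the sketch below builds a dict that uses only the documented keys. The helper build_tokenizer_config, the constant DOCUMENTED_CONFIG_KEYS, the wrapper make_tokenizer, and all values ("sentencepiece", "gpt", and so on) are hypothetical placeholders, not verified defaults. The import path inside make_tokenizer is an assumption inferred from the megatron.core prefix on the Bases line. Whether any key is required, or whether additional keys are accepted, is not stated on this page.

```python
# Keys listed for the `config` parameter on this page.
DOCUMENTED_CONFIG_KEYS = {
    "library",
    "class_name",
    "class_path",
    "model_type",
    "chat_template",
}


def build_tokenizer_config(**params):
    """Hypothetical helper: return a config dict, rejecting keys not listed in the docs.

    Assumption: only the documented keys are meaningful. The real class may
    accept more; this check is a convenience, not library behavior.
    """
    unknown = set(params) - DOCUMENTED_CONFIG_KEYS
    if unknown:
        raise ValueError(f"undocumented config keys: {sorted(unknown)}")
    return dict(params)


# Illustrative values only; they are placeholders, not known defaults.
config = build_tokenizer_config(
    library="sentencepiece",
    class_name="DefaultTokenizerText",
    model_type="gpt",
)


def make_tokenizer(path, config):
    """Construct the tokenizer using the documented (path, config) signature.

    The import is deferred so this sketch loads without Megatron installed.
    The module path below is an assumption based on the "Bases" line.
    """
    from megatron.core.tokenizers.text.models.default_tokenizer import (
        DefaultTokenizerText,
    )

    return DefaultTokenizerText(path=path, config=config)
```

With Megatron installed, a call might look like make_tokenizer("/path/to/tokenizer.model", config), where the path is again a placeholder.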