nemo_export.tiktoken_tokenizer
Module Contents
Classes
Functions
reload_mergeable_ranks | Reload the tokenizer JSON file and convert it to Tiktoken format.
Data
API
- nemo_export.tiktoken_tokenizer.PATTERN_TIKTOKEN = '[^\\\\r\\\\n\\\\p{L}\\\\p{N}]?[\\\\p{Lu}\\\\p{Lt}\\\\p{Lm}\\\\p{Lo}\\\\p{M}]*[\\\\p{Ll}\\\\p{Lm}\\\\p{Lo}\\\\p{M}]+|[^\\\\r\\\\n\\\\...'
- nemo_export.tiktoken_tokenizer.DEFAULT_TIKTOKEN_MAX_VOCAB = None
- nemo_export.tiktoken_tokenizer.SPECIAL_TOKENS = ['<unk>', '<s>', '</s>']
- nemo_export.tiktoken_tokenizer.SPECIAL_TOKEN_TEMPLATE = '<SPECIAL_{id}>'
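The two special-token constants can be read together: `SPECIAL_TOKENS` names the first few reserved tokens, and `SPECIAL_TOKEN_TEMPLATE` is a format string with an `id` placeholder. The sketch below shows one plausible way to combine them, padding the named tokens with templated placeholders up to a fixed count. The helper `build_special_tokens` and its `num_special_tokens` parameter are hypothetical and are not part of the documented module; the constant values are copied from the listing above.

```python
from typing import List

# Values copied from the module's documented data attributes.
SPECIAL_TOKENS = ["<unk>", "<s>", "</s>"]
SPECIAL_TOKEN_TEMPLATE = "<SPECIAL_{id}>"


def build_special_tokens(num_special_tokens: int) -> List[str]:
    """Hypothetical helper: named tokens first, then templated placeholders."""
    tokens = list(SPECIAL_TOKENS)
    # Fill the remaining reserved slots with '<SPECIAL_3>', '<SPECIAL_4>', ...
    tokens += [
        SPECIAL_TOKEN_TEMPLATE.format(id=i)
        for i in range(len(tokens), num_special_tokens)
    ]
    return tokens


special = build_special_tokens(6)
print(special)
# ['<unk>', '<s>', '</s>', '<SPECIAL_3>', '<SPECIAL_4>', '<SPECIAL_5>']
```

Whether the module pads its special tokens this way is an assumption; the only certain part is how the template expands for a given `id`.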
- nemo_export.tiktoken_tokenizer.reload_mergeable_ranks(path: str, max_vocab: Optional[int] = None)
Reload the tokenizer JSON file and convert it to Tiktoken format.
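To illustrate what such a conversion might involve, here is a minimal, self-contained sketch that does not import `nemo_export`. Tiktoken represents its vocabulary as a mapping from token bytes to integer rank, so the sketch reads a JSON list and builds that mapping. The JSON schema used here (a list of objects with `rank` and base64-encoded `token_bytes` fields), the function name `sketch_reload_mergeable_ranks`, and the truncation behavior of `max_vocab` are assumptions for illustration, not the documented implementation.

```python
import base64
import json
import os
import tempfile
from typing import Dict, Optional


def sketch_reload_mergeable_ranks(
    path: str, max_vocab: Optional[int] = None
) -> Dict[bytes, int]:
    """Illustrative stand-in: load a JSON vocab and map token bytes to rank."""
    with open(path, "r", encoding="utf-8") as f:
        vocab = json.load(f)
    if max_vocab is not None:
        # Assumed behavior: keep only the first `max_vocab` entries.
        vocab = vocab[:max_vocab]
    ranks: Dict[bytes, int] = {}
    for entry in vocab:
        # Assumed schema: token bytes are stored base64-encoded.
        ranks[base64.b64decode(entry["token_bytes"])] = entry["rank"]
    return ranks


# Build a tiny demo vocabulary file in the assumed schema.
entries = [
    {"rank": i, "token_bytes": base64.b64encode(tok).decode("ascii")}
    for i, tok in enumerate([b"a", b"b", b"ab"])
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_vocab.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(entries, f)
    full_ranks = sketch_reload_mergeable_ranks(demo_path)
    truncated_ranks = sketch_reload_mergeable_ranks(demo_path, max_vocab=2)

print(full_ranks)       # {b'a': 0, b'b': 1, b'ab': 2}
print(truncated_ranks)  # {b'a': 0, b'b': 1}
```

A bytes-to-rank dictionary of this shape is what `tiktoken.Encoding` accepts as its `mergeable_ranks` argument, which is presumably why the conversion targets it.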