The Dynamo Frontend supports multiple tokenizer backends for BPE-based tokenizer.json models. BPE is the underlying tokenization algorithm, not a backend-specific feature: both the default HuggingFace path and the fastokens path can serve these models. The backend choice controls which implementation performs tokenization before requests are sent to the inference engine.
default: HuggingFace Tokenizers

The default backend uses the HuggingFace tokenizers library (Rust).
It supports the features defined in tokenizer.json files: normalizers, pre-tokenizers, post-processors, decoders, added tokens with special-token flags, and byte-fallback.
fastokens: High-Performance Encoder

The fastokens backend uses the fastokens crate, a purpose-built encoder optimized for throughput on supported BPE tokenizer.json models.
It is a hybrid backend: encoding uses fastokens while decoding falls back to HuggingFace so that incremental detokenization, byte-fallback, and special-token handling work correctly.
Use this backend when tokenization is a measurable bottleneck, for example on high-concurrency prefill-heavy workloads.
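The encode/decode split described above can be sketched as follows. This is a minimal illustration, not the frontend's actual code: the traits, `ToyFastEncoder`, `ToyFullDecoder`, and `HybridTokenizer` are hypothetical names, and the toy implementations map one byte to one token instead of running BPE.

```rust
/// Anything that can turn text into token IDs.
trait Encoder {
    fn encode(&self, text: &str) -> Vec<u32>;
}

/// Anything that can turn token IDs back into text.
trait Decoder {
    fn decode(&self, ids: &[u32]) -> String;
}

/// Hypothetical stand-in for the throughput-optimized encoder:
/// emits one token ID per input byte.
struct ToyFastEncoder;

/// Hypothetical stand-in for the full-featured decoder:
/// treats each token ID as a single byte.
struct ToyFullDecoder;

impl Encoder for ToyFastEncoder {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }
}

impl Decoder for ToyFullDecoder {
    fn decode(&self, ids: &[u32]) -> String {
        let bytes: Vec<u8> = ids.iter().map(|&id| id as u8).collect();
        String::from_utf8_lossy(&bytes).into_owned()
    }
}

/// Routes encoding to one implementation and decoding to another,
/// so each direction can use the implementation best suited to it.
struct HybridTokenizer<E: Encoder, D: Decoder> {
    encoder: E,
    decoder: D,
}

impl<E: Encoder, D: Decoder> HybridTokenizer<E, D> {
    fn encode(&self, text: &str) -> Vec<u32> {
        self.encoder.encode(text)
    }

    fn decode(&self, ids: &[u32]) -> String {
        self.decoder.decode(ids)
    }
}
```

Keeping the two directions behind separate traits means the fast path only has to get encoding right, while decoding behavior stays with the implementation that already handles it.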
- Supports BPE tokenizer.json files (Qwen, LLaMA, GPT-family, Mistral, DeepSeek, etc.).
- If fastokens cannot load a particular tokenizer file, the frontend logs a warning and transparently falls back to HuggingFace; requests are never dropped.
- Has no effect on TikToken-format tokenizers (.model / .tiktoken files), which always use the TikToken backend.

Set the backend with a CLI flag or environment variable. The CLI flag takes precedence.
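The precedence rule can be sketched as a small resolver. This is an illustration under stated assumptions: `Backend`, `parse_backend`, and `resolve_backend` are hypothetical names, and treating an unrecognized value as the default backend is an assumption, not documented frontend behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Default,
    Fastokens,
}

/// Maps a backend name to a variant; returns None for unknown names.
fn parse_backend(value: &str) -> Option<Backend> {
    match value.trim() {
        "default" => Some(Backend::Default),
        "fastokens" => Some(Backend::Fastokens),
        _ => None,
    }
}

/// Picks the CLI value when one was given, otherwise the environment value.
/// Assumption: a missing or unrecognized value selects the default backend.
fn resolve_backend(cli: Option<&str>, env: Option<&str>) -> Backend {
    cli.or(env)
        .and_then(parse_backend)
        .unwrap_or(Backend::Default)
}
```

A caller could pass the environment value as `std::env::var("DYN_TOKENIZER").ok().as_deref()`, with the CLI argument coming from whatever argument parser is in use.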
Examples:
When DYN_TOKENIZER=fastokens is set:
1. ModelDeploymentCard::tokenizer() attempts to load fastokens::Tokenizer from the same tokenizer.json file.
2. If loading succeeds, a FastTokenizer is created that encodes with fastokens and decodes with HuggingFace.
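The load-then-fall-back flow can be sketched as below. It does not call the real fastokens or HuggingFace APIs: `Loaded`, `try_load_fast`, and `load_tokenizer` are hypothetical, and the check for a `"BPE"` marker is a placeholder for whatever makes a real load succeed or fail.

```rust
#[derive(Debug, PartialEq)]
enum Loaded {
    /// Encode with the fast path, decode with the full-featured path.
    Hybrid,
    /// Encode and decode with the full-featured path only.
    FullOnly,
}

/// Hypothetical fast loader: pretends only files that mention a BPE
/// model can be loaded. A real loader would parse the file.
fn try_load_fast(tokenizer_json: &str) -> Result<(), String> {
    if tokenizer_json.contains("\"BPE\"") {
        Ok(())
    } else {
        Err("unsupported tokenizer model".to_string())
    }
}

/// Tries the fast loader when requested; on failure, logs a warning
/// and falls back instead of returning an error to the caller.
fn load_tokenizer(tokenizer_json: &str, want_fast: bool) -> Loaded {
    if !want_fast {
        return Loaded::FullOnly;
    }
    match try_load_fast(tokenizer_json) {
        Ok(()) => Loaded::Hybrid,
        Err(err) => {
            eprintln!("warning: fast tokenizer unavailable ({}); falling back", err);
            Loaded::FullOnly
        }
    }
}
```

Because the fallback is decided once at load time, a file the fast loader rejects still yields a working tokenizer and request handling is unaffected.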