
Copyright © 2026, NVIDIA Corporation.

Tokenizer

The Dynamo Frontend supports two tokenizer backends for models that ship a BPE-based tokenizer.json. BPE is the underlying tokenization algorithm, not a feature of either backend: both the default HuggingFace path and the fastokens path can serve these models. The backend choice controls which implementation tokenizes a request before it is sent to the inference engine.

Tokenizer Backends

default HuggingFace Tokenizers

The default backend uses the HuggingFace tokenizers library (Rust). It supports the components defined in tokenizer.json files, including normalizers, pre-tokenizers, post-processors, decoders, added tokens with special-token flags, and byte-fallback.

fastokens High-Performance Encoder

The fastokens backend uses the fastokens crate, a purpose-built encoder optimized for throughput on supported BPE tokenizer.json models. It is a hybrid backend: encoding uses fastokens while decoding falls back to HuggingFace so that incremental detokenization, byte-fallback, and special-token handling work correctly.

Use this backend when tokenization is a measurable bottleneck, for example on high-concurrency, prefill-heavy workloads.
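
A minimal sketch of the hybrid encode/decode split follows. It assumes hypothetical `Encoder` and `Decoder` traits. `ToyFastEncoder` and `ToyHfDecoder` are toy byte-level stand-ins for fastokens and HuggingFace, not their real APIs. The sketch only illustrates how one tokenizer object can route encoding and decoding to different implementations.

```rust
/// Hypothetical encode-side interface (stand-in for the fastokens encoder).
pub trait Encoder {
    fn encode(&self, text: &str) -> Vec<u32>;
}

/// Hypothetical decode-side interface (stand-in for the HuggingFace decoder).
pub trait Decoder {
    fn decode(&self, ids: &[u32]) -> String;
}

/// Hybrid tokenizer: encode calls go to the fast encoder,
/// decode calls go to the reference decoder.
pub struct HybridTokenizer<E: Encoder, D: Decoder> {
    encoder: E,
    decoder: D,
}

impl<E: Encoder, D: Decoder> HybridTokenizer<E, D> {
    pub fn new(encoder: E, decoder: D) -> Self {
        Self { encoder, decoder }
    }

    pub fn encode(&self, text: &str) -> Vec<u32> {
        self.encoder.encode(text)
    }

    pub fn decode(&self, ids: &[u32]) -> String {
        self.decoder.decode(ids)
    }
}

/// Toy encoder: each UTF-8 byte becomes one token id.
pub struct ToyFastEncoder;

impl Encoder for ToyFastEncoder {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }
}

/// Toy decoder: reverses the byte mapping. Ids above 255 are dropped,
/// and invalid UTF-8 is replaced instead of raising an error.
pub struct ToyHfDecoder;

impl Decoder for ToyHfDecoder {
    fn decode(&self, ids: &[u32]) -> String {
        let bytes: Vec<u8> = ids.iter().filter_map(|&id| u8::try_from(id).ok()).collect();
        String::from_utf8_lossy(&bytes).into_owned()
    }
}
```

Every `decode` call goes through the decoder side, so swapping the encoder leaves detokenization behavior untouched. The page gives this as the reason for keeping HuggingFace on the decode path.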

Compatibility notes:

  • Works with standard BPE tokenizer.json files (Qwen, LLaMA, GPT-family, Mistral, DeepSeek, etc.).
  • If fastokens cannot load a particular tokenizer file, the frontend logs a warning and transparently falls back to HuggingFace; requests are never dropped.
  • Has no effect on TikToken-format tokenizers (.model / .tiktoken files), which always use the TikToken backend.
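
The compatibility rules above reduce to a small decision table. The sketch below encodes that table with hypothetical enums. It also uses a hypothetical `classify` helper that infers the format from the file name. It illustrates the documented rules and is not Dynamo's actual selection code.

```rust
/// Tokenizer file formats that matter for backend selection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenizerFormat {
    /// A HuggingFace-style tokenizer.json file.
    TokenizerJson,
    /// A TikToken-format file (.model or .tiktoken).
    TikToken,
}

/// Backend requested via --tokenizer / DYN_TOKENIZER.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestedBackend {
    Default,
    Fastokens,
}

/// Implementation that actually ends up serving the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EffectiveBackend {
    HuggingFace,
    /// fastokens encodes, HuggingFace decodes.
    HybridFastokens,
    TikToken,
}

/// Hypothetical helper: guess the format from the file name.
pub fn classify(file_name: &str) -> Option<TokenizerFormat> {
    if file_name.ends_with("tokenizer.json") {
        Some(TokenizerFormat::TokenizerJson)
    } else if file_name.ends_with(".model") || file_name.ends_with(".tiktoken") {
        Some(TokenizerFormat::TikToken)
    } else {
        None
    }
}

/// TikToken files ignore the setting; a fastokens load failure
/// degrades to HuggingFace instead of failing the request.
pub fn effective_backend(
    format: TokenizerFormat,
    requested: RequestedBackend,
    fastokens_loaded: bool,
) -> EffectiveBackend {
    match (format, requested) {
        (TokenizerFormat::TikToken, _) => EffectiveBackend::TikToken,
        (TokenizerFormat::TokenizerJson, RequestedBackend::Default) => EffectiveBackend::HuggingFace,
        (TokenizerFormat::TokenizerJson, RequestedBackend::Fastokens) => {
            if fastokens_loaded {
                EffectiveBackend::HybridFastokens
            } else {
                EffectiveBackend::HuggingFace
            }
        }
    }
}
```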

Configuration

Set the backend with a CLI flag or environment variable. The CLI flag takes precedence.

| CLI argument | Env var | Valid values | Default |
|---|---|---|---|
| --tokenizer | DYN_TOKENIZER | default, fastokens | default |

Examples:

# CLI flag
$ python -m dynamo.frontend --tokenizer fastokens

# Environment variable
$ export DYN_TOKENIZER=fastokens
$ python -m dynamo.frontend
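
The precedence rule (CLI flag first, then `DYN_TOKENIZER`, then `default`) can be sketched as follows. `TokenizerBackend`, `resolve_backend`, and `resolve_from_process_env` are hypothetical names. This sketch also makes two assumptions that are not documented behavior: values match case-sensitively, and an unrecognized value produces an error.

```rust
use std::str::FromStr;

/// Backends accepted by --tokenizer / DYN_TOKENIZER.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenizerBackend {
    Default,
    Fastokens,
}

impl FromStr for TokenizerBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "default" => Ok(TokenizerBackend::Default),
            "fastokens" => Ok(TokenizerBackend::Fastokens),
            // Assumption: unknown values are rejected rather than ignored.
            other => Err(format!(
                "invalid tokenizer backend {other:?}; expected \"default\" or \"fastokens\""
            )),
        }
    }
}

/// CLI value wins over the environment; with neither set, use the default.
/// `cli` is the --tokenizer value (if given); `env` is DYN_TOKENIZER (if set).
pub fn resolve_backend(cli: Option<&str>, env: Option<&str>) -> Result<TokenizerBackend, String> {
    match cli.or(env) {
        Some(value) => value.parse(),
        None => Ok(TokenizerBackend::Default),
    }
}

/// Reads DYN_TOKENIZER from the process environment, then applies the precedence rule.
pub fn resolve_from_process_env(cli: Option<&str>) -> Result<TokenizerBackend, String> {
    let env = std::env::var("DYN_TOKENIZER").ok();
    resolve_backend(cli, env.as_deref())
}
```

Keeping `resolve_backend` free of environment access makes the precedence rule easy to test in isolation. The thin wrapper handles the actual lookup.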

Dynamo Frontend Behavior

When DYN_TOKENIZER=fastokens is set:

  1. The frontend passes the environment variable to the Rust runtime.
  2. When building the tokenizer for a model, ModelDeploymentCard::tokenizer() attempts to load fastokens::Tokenizer from the same tokenizer.json file.
  3. If loading succeeds, a hybrid FastTokenizer is created that encodes with fastokens and decodes with HuggingFace.
  4. If loading fails (unsupported tokenizer features, missing file, etc.), the frontend logs a warning and falls back to the standard HuggingFace backend; no operator intervention is needed.
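
Steps 2 through 4 follow a load-or-fall-back pattern, sketched below. The loader and logger are injected as closures so the logic runs without the real crates. `build_tokenizer`, `Built`, and `LoadError` are hypothetical, and the warning text is invented for illustration rather than copied from Dynamo's logs.

```rust
use std::fmt;

/// Hypothetical error standing in for a fastokens load failure.
#[derive(Debug)]
pub struct LoadError(pub String);

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Outcome of building the tokenizer for one model.
#[derive(Debug, PartialEq, Eq)]
pub enum Built<T> {
    /// Step 3: the fast encoder loaded, so encode with it and decode with HuggingFace.
    Hybrid(T),
    /// Step 4, or fastokens was not requested: plain HuggingFace backend.
    HuggingFace,
}

/// `load_fast` stands in for loading fastokens from the same tokenizer.json;
/// `warn` receives the warning text in place of a real logger.
pub fn build_tokenizer<T, F, W>(use_fastokens: bool, load_fast: F, mut warn: W) -> Built<T>
where
    F: FnOnce() -> Result<T, LoadError>,
    W: FnMut(String),
{
    if !use_fastokens {
        // fastokens not selected: never attempt the fast loader.
        return Built::HuggingFace;
    }
    match load_fast() {
        Ok(fast) => Built::Hybrid(fast),
        Err(err) => {
            // Illustrative message; serving continues on HuggingFace.
            warn(format!("fastokens could not load tokenizer ({err}); using HuggingFace"));
            Built::HuggingFace
        }
    }
}
```

A load failure turns into a warning and a usable fallback instead of an error. This matches the guarantee above that no operator intervention is needed.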