Pre-Flight Tokenizer Auto Detection
Pre-Flight Tokenizer Auto Detection
Pre-Flight Tokenizer Auto Detection
AIPerf resolves tokenizer names before spawning services via lightweight Hub API calls. This pre-flight check catches ambiguous or unknown names immediately without delaying startup: it does not download or load the tokenizer. Full tokenizer loading happens later inside each service, where errors like gated repos or missing files are caught and displayed with context-aware panels.
--tokenizer if specified, otherwise each --model name.qwen3-0.6b → Qwen/Qwen3-0.6B).Pre-flight is skipped when --use-server-token-count is set with a non-synthetic dataset, or when the endpoint type doesn’t require tokenization.
Pass --tokenizer builtin to use a zero-network-access tokenizer backed by tiktoken with the o200k_base encoding (GPT-4o / o1 / o3, 200k vocabulary). This skips all HuggingFace Hub alias resolution and downloads.
Use this when you don’t need a model-specific tokenizer and just want token counts for performance metrics. The encoding data is downloaded once on first use and cached locally by tiktoken — subsequent runs require no network access.
When --tokenizer is not set, AIPerf normally derives the tokenizer name from --model. If the model name looks like an obvious LLM-hallucinated placeholder (e.g. mock-model, test-model, fake-llama), AIPerf substitutes builtin for that model and emits a warning instead of attempting an HF Hub lookup that would fail. This avoids a confusing tokenizer error when the user is iterating against a mock or test server.
The check fires when the model name (case-insensitive, with _ normalized to -) is not path-like (no /, \, leading ., or leading ~) and matches either:
Examples that trigger the fallback: mock-model, Test-Model-v2, MOCK_LLAMA, placeholder, my-model. Examples that do not trigger it: meta-llama/Llama-3-test-finetune (path-like), gpt2, Qwen/Qwen3-0.6B.
Sample output:
Opt out by passing --tokenizer <name> explicitly. Any explicit value wins, even one that looks placeholder-ish — the check only runs when the tokenizer would otherwise default from --model. If a model with a placeholder-shaped name is real on your inference server, set --tokenizer to a real HF repo (or to builtin yourself to suppress the warning).
When a HuggingFace tokenizer has been previously downloaded, AIPerf detects it in the local HF cache and loads directly without any network calls. This applies to both alias resolution and tokenizer loading — no model_info() API call, no ETag update check. First run downloads as normal; every subsequent run is fully offline.
./, ../, or existing directories are used as-is.HF_HUB_OFFLINE or TRANSFORMERS_OFFLINE is set, names are used as-is.model_info() API call. Returns canonical model.id if found.RepositoryNotFoundError or HfHubHTTPError), searches with list_models(search=name, limit=50):
/<name>, picks highest downloads.Set HF_TOKEN for gated or private models.
Successful resolution:
Ambiguous name:
Gated repository error (runtime):
If a tokenizer fails during service initialization, AIPerf walks the __cause__ chain to show a context-aware panel. Duplicate errors across services are shown once.