Pre-Flight Tokenizer Auto Detection
AIPerf resolves tokenizer names before spawning services via lightweight Hub API calls. This pre-flight check catches ambiguous or unknown names immediately, and because it does not download or load the tokenizer, it adds negligible startup time. Full tokenizer loading happens later inside each service, where errors such as gated repos or missing files are caught and displayed with context-aware panels.
How It Works
- Determine names: Uses `--tokenizer` if specified, otherwise each `--model` name.
- Resolve aliases: Lightweight Hub API calls resolve aliases to canonical repository IDs (e.g., `qwen3-0.6b` → `Qwen/Qwen3-0.6B`).
- Fail fast on ambiguity: If there is no exact or suffix match, displays the top matches by downloads and exits.
- Cache results: Resolved names are passed to all services so they skip re-resolution.
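The steps above can be sketched as a small driver that determines the names, resolves each distinct name exactly once, and hands the results to the services. The `resolve` parameter is a hypothetical stand-in for the actual alias resolver:

```python
def preflight_tokenizer_names(models, tokenizer=None, resolve=lambda name: name):
    """Resolve every tokenizer name once, before any service is spawned.

    `resolve` is a stand-in for the real alias resolver (illustrative only).
    """
    # --tokenizer overrides the per-model names when given.
    names = [tokenizer] if tokenizer else list(models)
    resolved = {}
    for name in names:
        if name not in resolved:  # resolve each distinct name exactly once
            resolved[name] = resolve(name)
    return resolved  # passed to all services so they skip re-resolution
```

Because the resolved mapping is computed up front, duplicate model names cost only one Hub round-trip, and services never need network access for resolution.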
Pre-flight is skipped when --use-server-token-count is set with a non-synthetic dataset, or when the endpoint type doesn’t require tokenization.
Built-in Tokenizer
Pass `--tokenizer builtin` to use a tokenizer that never contacts the HuggingFace Hub, backed by tiktoken with the `o200k_base` encoding (GPT-4o / o1 / o3, 200k vocabulary). This skips all HuggingFace Hub alias resolution and downloads.
Use this when you don’t need a model-specific tokenizer and just want token counts for performance metrics. The encoding data is downloaded once on first use and cached locally by tiktoken — subsequent runs require no network access.
Automatic Cache Detection
When a HuggingFace tokenizer has been previously downloaded, AIPerf detects it in the local HF cache and loads directly without any network calls. This applies to both alias resolution and tokenizer loading — no model_info() API call, no ETag update check. First run downloads as normal; every subsequent run is fully offline.
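A minimal sketch of the cache check, assuming the standard HF hub cache layout (`models--{org}--{name}` folders under the hub cache directory); `is_cached_locally` is an illustrative helper, not AIPerf's actual function:

```python
import os

def is_cached_locally(repo_id: str) -> bool:
    # The HF hub cache stores each repo as a models--{org}--{name} folder.
    # Default location shown here; the real default also honors HF_HOME.
    cache_dir = os.path.expanduser(
        os.environ.get("HF_HUB_CACHE", "~/.cache/huggingface/hub")
    )
    folder = "models--" + repo_id.replace("/", "--")
    return os.path.isdir(os.path.join(cache_dir, folder))
```

When this check succeeds, both alias resolution and tokenizer loading can proceed without any network calls.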
Alias Resolution
- Local paths: Absolute paths, `./`, `../`, or existing directories are used as-is.
- Cached locally: If the model directory exists in the HF cache, the name is used as-is (no network).
- Offline mode: If `HF_HUB_OFFLINE` or `TRANSFORMERS_OFFLINE` is set, names are used as-is.
- Direct lookup: `model_info()` API call. Returns the canonical `model.id` if found.
- Search fallback: If direct lookup fails (`RepositoryNotFoundError` or `HfHubHTTPError`), searches with `list_models(search=name, limit=50)`:
  - Exact match: Result ID matches the input (case-insensitive).
  - Suffix match: Result ends with `/<name>`; picks highest downloads.
  - Ambiguous: No match found; returns top 5 suggestions.
Set `HF_TOKEN` for gated or private models.
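Putting the steps together, a hedged sketch of the resolution order using `huggingface_hub` (an illustration, not AIPerf's exact code; the exception classes are importable from `huggingface_hub.utils`):

```python
import os
from huggingface_hub import model_info, list_models
from huggingface_hub.utils import RepositoryNotFoundError, HfHubHTTPError

def resolve_alias(name: str) -> str:
    # Local paths and offline mode: use the name as-is.
    if name.startswith(("/", "./", "../")) or os.path.isdir(name):
        return name
    if os.environ.get("HF_HUB_OFFLINE") or os.environ.get("TRANSFORMERS_OFFLINE"):
        return name
    # Direct lookup: the Hub returns the canonical repo ID.
    try:
        return model_info(name, token=os.environ.get("HF_TOKEN")).id
    except (RepositoryNotFoundError, HfHubHTTPError):
        pass
    # Search fallback: exact match first, then best suffix match by downloads.
    results = list(list_models(search=name, limit=50))
    for m in results:
        if m.id.lower() == name.lower():
            return m.id
    suffix = [m for m in results if m.id.lower().endswith("/" + name.lower())]
    if suffix:
        return max(suffix, key=lambda m: m.downloads or 0).id
    top = sorted(results, key=lambda m: m.downloads or 0, reverse=True)[:5]
    raise ValueError(f"Ambiguous tokenizer name {name!r}; top matches: {[m.id for m in top]}")
```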
Output Examples
Successful resolution:
Ambiguous name:
Gated repository error (runtime):
Runtime Error Panels
If a tokenizer fails during service initialization, AIPerf walks the __cause__ chain to show a context-aware panel. Duplicate errors across services are shown once.
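The `__cause__` walk can be sketched in a few lines; `root_cause` is an illustrative helper, not AIPerf's API:

```python
def root_cause(exc: BaseException) -> BaseException:
    # Follow the explicit `raise ... from ...` chain to the original error.
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

# Example: an access error wrapped during service initialization.
try:
    try:
        raise PermissionError("access to repo requires approval")
    except PermissionError as inner:
        raise RuntimeError("tokenizer initialization failed") from inner
except RuntimeError as outer:
    assert isinstance(root_cause(outer), PermissionError)
```

Inspecting the innermost cause is what lets the panel say "gated repository" or "missing file" rather than only reporting the generic wrapper error.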