AIPerf can be configured to use local tokenizers without requiring a connection to HuggingFace. This is particularly useful in environments where direct access to HuggingFace is blocked or restricted.
This guide shows you how to run AIPerf using locally stored tokenizer files instead of downloading them from HuggingFace.
Before you begin, ensure you have:
Tokenizer files available locally (for example, tokenizer.json, vocab.txt, config.json).
Make sure your tokenizer files are stored in a local directory. A typical tokenizer directory structure looks like this:
Ensure your tokenizer files match the HuggingFace/tokenizers format. The files should be compatible with the transformers library’s tokenizer loading mechanism.
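Before pointing AIPerf at a directory, a quick pre-flight check can catch obvious problems. The sketch below is hypothetical (`check_tokenizer_dir`, `VOCAB_FILES` and `JSON_FILES` are illustrative names, not part of AIPerf). It reports which of the vocabulary files named in this guide are present and which common JSON files fail to parse. It is a heuristic: SentencePiece-based tokenizers may ship a different vocabulary file, and passing the check does not guarantee that transformers can load the tokenizer.

```python
import json
from pathlib import Path
from typing import Dict, List

# Vocabulary-bearing files named in this guide; at least one is expected.
# Heuristic only: SentencePiece-based tokenizers may ship a different file instead.
VOCAB_FILES = ("tokenizer.json", "vocab.txt", "vocab.json")
# JSON files commonly saved alongside them (assumed list, not exhaustive).
JSON_FILES = ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json", "config.json")


def check_tokenizer_dir(directory: str) -> Dict[str, List[str]]:
    """Hypothetical pre-flight check for a local tokenizer directory.

    Returns the vocabulary files found and any common JSON files that fail to parse.
    Passing this check does not guarantee that transformers can load the tokenizer.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    present = {p.name for p in root.iterdir() if p.is_file()}
    malformed = []
    for name in JSON_FILES:
        if name not in present:
            continue  # absent files are not reported as errors here
        try:
            with (root / name).open(encoding="utf-8") as fh:
                json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError):
            malformed.append(name)
    return {
        "vocab_files": sorted(present.intersection(VOCAB_FILES)),
        "malformed_json": malformed,
    }
```

An empty `vocab_files` list or a non-empty `malformed_json` list is a good reason to re-copy the files before running AIPerf. The authoritative test remains loading the directory with transformers itself, for example `AutoTokenizer.from_pretrained("/path/to/tokenizer")` on a machine where transformers is installed.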
Use the --tokenizer parameter to specify the path to your local tokenizer directory or file:
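If you script your benchmark runs, a minimal Python sketch can assemble such an invocation. It assumes AIPerf's `aiperf profile` subcommand. `build_aiperf_command` and `to_shell` are hypothetical helpers, and `extra_args` is a placeholder for whatever model and endpoint options you already pass. The helper only verifies that the tokenizer path exists as a directory or a file, mirroring the fact that `--tokenizer` accepts either.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_aiperf_command(tokenizer_path: str, extra_args: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: build an argv that passes a local path to --tokenizer.

    Assumes the `aiperf profile` subcommand. `extra_args` carries your usual
    model and endpoint options unchanged; this helper does not validate them.
    """
    path = Path(tokenizer_path).expanduser()
    # --tokenizer accepts a directory or a direct file path, so allow either.
    if not (path.is_dir() or path.is_file()):
        raise FileNotFoundError(f"tokenizer path not found: {path}")
    return ["aiperf", "profile", *(extra_args or []), "--tokenizer", str(path.resolve())]


def to_shell(argv: List[str]) -> str:
    """Quote an argv list so it can be pasted into a POSIX shell."""
    return shlex.join(argv)
```

Resolving the path to an absolute one keeps the printed command valid even if you later run it from a different working directory.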
Sample Output (Successful Run):
If you are using a custom tokenizer (one that is not a standard pretrained model from HuggingFace), you can still use it with AIPerf as long as it adheres to the rules below.
Crucial: Your custom tokenizer MUST be saved in the HuggingFace transformers format. AIPerf relies on the transformers library to load tokenizers, so your files must be loadable through the standard transformers tokenizer-loading API.
When you pass a local path with --tokenizer, AIPerf loads the tokenizer directly from your local files.
For strictly air-gapped environments where you want to explicitly forbid any connection attempts, you can set the following environment variables:
This ensures that the underlying transformers library operates in offline mode.
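The standard Hugging Face offline switches are `HF_HUB_OFFLINE=1` (huggingface_hub) and `TRANSFORMERS_OFFLINE=1` (transformers); in a shell you would `export` both before running AIPerf. Assuming these are the variables you want, the sketch below applies them only to a child process rather than to your whole session. `offline_env`, `run_offline` and `OFFLINE_VARS` are hypothetical names.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Offline switches documented by huggingface_hub (HF_HUB_OFFLINE) and
# transformers (TRANSFORMERS_OFFLINE). Assumed to be the ones you want.
OFFLINE_VARS = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}


def offline_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with offline mode forced on."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLINE_VARS)
    return env


def run_offline(argv: List[str]) -> int:
    """Run a command, such as an aiperf invocation, with offline mode enabled; return its exit code."""
    return subprocess.run(argv, env=offline_env(), check=False).returncode
```

Copying the environment instead of mutating `os.environ` confines offline mode to the launched process, so other tools in the same Python session keep their normal network behavior.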
Tokenizer files such as tokenizer.json, vocab.txt, or vocab.json must be present.
The files must be loadable by the transformers library.
The --tokenizer parameter accepts both directory paths and direct file paths.
If you encounter errors about missing tokenizer files:
If the tokenizer fails to load:
Verify that the tokenizer files (tokenizer.json, config.json, etc.) are complete and saved in the HuggingFace format.
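Some load failures come down to filesystem problems rather than tokenizer contents. As a hedged troubleshooting sketch, the hypothetical `diagnose_tokenizer_path` helper reports a missing path, unreadable files, and zero-byte files (for example, from an interrupted copy). It works for both a directory and a single file, but it does not check whether the tokenizer contents are valid.

```python
import os
from pathlib import Path
from typing import List


def diagnose_tokenizer_path(tokenizer_path: str) -> List[str]:
    """Return human-readable problems found at `tokenizer_path` (empty list = none found).

    Hypothetical troubleshooting helper: it checks only filesystem-level issues,
    not whether the tokenizer contents are valid.
    """
    path = Path(tokenizer_path).expanduser()
    if not path.exists():
        return [f"{path} does not exist"]
    # A direct file path is checked on its own; a directory is checked file by file.
    files = [path] if path.is_file() else sorted(p for p in path.iterdir() if p.is_file())
    if not files:
        return [f"{path} contains no files"]
    problems = []
    for f in files:
        if not os.access(f, os.R_OK):
            problems.append(f"{f.name} is not readable (check file permissions)")
        elif f.stat().st_size == 0:
            problems.append(f"{f.name} is empty (possibly an incomplete copy)")
    return problems
```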