AIPerf can benchmark Large Language Models (LLMs) served through the Hugging Face Text Generation Inference (TGI) generate API.
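TGI exposes two standard HTTP endpoints for text generation: /generate, which returns the full completion in a single response, and /generate_stream, which streams tokens as they are produced.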
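For orientation, the non-streaming endpoint can be exercised directly with a small client. The sketch below assumes a TGI server reachable at http://localhost:8080 (a placeholder address) and uses the request shape from TGI's public API: a JSON body with an inputs string and an optional parameters object, answered by a JSON object carrying generated_text. The helper names are illustrative and are not part of AIPerf.

```python
import json
import urllib.request

# Assumption: address of a locally running TGI server; adjust as needed.
TGI_URL = "http://localhost:8080"


def build_generate_request(prompt, max_new_tokens=32, base_url=TGI_URL):
    """Build a POST request for TGI's non-streaming /generate endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(body):
    """Extract the generated text from a /generate JSON response body."""
    return json.loads(body)["generated_text"]


def generate(prompt, max_new_tokens=32, base_url=TGI_URL):
    """Send one prompt to a live server and return the generated text."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_generate_response(response.read())
```

Calling generate("What is deep learning?") against a running server is a quick way to confirm the endpoint responds before starting a benchmark.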
To launch a Hugging Face TGI server, use the official ghcr.io image:
You can benchmark TGI models in either non-streaming or streaming mode, with either synthetic inputs or a custom input file.
Non-streaming endpoint (/generate)

Sample Output (Successful Run):
You can also provide your own text prompts using the --input-file option. The file should be in JSONL format, with one text entry per line.
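A minimal sketch of preparing such a file in Python. It assumes each JSONL line is an object with a single text field; that key, the file name, and the prompts are placeholders, so check the AIPerf documentation for the exact schema your version expects.

```python
import json
from pathlib import Path


def write_prompts(path, prompts):
    """Write one JSON object per line, each holding a prompt under "text"."""
    # Assumption: AIPerf reads the prompt from a "text" key on each line.
    with Path(path).open("w", encoding="utf-8") as handle:
        for prompt in prompts:
            handle.write(json.dumps({"text": prompt}) + "\n")
    return len(prompts)
```

For example, write_prompts("inputs.jsonl", ["Hello TGI world!", "Tell me a joke."]) produces a two-line file that can be passed to --input-file.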
Then run:
Streaming endpoint (/generate_stream)

When the --streaming flag is enabled, AIPerf automatically sends requests to the /generate_stream endpoint of the TGI server.
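To illustrate what the streaming endpoint returns, the sketch below parses the Server-Sent Events that TGI emits on /generate_stream, where each event is a data: line carrying a JSON object with a token field. The function name and any sample lines are illustrative and are not taken from AIPerf.

```python
import json


def iter_stream_tokens(lines):
    """Yield token text from the SSE lines of a /generate_stream response."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        # SSE payload lines start with "data:"; blank lines separate events.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        token = event.get("token") or {}
        # Skip special tokens (for example end-of-sequence markers).
        if token.get("special"):
            continue
        if "text" in token:
            yield token["text"]
```

With a live server, the lines could come from iterating over the HTTP response of a POST to /generate_stream; the time between successive tokens is what streaming latency metrics are built on.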
Sample Output (Successful Run):
Create your own prompt file in JSONL format:
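Before benchmarking, it can help to confirm that every line of the file parses. The check below is a sketch that assumes each line is a JSON object with a string text field (an assumption about the schema, configurable through the hypothetical key parameter).

```python
import json


def check_prompt_file(path, key="text"):
    """Return the number of prompts, raising ValueError on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {number}: invalid JSON ({error.msg})") from error
            # Assumption: the prompt is stored as a string under `key`.
            if not isinstance(entry, dict) or not isinstance(entry.get(key), str):
                raise ValueError(f"line {number}: expected an object with a string {key!r} field")
            count += 1
    return count
```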
Then run: