Model Efficiency
Containers specialized in evaluating Large Language Model efficiency.
GenAIPerf Container
NGC Catalog: genai-perf
Container for assessing the speed of processing requests by the server.
Use Cases:
Analysis of time to first token (TTFT) and inter-token latency (ITL)
Assessment of server efficiency under load
Summarization scenario: long input, short output
Generation scenario: short input, long output
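To make the two latency metrics in the use cases concrete, the sketch below computes time to first token and inter-token latency from per-token arrival timestamps of a single streamed response. It is an illustration only: the function names and the timestamps are hypothetical, and the inter-token latency shown here is one common definition (the mean gap between consecutive tokens), not necessarily the exact formula the container uses.

```python
from statistics import mean
from typing import List


def time_to_first_token(request_sent: float, token_times: List[float]) -> float:
    """Seconds between sending the request and receiving the first token."""
    if not token_times:
        raise ValueError("no tokens were received")
    return token_times[0] - request_sent


def inter_token_latency(token_times: List[float]) -> float:
    """Mean gap in seconds between consecutive tokens after the first one."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure a gap")
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return mean(gaps)


# Hypothetical arrival times (seconds) for one streamed response.
request_sent = 0.0
token_times = [0.5, 0.75, 1.0, 1.25]

ttft = time_to_first_token(request_sent, token_times)
itl = inter_token_latency(token_times)
print(f"time to first token: {ttft:.3f} s, inter-token latency: {itl:.3f} s")
```

A long-input, short-output workload tends to be dominated by time to first token, while a short-input, long-output workload is dominated by inter-token latency, which is why the two scenarios are benchmarked separately.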
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/genai-perf:25.09.1
Default Parameters:
| Parameter | Value |
|---|---|
| | |
Benchmark-specific parameters (passed via extra field):
| Parameter | Description |
|---|---|
| | HuggingFace tokenizer to use for calculating the number of tokens. Required parameter (default: |
| | Whether to run warmup (default: |
| | Input sequence length (default: task-specific, see below) |
| | Output sequence length (default: task-specific, see below) |
Supported Benchmarks:
genai_perf_summarization - Speed analysis with isl: 5000 and osl: 500.
genai_perf_generation - Speed analysis with isl: 500 and osl: 5000.
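The relationship between the task-specific defaults and the isl and osl overrides can be sketched as follows. This is a minimal illustration, not the container's implementation: the helper resolve_sequence_lengths is hypothetical, and it assumes that values supplied through the extra field simply replace the task defaults listed above.

```python
from typing import Dict, Optional

# Task defaults as listed in this section.
TASK_DEFAULTS: Dict[str, Dict[str, int]] = {
    "genai_perf_summarization": {"isl": 5000, "osl": 500},
    "genai_perf_generation": {"isl": 500, "osl": 5000},
}


def resolve_sequence_lengths(task: str, extra: Optional[dict] = None) -> Dict[str, int]:
    """Return isl/osl for a task; values in `extra` override the defaults.

    Hypothetical helper: the override behavior is an assumption.
    """
    if task not in TASK_DEFAULTS:
        raise KeyError(f"unsupported benchmark: {task}")
    resolved = dict(TASK_DEFAULTS[task])  # copy so the defaults stay untouched
    for key in ("isl", "osl"):
        if extra and key in extra:
            value = extra[key]
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{key} must be a positive integer, got {value!r}")
            resolved[key] = value
    return resolved


# Defaults for the summarization task, then a generation run with a shorter output.
print(resolve_sequence_lengths("genai_perf_summarization"))
print(resolve_sequence_lengths("genai_perf_generation", {"osl": 2000}))
```

Keeping the defaults in one table and copying them per call means an override for one run never leaks into the next.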