genai_perf_eval#
This page contains all evaluation tasks for the genai_perf_eval harness.
| Task | Description |
|---|---|
| genai_perf_generation | GenAI Perf speed evaluation for chat endpoint, generation task - short input, long output |
| genai_perf_generation_completions | GenAI Perf speed evaluation for completions endpoint, generation task - short input, long output |
| genai_perf_summarization | GenAI Perf speed evaluation for chat endpoint, summarization task - long input, short output |
| genai_perf_summarization_completions | GenAI Perf speed evaluation for completions endpoint, summarization task - long input, short output |
genai_perf_generation#
GenAI Perf speed evaluation for chat endpoint, generation task - short input, long output
Harness: genai_perf_eval
Container: nvcr.io/nvidia/eval-factory/genai-perf:26.01
Container Digest: sha256:ab3f8b34a6cb63f7e48e8847fb069be71a3b73eb4f924bcf274cb02c6cc975b6
Container Arch: amd
Task Type: genai_perf_generation
genai_perf_eval --model_id {{target.api_endpoint.model_id}} --url {{target.api_endpoint.url}} {% if target.api_endpoint.api_key_name is not none %}--api-key {{target.api_endpoint.api_key_name}} {% endif %} --concurrencies {{config.params.parallelism}} --isl {{config.params.extra.isl}} --osl {{config.params.extra.osl}} --tokenizer {{config.params.extra.tokenizer}} --endpoint-type {{target.api_endpoint.type}} --artifact-dir {{config.output_dir}} {% if target.api_endpoint.stream %}--streaming {% endif %}{% if config.params.extra.warmup %}--warmup{% endif %}
framework_name: genai_perf_eval
pkg_name: genai_perf
config:
  params:
    parallelism: 1
    extra:
      tokenizer: null
      warmup: true
      isl: 500
      osl: 5000
  supported_endpoint_types:
  - chat
  type: genai_perf_generation
target:
  api_endpoint: {}
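The command template above combines the target endpoint settings with the task defaults. The following is a minimal Python sketch of that substitution, written to mirror the template's conditionals; it is not the harness's own rendering code. The function name `render_command` and the example values (model id, URL, tokenizer, output directory) are hypothetical and chosen only for illustration.

```python
def render_command(api_endpoint: dict, params: dict, output_dir: str) -> str:
    """Build the genai_perf_eval command line the way the template describes.

    api_endpoint stands in for target.api_endpoint, params for config.params,
    and output_dir for config.output_dir.
    """
    extra = params["extra"]
    parts = [
        "genai_perf_eval",
        f"--model_id {api_endpoint['model_id']}",
        f"--url {api_endpoint['url']}",
    ]
    # --api-key is emitted only when an API key name is configured.
    if api_endpoint.get("api_key_name") is not None:
        parts.append(f"--api-key {api_endpoint['api_key_name']}")
    parts += [
        f"--concurrencies {params['parallelism']}",
        f"--isl {extra['isl']}",
        f"--osl {extra['osl']}",
        f"--tokenizer {extra['tokenizer']}",
        f"--endpoint-type {api_endpoint['type']}",
        f"--artifact-dir {output_dir}",
    ]
    # --streaming and --warmup are optional switches.
    if api_endpoint.get("stream"):
        parts.append("--streaming")
    if extra.get("warmup"):
        parts.append("--warmup")
    return " ".join(parts)


# Hypothetical target and output directory; isl/osl/warmup/parallelism are
# the genai_perf_generation defaults listed on this page.
example_endpoint = {
    "model_id": "my-model",
    "url": "http://localhost:8000/v1/chat/completions",
    "type": "chat",
    "api_key_name": None,
    "stream": True,
}
example_params = {
    "parallelism": 1,
    "extra": {"tokenizer": "my-org/my-tokenizer", "warmup": True, "isl": 500, "osl": 5000},
}
command = render_command(example_endpoint, example_params, "/results")
print(command)
```

The default `tokenizer` is `null`, so a tokenizer presumably has to be supplied through `config.params.extra.tokenizer` before the command is usable; that is an inference from the defaults, not something this page states.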
genai_perf_generation_completions#
GenAI Perf speed evaluation for completions endpoint, generation task - short input, long output
Harness: genai_perf_eval
Container: nvcr.io/nvidia/eval-factory/genai-perf:26.01
Container Digest: sha256:ab3f8b34a6cb63f7e48e8847fb069be71a3b73eb4f924bcf274cb02c6cc975b6
Container Arch: amd
Task Type: genai_perf_generation_completions
genai_perf_eval --model_id {{target.api_endpoint.model_id}} --url {{target.api_endpoint.url}} {% if target.api_endpoint.api_key_name is not none %}--api-key {{target.api_endpoint.api_key_name}} {% endif %} --concurrencies {{config.params.parallelism}} --isl {{config.params.extra.isl}} --osl {{config.params.extra.osl}} --tokenizer {{config.params.extra.tokenizer}} --endpoint-type {{target.api_endpoint.type}} --artifact-dir {{config.output_dir}} {% if target.api_endpoint.stream %}--streaming {% endif %}{% if config.params.extra.warmup %}--warmup{% endif %}
framework_name: genai_perf_eval
pkg_name: genai_perf
config:
  params:
    parallelism: 1
    task: genai_perf_generation
    extra:
      tokenizer: null
      warmup: true
      isl: 500
      osl: 5000
  supported_endpoint_types:
  - completions
  type: genai_perf_generation_completions
target:
  api_endpoint: {}
genai_perf_summarization#
GenAI Perf speed evaluation for chat endpoint, summarization task - long input, short output
Harness: genai_perf_eval
Container: nvcr.io/nvidia/eval-factory/genai-perf:26.01
Container Digest: sha256:ab3f8b34a6cb63f7e48e8847fb069be71a3b73eb4f924bcf274cb02c6cc975b6
Container Arch: amd
Task Type: genai_perf_summarization
genai_perf_eval --model_id {{target.api_endpoint.model_id}} --url {{target.api_endpoint.url}} {% if target.api_endpoint.api_key_name is not none %}--api-key {{target.api_endpoint.api_key_name}} {% endif %} --concurrencies {{config.params.parallelism}} --isl {{config.params.extra.isl}} --osl {{config.params.extra.osl}} --tokenizer {{config.params.extra.tokenizer}} --endpoint-type {{target.api_endpoint.type}} --artifact-dir {{config.output_dir}} {% if target.api_endpoint.stream %}--streaming {% endif %}{% if config.params.extra.warmup %}--warmup{% endif %}
framework_name: genai_perf_eval
pkg_name: genai_perf
config:
  params:
    parallelism: 1
    extra:
      tokenizer: null
      warmup: true
      isl: 5000
      osl: 500
  supported_endpoint_types:
  - chat
  type: genai_perf_summarization
target:
  api_endpoint: {}
genai_perf_summarization_completions#
GenAI Perf speed evaluation for completions endpoint, summarization task - long input, short output
Harness: genai_perf_eval
Container: nvcr.io/nvidia/eval-factory/genai-perf:26.01
Container Digest: sha256:ab3f8b34a6cb63f7e48e8847fb069be71a3b73eb4f924bcf274cb02c6cc975b6
Container Arch: amd
Task Type: genai_perf_summarization_completions
genai_perf_eval --model_id {{target.api_endpoint.model_id}} --url {{target.api_endpoint.url}} {% if target.api_endpoint.api_key_name is not none %}--api-key {{target.api_endpoint.api_key_name}} {% endif %} --concurrencies {{config.params.parallelism}} --isl {{config.params.extra.isl}} --osl {{config.params.extra.osl}} --tokenizer {{config.params.extra.tokenizer}} --endpoint-type {{target.api_endpoint.type}} --artifact-dir {{config.output_dir}} {% if target.api_endpoint.stream %}--streaming {% endif %}{% if config.params.extra.warmup %}--warmup{% endif %}
framework_name: genai_perf_eval
pkg_name: genai_perf
config:
  params:
    parallelism: 1
    task: genai_perf_summarization
    extra:
      tokenizer: null
      warmup: true
      isl: 5000
      osl: 500
  supported_endpoint_types:
  - completions
  type: genai_perf_summarization_completions
target:
  api_endpoint: {}
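The four tasks on this page differ only in the supported endpoint type and in the `isl`/`osl` defaults. As a rough illustration, the sketch below collects those defaults in a lookup table and picks a task name from an endpoint type and a workload. The names `TASK_DEFAULTS` and `find_task` are hypothetical helpers, not part of the harness.

```python
# Defaults copied from the task sections on this page.
TASK_DEFAULTS = {
    "genai_perf_generation": {"endpoint_type": "chat", "isl": 500, "osl": 5000},
    "genai_perf_generation_completions": {"endpoint_type": "completions", "isl": 500, "osl": 5000},
    "genai_perf_summarization": {"endpoint_type": "chat", "isl": 5000, "osl": 500},
    "genai_perf_summarization_completions": {"endpoint_type": "completions", "isl": 5000, "osl": 500},
}


def find_task(endpoint_type: str, workload: str) -> str:
    """Return the task name for an endpoint type and a workload.

    workload is either "generation" (short input, long output) or
    "summarization" (long input, short output).
    """
    matches = [
        name
        for name, defaults in TASK_DEFAULTS.items()
        if defaults["endpoint_type"] == endpoint_type
        and name.startswith(f"genai_perf_{workload}")
    ]
    if len(matches) != 1:
        raise ValueError(f"no unique task for {endpoint_type!r} / {workload!r}")
    return matches[0]


task = find_task("completions", "summarization")
print(task, TASK_DEFAULTS[task])
```

Keeping the defaults in one table makes the symmetry visible: generation tasks use a 500-token input and 5000-token output, and summarization tasks swap the two.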