# ifbench
This page contains all evaluation tasks for the ifbench harness.
| Task | Description |
|---|---|
| `ifbench` | IFBench with vanilla settings |
| `ifbench_aa_v2` | IFBench with parameters aligned with the Artificial Analysis Index v2 |
## ifbench

IFBench with vanilla settings.
- Harness: `ifbench`
- Container: `nvcr.io/nvidia/eval-factory/ifbench:26.01`
- Container Digest: `sha256:e99059d2e334ef97826629a004c888f7daed1adb9d724ca73274e1b93c743ac1`
- Container Arch: multiarch
- Task Type: `ifbench`
```bash
{% if target.api_endpoint.api_key_name is not none %}export OPENAI_API_KEY=${{target.api_endpoint.api_key_name}} && {% endif %} ifbench --model-url {{target.api_endpoint.url}} --model-name {{target.api_endpoint.model_id}} --results-dir {{config.output_dir}} --inference-params max_tokens={{config.params.max_new_tokens}},temperature={{config.params.temperature}},top_p={{config.params.top_p}} --parallelism {{config.params.parallelism}} --retries {{config.params.max_retries}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}} {% endif %}
```
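The command is a Jinja template with two conditional parts: the `OPENAI_API_KEY` export and the `--limit` flag. To make the rendering concrete, here is a minimal Python sketch that assembles an equivalent command from plain values. The function `build_ifbench_command`, the endpoint URL, model name, output directory, and the `MY_API_KEY` variable are hypothetical, and whitespace may differ slightly from what the Jinja template emits.

```python
import shlex
from typing import Any, Optional


def build_ifbench_command(
    url: str,
    model_id: str,
    output_dir: str,
    params: dict[str, Any],
    api_key_name: Optional[str] = None,
) -> str:
    """Assemble a shell command equivalent to the Jinja template (sketch)."""
    # max_new_tokens is forwarded to the CLI under the name max_tokens
    inference_params = (
        f"max_tokens={params['max_new_tokens']},"
        f"temperature={params['temperature']},"
        f"top_p={params['top_p']}"
    )
    args = [
        "ifbench",
        "--model-url", url,
        "--model-name", model_id,
        "--results-dir", output_dir,
        "--inference-params", inference_params,
        "--parallelism", str(params["parallelism"]),
        "--retries", str(params["max_retries"]),
    ]
    # --limit is appended only when limit_samples is set (not None)
    if params.get("limit_samples") is not None:
        args += ["--limit", str(params["limit_samples"])]
    command = shlex.join(args)  # quotes any argument that needs it
    # The key is read from the environment variable *named* by api_key_name
    if api_key_name is not None:
        command = f"export OPENAI_API_KEY=${api_key_name} && {command}"
    return command


# Hypothetical endpoint, model name, output directory, and key variable
vanilla_params = {"max_new_tokens": 4096, "max_retries": 5, "parallelism": 8,
                  "temperature": 0.01, "top_p": 0.95, "limit_samples": None}
print(build_ifbench_command("http://localhost:8000/v1/chat/completions",
                            "my-model", "/tmp/results", vanilla_params,
                            api_key_name="MY_API_KEY"))
```

Note that `api_key_name` holds the *name* of an environment variable, not the key itself, so the secret never appears in the rendered command.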
```yaml
framework_name: ifbench
pkg_name: ifbench
config:
  params:
    max_new_tokens: 4096
    max_retries: 5
    parallelism: 8
    temperature: 0.01
    top_p: 0.95
    extra: {}
  supported_endpoint_types:
  - chat
  type: ifbench
target:
  api_endpoint:
    stream: false
```
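The command template reads several values that these defaults do not define, such as `config.params.limit_samples`, `config.output_dir`, and `target.api_endpoint.url`, `model_id`, and `api_key_name`, so they have to come from the run configuration. A minimal Python sketch, assuming user overrides are merged key by key on top of the defaults (an assumption; check your launcher's documentation for its actual merge rules). `IFBENCH_DEFAULTS`, `deep_merge`, and the override values are hypothetical.

```python
import copy
from typing import Any

# Default configuration for the `ifbench` task, copied from the YAML above.
IFBENCH_DEFAULTS: dict[str, Any] = {
    "framework_name": "ifbench",
    "pkg_name": "ifbench",
    "config": {
        "params": {
            "max_new_tokens": 4096,
            "max_retries": 5,
            "parallelism": 8,
            "temperature": 0.01,
            "top_p": 0.95,
            "extra": {},
        },
        "supported_endpoint_types": ["chat"],
        "type": "ifbench",
    },
    "target": {"api_endpoint": {"stream": False}},
}


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `overrides` applied key by key on top of `base`."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: recurse instead of replacing wholesale
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical override: a 20-sample smoke test against a local endpoint.
run_config = deep_merge(IFBENCH_DEFAULTS, {
    "config": {"params": {"limit_samples": 20, "parallelism": 4}},
    "target": {"api_endpoint": {
        "url": "http://localhost:8000/v1/chat/completions",
        "model_id": "my-model",
    }},
})
print(run_config["config"]["params"])
```

Merging recursively keeps untouched defaults such as `max_new_tokens` and `stream` in place while only the overridden keys change.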
## ifbench_aa_v2

IFBench with parameters aligned with the Artificial Analysis Index v2.
- Harness: `ifbench`
- Container: `nvcr.io/nvidia/eval-factory/ifbench:26.01`
- Container Digest: `sha256:e99059d2e334ef97826629a004c888f7daed1adb9d724ca73274e1b93c743ac1`
- Container Arch: multiarch
- Task Type: `ifbench_aa_v2`
```bash
{% if target.api_endpoint.api_key_name is not none %}export OPENAI_API_KEY=${{target.api_endpoint.api_key_name}} && {% endif %} ifbench --model-url {{target.api_endpoint.url}} --model-name {{target.api_endpoint.model_id}} --results-dir {{config.output_dir}} --inference-params max_tokens={{config.params.max_new_tokens}},temperature={{config.params.temperature}},top_p={{config.params.top_p}} --parallelism {{config.params.parallelism}} --retries {{config.params.max_retries}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}} {% endif %}
```
```yaml
framework_name: ifbench
pkg_name: ifbench
config:
  params:
    max_new_tokens: 16384
    max_retries: 30
    parallelism: 8
    temperature: 0.0
    top_p: 0.95
    extra: {}
  supported_endpoint_types:
  - chat
  type: ifbench_aa_v2
target:
  api_endpoint:
    stream: false
```
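Both tasks use the same harness, container digest, and command template; besides the task type, they differ only in their default `config.params`. A small Python sketch that lists which defaults differ, with values copied from the two YAML blocks on this page. `TASK_PARAMS` and `diff_params` are hypothetical helper names.

```python
from typing import Any

# Default `config.params` for each task, copied from the YAML blocks above.
TASK_PARAMS: dict[str, dict[str, Any]] = {
    "ifbench": {"max_new_tokens": 4096, "max_retries": 5, "parallelism": 8,
                "temperature": 0.01, "top_p": 0.95, "extra": {}},
    "ifbench_aa_v2": {"max_new_tokens": 16384, "max_retries": 30, "parallelism": 8,
                      "temperature": 0.0, "top_p": 0.95, "extra": {}},
}


def diff_params(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each key whose value differs between `a` and `b` to (a_value, b_value)."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


changes = diff_params(TASK_PARAMS["ifbench"], TASK_PARAMS["ifbench_aa_v2"])
for key, (vanilla, aa_v2) in changes.items():
    print(f"{key}: {vanilla} -> {aa_v2}")
```

This reports `max_new_tokens`, `max_retries`, and `temperature`: the `ifbench_aa_v2` defaults allow a four times larger generation budget, six times as many retries, and fully greedy sampling, while `parallelism` and `top_p` are unchanged.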