ifbench

This page lists every evaluation task provided by the ifbench harness.

Task	Description
ifbench	IFBench with vanilla settings
ifbench_aa_v2	IFBench - params aligned with Artificial Analysis Index v2

ifbench

IFBench with vanilla settings

Container

Harness: ifbench

Container:

nvcr.io/nvidia/eval-factory/ifbench:26.01

Container Digest:

sha256:e99059d2e334ef97826629a004c888f7daed1adb9d724ca73274e1b93c743ac1

Container Arch: multiarch

Task Type: ifbench

Command

{% if target.api_endpoint.api_key_name is not none %}export OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}  && {% endif %} ifbench --model-url {{target.api_endpoint.url}} --model-name {{target.api_endpoint.model_id}}  --results-dir {{config.output_dir}} --inference-params max_tokens={{config.params.max_new_tokens}},temperature={{config.params.temperature}},top_p={{config.params.top_p}} --parallelism {{config.params.parallelism}} --retries {{config.params.max_retries}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}} {% endif %}
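
The template above has two optional parts: the `export OPENAI_API_KEY=...` prefix is emitted only when `api_key_name` is set, and `--limit` is emitted only when `limit_samples` is set. The following is a minimal Python sketch of that logic, not the harness's actual rendering code. The function name `render_ifbench_command` and the endpoint URL, model name, key variable name, and output directory in the usage lines are hypothetical placeholders.

```python
def render_ifbench_command(endpoint: dict, params: dict, output_dir: str) -> str:
    """Approximate the Jinja command template with plain string handling."""
    # api_key_name holds the NAME of an environment variable, so the
    # rendered command reads "$NAME" and the shell expands it at run time.
    key_name = endpoint.get("api_key_name")
    prefix = f"export OPENAI_API_KEY=${key_name} && " if key_name is not None else ""

    # The config key max_new_tokens is passed to the CLI as max_tokens.
    inference = (
        f"max_tokens={params['max_new_tokens']},"
        f"temperature={params['temperature']},"
        f"top_p={params['top_p']}"
    )
    parts = [
        "ifbench",
        "--model-url", endpoint["url"],
        "--model-name", endpoint["model_id"],
        "--results-dir", output_dir,
        "--inference-params", inference,
        "--parallelism", str(params["parallelism"]),
        "--retries", str(params["max_retries"]),
    ]
    # --limit is appended only when limit_samples is set.
    if params.get("limit_samples") is not None:
        parts += ["--limit", str(params["limit_samples"])]
    return prefix + " ".join(parts)


# Hypothetical values, for illustration only.
example_endpoint = {
    "url": "http://localhost:8000/v1/chat/completions",
    "model_id": "my-model",
    "api_key_name": "MY_API_KEY",
}
example_params = {
    "max_new_tokens": 4096,
    "max_retries": 5,
    "parallelism": 8,
    "temperature": 0.01,
    "top_p": 0.95,
    "limit_samples": 10,
}
print(render_ifbench_command(example_endpoint, example_params, "/results"))
```

Unlike the real template, this sketch collapses the extra whitespace that Jinja leaves between tokens. The shell treats both forms the same way.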

Defaults

framework_name: ifbench
pkg_name: ifbench
config:
  params:
    max_new_tokens: 4096
    max_retries: 5
    parallelism: 8
    temperature: 0.01
    top_p: 0.95
    extra: {}
  supported_endpoint_types:
  - chat
  type: ifbench
target:
  api_endpoint:
    stream: false
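
The names under `config.params` do not all match the CLI arguments they feed: per the command template, `max_new_tokens` becomes `max_tokens` inside `--inference-params`, and `max_retries` becomes `--retries`. The sketch below spells out that mapping for the default values listed above. The dictionary and function names are illustrative assumptions, not part of the harness.

```python
# Default values copied from the Defaults block above.
DEFAULT_PARAMS = {
    "max_new_tokens": 4096,
    "max_retries": 5,
    "parallelism": 8,
    "temperature": 0.01,
    "top_p": 0.95,
}

# config.params key -> key used inside --inference-params
INFERENCE_KEYS = {
    "max_new_tokens": "max_tokens",
    "temperature": "temperature",
    "top_p": "top_p",
}

# config.params key -> standalone CLI flag
FLAG_KEYS = {
    "parallelism": "--parallelism",
    "max_retries": "--retries",
}


def inference_params(params: dict) -> str:
    """Build the comma-separated value passed to --inference-params."""
    return ",".join(f"{cli}={params[cfg]}" for cfg, cli in INFERENCE_KEYS.items())


def runner_flags(params: dict) -> list:
    """Build the flag/value pairs that control parallelism and retries."""
    flags = []
    for cfg, flag in FLAG_KEYS.items():
        flags += [flag, str(params[cfg])]
    return flags


print(inference_params(DEFAULT_PARAMS))
print(runner_flags(DEFAULT_PARAMS))
```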

ifbench_aa_v2

IFBench - params aligned with Artificial Analysis Index v2