hle
This page contains all evaluation tasks for the hle harness.
| Task | Description |
|---|---|
| hle | Text-only questions from Humanity’s Last Exam |
| hle_aa_v2 | Text-only questions from Humanity’s Last Exam and params aligned with Artificial Analysis Index v2 |
hle
Text-only questions from Humanity’s Last Exam
Harness: hle
Container: nvcr.io/nvidia/eval-factory/hle:26.01
Container Digest: sha256:59fa69e20bbaaa251effa5f9d440d60920bc601cfb26f9e03866f1b6aff6dc33
Container Arch: multiarch
Task Type: hle
hle_eval --dataset=cais/hle --model_name={{target.api_endpoint.model_id}} --model_url={{target.api_endpoint.url}} --temperature={{config.params.temperature}} --top_p={{config.params.top_p}} --timeout={{config.params.request_timeout}} {% if config.params.limit_samples is not none %}--limit {{config.params.limit_samples}}{% endif %} --output_dir={{config.output_dir}} {% if target.api_endpoint.api_key_name is not none %}--api_key_name={{target.api_endpoint.api_key_name}}{% endif %} --max_retries={{config.params.max_retries}} --num_workers={{config.params.parallelism}} --max_new_tokens={{config.params.max_new_tokens}} --text_only --generate --judge
framework_name: hle
pkg_name: hle
config:
  params:
    max_new_tokens: 8192
    max_retries: 30
    parallelism: 10
    temperature: 0.0
    request_timeout: 600
    top_p: 1.0
    extra: {}
  supported_endpoint_types:
  - chat
  type: hle
target:
  api_endpoint: {}
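To make the conditional parts of the command template above easier to follow, here is a minimal Python sketch that mirrors its logic. It is not the harness's own code: `build_hle_command`, the `endpoint` dictionary, the model ID, the URL, and the output directory are hypothetical names and values chosen for illustration. Only the flag names and default values come from this page.

```python
def build_hle_command(params, endpoint, output_dir):
    """Assemble the hle_eval argument list the way the template does (illustrative)."""
    args = [
        "hle_eval",
        "--dataset=cais/hle",
        f"--model_name={endpoint['model_id']}",
        f"--model_url={endpoint['url']}",
        f"--temperature={params['temperature']}",
        f"--top_p={params['top_p']}",
        f"--timeout={params['request_timeout']}",
    ]
    # The template emits --limit only when limit_samples is set.
    if params.get("limit_samples") is not None:
        args += ["--limit", str(params["limit_samples"])]
    args.append(f"--output_dir={output_dir}")
    # Likewise, --api_key_name appears only when a key name is configured.
    if endpoint.get("api_key_name") is not None:
        args.append(f"--api_key_name={endpoint['api_key_name']}")
    args += [
        f"--max_retries={params['max_retries']}",
        f"--num_workers={params['parallelism']}",
        f"--max_new_tokens={params['max_new_tokens']}",
        "--text_only",
        "--generate",
        "--judge",
    ]
    return args


# Default params for the hle task, copied from the defaults listed above.
HLE_DEFAULTS = {
    "max_new_tokens": 8192,
    "max_retries": 30,
    "parallelism": 10,
    "temperature": 0.0,
    "request_timeout": 600,
    "top_p": 1.0,
}

# Hypothetical endpoint values, used only to make the sketch runnable.
endpoint = {
    "model_id": "example-model",
    "url": "http://localhost:8000/v1/chat/completions",
}

cmd = build_hle_command(HLE_DEFAULTS, endpoint, "/tmp/hle_results")
print(" ".join(cmd))
```

With the defaults alone, neither `--limit` nor `--api_key_name` appears in the result, because both values are unset.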
hle_aa_v2
Text-only questions from Humanity’s Last Exam and params aligned with Artificial Analysis Index v2
Harness: hle
Container: nvcr.io/nvidia/eval-factory/hle:26.01
Container Digest: sha256:59fa69e20bbaaa251effa5f9d440d60920bc601cfb26f9e03866f1b6aff6dc33
Container Arch: multiarch
Task Type: hle_aa_v2
hle_eval --dataset=cais/hle --model_name={{target.api_endpoint.model_id}} --model_url={{target.api_endpoint.url}} --temperature={{config.params.temperature}} --top_p={{config.params.top_p}} --timeout={{config.params.request_timeout}} {% if config.params.limit_samples is not none %}--limit {{config.params.limit_samples}}{% endif %} --output_dir={{config.output_dir}} {% if target.api_endpoint.api_key_name is not none %}--api_key_name={{target.api_endpoint.api_key_name}}{% endif %} --max_retries={{config.params.max_retries}} --num_workers={{config.params.parallelism}} --max_new_tokens={{config.params.max_new_tokens}} --text_only --generate --judge
framework_name: hle
pkg_name: hle
config:
  params:
    max_new_tokens: 16384
    max_retries: 30
    parallelism: 10
    temperature: 0.0
    request_timeout: 600
    top_p: 1.0
    extra: {}
  supported_endpoint_types:
  - chat
  type: hle_aa_v2
target:
  api_endpoint: {}
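The two tasks share the same container and command template, so it can help to see exactly where their default params diverge. The short Python sketch below copies the default params listed on this page into two dictionaries and reports the keys whose values differ. The variable names are illustrative and not part of the harness.

```python
# Default params for each task, copied from the defaults listed on this page.
HLE_PARAMS = {
    "max_new_tokens": 8192,
    "max_retries": 30,
    "parallelism": 10,
    "temperature": 0.0,
    "request_timeout": 600,
    "top_p": 1.0,
    "extra": {},
}

HLE_AA_V2_PARAMS = {
    "max_new_tokens": 16384,
    "max_retries": 30,
    "parallelism": 10,
    "temperature": 0.0,
    "request_timeout": 600,
    "top_p": 1.0,
    "extra": {},
}

# Collect (hle value, hle_aa_v2 value) for every param that differs.
param_diff = {
    key: (HLE_PARAMS[key], HLE_AA_V2_PARAMS[key])
    for key in HLE_PARAMS
    if HLE_PARAMS[key] != HLE_AA_V2_PARAMS[key]
}

print(param_diff)  # {'max_new_tokens': (8192, 16384)}
```

Based on the values shown here, the only differing default is `max_new_tokens`, which `hle_aa_v2` doubles to 16384.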