hle

This page contains all evaluation tasks for the hle harness.

Task	Description
hle	Text-only questions from Humanity’s Last Exam
hle_aa_v2	Text-only questions from Humanity’s Last Exam, with parameters aligned with Artificial Analysis Index v2

hle

Text-only questions from Humanity’s Last Exam

Container

Harness: hle

Container:

nvcr.io/nvidia/eval-factory/hle:26.01

Container Digest:

sha256:59fa69e20bbaaa251effa5f9d440d60920bc601cfb26f9e03866f1b6aff6dc33

Container Arch: multiarch

Task Type: hle
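Tags such as 26.01 can be re-pushed in most registries, so referencing the image by the digest listed above is a common way to keep runs reproducible. The sketch below uses a hypothetical helper, `pinned_image_ref`, that checks the digest is a well-formed SHA-256 value and builds the `name@digest` reference form that `docker pull` and other OCI clients accept.

```python
import re

IMAGE = "nvcr.io/nvidia/eval-factory/hle"
DIGEST = "sha256:59fa69e20bbaaa251effa5f9d440d60920bc601cfb26f9e03866f1b6aff6dc33"

# A sha256 digest is the algorithm prefix plus 64 lowercase hex characters.
_SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pinned_image_ref(image: str, digest: str) -> str:
    """Return an image reference pinned to a content digest (name@digest).

    Hypothetical helper for illustration; not part of the hle package.
    """
    if not _SHA256_DIGEST.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{image}@{digest}"


ref = pinned_image_ref(IMAGE, DIGEST)
print(ref)  # pass this to e.g. `docker pull` instead of the :26.01 tag
```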

Command

hle_eval --dataset=cais/hle --model_name={{target.api_endpoint.model_id}} --model_url={{target.api_endpoint.url}}  --temperature={{config.params.temperature}} --top_p={{config.params.top_p}} --timeout={{config.params.request_timeout}}  {% if config.params.limit_samples is not none %}--limit {{config.params.limit_samples}}{% endif %} --output_dir={{config.output_dir}}  {% if target.api_endpoint.api_key_name is not none %}--api_key_name={{target.api_endpoint.api_key_name}}{% endif %} --max_retries={{config.params.max_retries}} --num_workers={{config.params.parallelism}}  --max_new_tokens={{config.params.max_new_tokens}} --text_only --generate --judge
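The command template can be read as the following argument-building sketch. The function name `build_hle_command` and the model id, URL, and output directory in the usage lines are hypothetical. The flag names and the two conditionals come from the template: `--limit` is added only when `limit_samples` is set, and `--api_key_name` only when the endpoint names an API-key variable. Note that `parallelism` is passed as `--num_workers` and `request_timeout` as `--timeout`.

```python
from typing import Optional


def build_hle_command(
    model_id: str,
    url: str,
    params: dict,
    output_dir: str,
    api_key_name: Optional[str] = None,
) -> list:
    """Hypothetical mirror of the Jinja command template, as an argv list."""
    args = [
        "hle_eval",
        "--dataset=cais/hle",
        f"--model_name={model_id}",
        f"--model_url={url}",
        f"--temperature={params['temperature']}",
        f"--top_p={params['top_p']}",
        f"--timeout={params['request_timeout']}",
    ]
    # Template: {% if config.params.limit_samples is not none %}
    if params.get("limit_samples") is not None:
        args += ["--limit", str(params["limit_samples"])]
    args.append(f"--output_dir={output_dir}")
    # Template: {% if target.api_endpoint.api_key_name is not none %}
    if api_key_name is not None:
        args.append(f"--api_key_name={api_key_name}")
    args += [
        f"--max_retries={params['max_retries']}",
        f"--num_workers={params['parallelism']}",  # parallelism -> --num_workers
        f"--max_new_tokens={params['max_new_tokens']}",
        "--text_only",
        "--generate",
        "--judge",
    ]
    return args


# Values from the Defaults section of this page.
defaults = {
    "max_new_tokens": 8192,
    "max_retries": 30,
    "parallelism": 10,
    "temperature": 0.0,
    "request_timeout": 600,
    "top_p": 1.0,
}

cmd = build_hle_command(
    model_id="my-model",  # hypothetical model id
    url="http://localhost:8000/v1/chat/completions",  # hypothetical endpoint
    params=defaults,
    output_dir="/tmp/hle_results",  # hypothetical output directory
)
print(" ".join(cmd))
```

Building a list rather than one string avoids shell-quoting issues if the arguments are later passed to something like `subprocess.run`.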

Defaults

framework_name: hle
pkg_name: hle
config:
  params:
    max_new_tokens: 8192
    max_retries: 30
    parallelism: 10
    temperature: 0.0
    request_timeout: 600
    top_p: 1.0
    extra: {}
  supported_endpoint_types:
  - chat
  type: hle
target:
  api_endpoint: {}
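The defaults do not set `limit_samples`, and the command template adds `--limit` only when that value is not none, so a default run does not pass `--limit`. How run-specific settings combine with these defaults depends on the launcher; the sketch below assumes a recursive (deep) merge, so overriding `config.params.parallelism` and adding `limit_samples` leaves every other default in place. The `deep_merge` helper and the override values are hypothetical.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with override's keys recursively applied over base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into sub-dicts
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace outright
    return merged


# The Defaults block above, as a Python dict.
hle_defaults = {
    "framework_name": "hle",
    "pkg_name": "hle",
    "config": {
        "params": {
            "max_new_tokens": 8192,
            "max_retries": 30,
            "parallelism": 10,
            "temperature": 0.0,
            "request_timeout": 600,
            "top_p": 1.0,
            "extra": {},
        },
        "supported_endpoint_types": ["chat"],
        "type": "hle",
    },
    "target": {"api_endpoint": {}},
}

# Hypothetical override: a 20-sample smoke test with more workers.
user_overrides = {"config": {"params": {"limit_samples": 20, "parallelism": 32}}}

resolved = deep_merge(hle_defaults, user_overrides)
print(resolved["config"]["params"])
```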

hle_aa_v2

Text-only questions from Humanity’s Last Exam, with parameters aligned with Artificial Analysis Index v2