ruler#

This page lists the evaluation tasks available in the ruler harness.

| Task | Description |
|------|-------------|
| ruler-128k-chat | RULER with context length of 128k (chat mode) |
| ruler-128k-completions | RULER with context length of 128k (completions mode) |
| ruler-16k-chat | RULER with context length of 16k (chat mode) |
| ruler-16k-completions | RULER with context length of 16k (completions mode) |
| ruler-1m-chat | RULER with context length of 1M (chat mode) |
| ruler-1m-completions | RULER with context length of 1M (completions mode) |
| ruler-256k-chat | RULER with context length of 256k (chat mode) |
| ruler-256k-completions | RULER with context length of 256k (completions mode) |
| ruler-32k-chat | RULER with context length of 32k (chat mode) |
| ruler-32k-completions | RULER with context length of 32k (completions mode) |
| ruler-4k-chat | RULER with context length of 4k (chat mode) |
| ruler-4k-completions | RULER with context length of 4k (completions mode) |
| ruler-512k-chat | RULER with context length of 512k (chat mode) |
| ruler-512k-completions | RULER with context length of 512k (completions mode) |
| ruler-64k-chat | RULER with context length of 64k (chat mode) |
| ruler-64k-completions | RULER with context length of 64k (completions mode) |
| ruler-8k-chat | RULER with context length of 8k (chat mode) |
| ruler-8k-completions | RULER with context length of 8k (completions mode) |
| ruler-chat | RULER (chat mode) with no preset context length; the user must explicitly set the max_seq_length parameter. |
| ruler-completions | RULER (completions mode) with no preset context length; the user must explicitly set the max_seq_length parameter. |
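
The ruler-chat and ruler-completions tasks ship without a default context length, so max_seq_length must be supplied by the user. As an illustrative sketch only (the field names mirror the per-task configuration blocks on this page, but the exact override syntax depends on your launcher), the value lives under config.params.extra:

```yaml
# Hypothetical override for the ruler-chat task: max_seq_length has no
# default, so it must be set explicitly under config.params.extra.
config:
  params:
    extra:
      max_seq_length: 131072   # e.g. evaluate at a 128k context window
```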

ruler-128k-chat#

RULER with context length of 128k (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-128k-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 128000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-128k-chat
target:
  api_endpoint: {}
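
The command template above assembles the long_context_eval invocation with several conditionals: an endpoint type of "completions" maps to --mode completion while "chat" maps to --mode chat, and --num_samples, --max_seq_length, and --tokens_to_generate are emitted only when the corresponding parameter is set. The following is a minimal Python sketch of that flag-building logic, purely illustrative and not part of the harness (the URL and model name below are made-up placeholders):

```python
def build_eval_args(params, endpoint):
    """Roughly mirror the Jinja2 conditionals in the command template above.

    `params` mimics config.params and `endpoint` mimics target.api_endpoint;
    both are plain dicts here, for illustration only.
    """
    # Endpoint type "completions" becomes mode "completion"; "chat" stays "chat".
    mode = {"completions": "completion", "chat": "chat"}[endpoint["type"]]
    args = [
        "long_context_eval",
        "--url", endpoint["url"],
        "--model", endpoint["model_id"],
        "--mode", mode,
        "--temperature", str(params["temperature"]),
        "--top_p", str(params["top_p"]),
    ]
    # Optional flags appear only when their parameter is present and set.
    if params.get("limit_samples") is not None:
        args += ["--num_samples", str(params["limit_samples"])]
    if params["extra"].get("max_seq_length") is not None:
        args += ["--max_seq_length", str(params["extra"]["max_seq_length"])]
    if params.get("max_new_tokens") is not None:
        args += ["--tokens_to_generate", str(params["max_new_tokens"])]
    return args


# Defaults matching the ruler-128k-chat config block; endpoint values are placeholders.
params = {
    "temperature": 0.0,
    "top_p": 0.0001,
    "limit_samples": None,
    "max_new_tokens": None,
    "extra": {"max_seq_length": 128000},
}
endpoint = {
    "type": "chat",
    "url": "http://localhost:8000/v1/chat/completions",
    "model_id": "my-model",
}
print(" ".join(build_eval_args(params, endpoint)))
```

Note that the real template tests `max_seq_length is defined` rather than checking for None; the sketch collapses both into one check for brevity.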

ruler-128k-completions#

RULER with context length of 128k (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-128k-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 128000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-128k-completions
target:
  api_endpoint: {}

ruler-16k-chat#

RULER with context length of 16k (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-16k-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 16000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-16k-chat
target:
  api_endpoint: {}

ruler-16k-completions#

RULER with context length of 16k (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-16k-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 16000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-16k-completions
target:
  api_endpoint: {}

ruler-1m-chat#

RULER with context length of 1M (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-1m-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 1000000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-1m-chat
target:
  api_endpoint: {}

ruler-1m-completions#

RULER with context length of 1M (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-1m-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 1000000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-1m-completions
target:
  api_endpoint: {}

ruler-256k-chat#

RULER with context length of 256k (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-256k-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 256000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-256k-chat
target:
  api_endpoint: {}

ruler-256k-completions#

RULER with context length of 256k (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-256k-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 256000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-256k-completions
target:
  api_endpoint: {}

ruler-32k-chat#

RULER with context length of 32k (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-32k-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 32000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-32k-chat
target:
  api_endpoint: {}

ruler-32k-completions#

RULER with context length of 32k (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-32k-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 32000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-32k-completions
target:
  api_endpoint: {}

ruler-4k-chat#

RULER with context length of 4k (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-4k-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 4000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-4k-chat
target:
  api_endpoint: {}

ruler-4k-completions#

RULER with context length of 4k (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-4k-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 4000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-4k-completions
target:
  api_endpoint: {}

ruler-512k-chat#

RULER with context length of 512k (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-512k-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 512000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-512k-chat
target:
  api_endpoint: {}

ruler-512k-completions#

RULER with context length of 512k (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-512k-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 512000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-512k-completions
target:
  api_endpoint: {}

ruler-64k-chat#

RULER with context length of 64k (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-64k-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 64000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-64k-chat
target:
  api_endpoint: {}

ruler-64k-completions#

RULER with context length of 64k (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-64k-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 64000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-64k-completions
target:
  api_endpoint: {}

ruler-8k-chat#

RULER with context length of 8k (chat mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-8k-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 8000
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-8k-chat
target:
  api_endpoint: {}

ruler-8k-completions#

RULER with context length of 8k (completions mode)

Harness: ruler

Container: nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest: sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-8k-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 8000
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-8k-completions
target:
  api_endpoint: {}

ruler-chat#

RULER (chat mode) without a specified context length. The user must explicitly set the max_seq_length parameter.


Harness: ruler

Container:

nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest:

sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-chat

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: null
      subtasks: all
  supported_endpoint_types:
  - chat
  type: ruler-chat
target:
  api_endpoint: {}
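As the description notes, ruler-chat ships with `max_seq_length: null`, so a value must be supplied at run time. A minimal sketch of the user-supplied parameters, mirroring the config layout above (the exact override mechanism depends on the harness launcher, and the tokenizer path shown is a hypothetical example; `tokenizer` also defaults to null and will typically need a value as well):

```yaml
config:
  params:
    extra:
      # Required for ruler-chat: the target context length in tokens.
      max_seq_length: 16000
      # Hypothetical Hugging Face tokenizer path (tokenizer_backend: hf).
      tokenizer: meta-llama/Llama-3.1-8B-Instruct
```

The same applies to ruler-completions below, which likewise leaves `max_seq_length` null.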

ruler-completions#

RULER (completions mode) without a specified context length. The user must explicitly set the max_seq_length parameter.

Harness: ruler

Container:

nvcr.io/nvidia/eval-factory/long-context-eval:26.01

Container Digest:

sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689

Container Arch: multiarch

Task Type: ruler-completions

python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: null
      subtasks: all
  supported_endpoint_types:
  - completions
  type: ruler-completions
target:
  api_endpoint: {}