ruler#
This page contains all evaluation tasks for the ruler harness.
| Task | Description |
|---|---|
| ruler-128k-chat | RULER with context length of 128k (chat mode) |
| ruler-128k-completions | RULER with context length of 128k (completions mode) |
| ruler-16k-chat | RULER with context length of 16k (chat mode) |
| ruler-16k-completions | RULER with context length of 16k (completions mode) |
| ruler-1m-chat | RULER with context length of 1M (chat mode) |
| ruler-1m-completions | RULER with context length of 1M (completions mode) |
| ruler-256k-chat | RULER with context length of 256k (chat mode) |
| ruler-256k-completions | RULER with context length of 256k (completions mode) |
| ruler-32k-chat | RULER with context length of 32k (chat mode) |
| ruler-32k-completions | RULER with context length of 32k (completions mode) |
| ruler-4k-chat | RULER with context length of 4k (chat mode) |
| ruler-4k-completions | RULER with context length of 4k (completions mode) |
| ruler-512k-chat | RULER with context length of 512k (chat mode) |
| ruler-512k-completions | RULER with context length of 512k (completions mode) |
| ruler-64k-chat | RULER with context length of 64k (chat mode) |
| ruler-64k-completions | RULER with context length of 64k (completions mode) |
| ruler-8k-chat | RULER with context length of 8k (chat mode) |
| ruler-8k-completions | RULER with context length of 8k (completions mode) |
| ruler-chat | RULER (chat mode) without a specified context length. The user must explicitly set the max_seq_length parameter. |
| ruler-completions | RULER (completions mode) without a specified context length. The user must explicitly set the max_seq_length parameter. |
ruler-128k-chat#
RULER with context length of 128k (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-128k-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 128000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-128k-chat
target:
  api_endpoint: {}
```
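The Jinja template above maps the config params onto `long_context_eval` CLI flags, with `{% if %}` guards for the optional ones. A minimal Python sketch of that mapping (the `build_command` helper and the endpoint values are illustrative, not part of the harness):

```python
# Sketch of how the command template turns config params into CLI flags.
# build_command and the example endpoint values are illustrative only.

def build_command(params, endpoint):
    args = [
        "long_context_eval",
        "--url", endpoint["url"],
        "--tasks", params["extra"]["subtasks"],
        "--mode", "chat" if endpoint["type"] == "chat" else "completion",
        "--temperature", str(params["temperature"]),
        "--top_p", str(params["top_p"]),
    ]
    # Optional flags mirror the template's conditional guards.
    if params.get("limit_samples") is not None:
        args += ["--num_samples", str(params["limit_samples"])]
    if params["extra"].get("max_seq_length") is not None:
        args += ["--max_seq_length", str(params["extra"]["max_seq_length"])]
    args += [
        "--timeout", str(params["request_timeout"]),
        "--threads", str(params["parallelism"]),
    ]
    return args

params = {
    "parallelism": 1, "temperature": 0.0, "request_timeout": 300,
    "top_p": 0.0001, "limit_samples": None,
    "extra": {"subtasks": "all", "max_seq_length": 128000},
}
cmd = build_command(params, {"url": "http://localhost:8000/v1", "type": "chat"})
print(" ".join(cmd))
```

With the defaults shown here, `--num_samples` is omitted (limit_samples is null) while `--max_seq_length 128000` is emitted, matching how the template renders this task.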
ruler-128k-completions#
RULER with context length of 128k (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-128k-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 128000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-128k-completions
target:
  api_endpoint: {}
```
ruler-16k-chat#
RULER with context length of 16k (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-16k-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 16000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-16k-chat
target:
  api_endpoint: {}
```
ruler-16k-completions#
RULER with context length of 16k (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-16k-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 16000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-16k-completions
target:
  api_endpoint: {}
```
ruler-1m-chat#
RULER with context length of 1M (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-1m-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 1000000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-1m-chat
target:
  api_endpoint: {}
```
ruler-1m-completions#
RULER with context length of 1M (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-1m-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 1000000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-1m-completions
target:
  api_endpoint: {}
```
ruler-256k-chat#
RULER with context length of 256k (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-256k-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 256000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-256k-chat
target:
  api_endpoint: {}
```
ruler-256k-completions#
RULER with context length of 256k (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-256k-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 256000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-256k-completions
target:
  api_endpoint: {}
```
ruler-32k-chat#
RULER with context length of 32k (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-32k-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 32000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-32k-chat
target:
  api_endpoint: {}
```
ruler-32k-completions#
RULER with context length of 32k (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-32k-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 32000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-32k-completions
target:
  api_endpoint: {}
```
ruler-4k-chat#
RULER with context length of 4k (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-4k-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 4000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-4k-chat
target:
  api_endpoint: {}
```
ruler-4k-completions#
RULER with context length of 4k (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-4k-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 4000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-4k-completions
target:
  api_endpoint: {}
```
ruler-512k-chat#
RULER with context length of 512k (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-512k-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 512000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-512k-chat
target:
  api_endpoint: {}
```
ruler-512k-completions#
RULER with context length of 512k (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-512k-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 512000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-512k-completions
target:
  api_endpoint: {}
```
ruler-64k-chat#
RULER with context length of 64k (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-64k-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 64000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-64k-chat
target:
  api_endpoint: {}
```
ruler-64k-completions#
RULER with context length of 64k (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-64k-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 64000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-64k-completions
target:
  api_endpoint: {}
```
ruler-8k-chat#
RULER with context length of 8k (chat mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-8k-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 8000
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-8k-chat
target:
  api_endpoint: {}
```
ruler-8k-completions#
RULER with context length of 8k (completions mode)
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-8k-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: 8000
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-8k-completions
target:
  api_endpoint: {}
```
ruler-chat#
RULER (chat mode) without a specified context length. The user must explicitly set the max_seq_length parameter.
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-chat
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: null
      subtasks: all
  supported_endpoint_types:
    - chat
  type: ruler-chat
target:
  api_endpoint: {}
```
ruler-completions#
RULER (completions mode) without a specified context length. The user must explicitly set the max_seq_length parameter.
Harness: ruler
Container:
nvcr.io/nvidia/eval-factory/long-context-eval:26.01
Container Digest:
sha256:461a74e48403c66058797cbfb6f42b1cc769b33f92dbd0503706586b2eb84689
Container Arch: multiarch
Task Type: ruler-completions
```
python -c "import nltk;nltk.download('punkt_tab');nltk.download('punkt')" && {% if target.api_endpoint.api_key_name is not none %}export API_KEY=${{target.api_endpoint.api_key_name}} &&{% endif %} long_context_eval --url {{target.api_endpoint.url}} --tasks "{{config.params.extra.subtasks}}" --result_dir {{config.output_dir}} --model {{target.api_endpoint.model_id}} --mode {% if target.api_endpoint.type == "completions" %}completion{% elif target.api_endpoint.type == "chat" %}chat{% endif %} --tokenizer_path "{{config.params.extra.tokenizer}}" --tokenizer_type "{{config.params.extra.tokenizer_backend}}" --temperature {{config.params.temperature}} --top_p {{config.params.top_p}} {% if config.params.limit_samples is not none %}--num_samples {{config.params.limit_samples}}{% endif %} {% if config.params.extra.max_seq_length is defined %}--max_seq_length {{config.params.extra.max_seq_length}}{% endif %} --timeout {{config.params.request_timeout}} --threads {{config.params.parallelism}} {% if config.params.max_new_tokens is not none %}--tokens_to_generate {{config.params.max_new_tokens}}{% endif %}
```

```yaml
framework_name: ruler
pkg_name: long_context_eval
config:
  params:
    parallelism: 1
    temperature: 0.0
    request_timeout: 300
    top_p: 0.0001
    extra:
      tokenizer: null
      tokenizer_backend: hf
      max_seq_length: null
      subtasks: all
  supported_endpoint_types:
    - completions
  type: ruler-completions
target:
  api_endpoint: {}
```
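The generic ruler-chat and ruler-completions tasks ship with max_seq_length set to null, so nothing fills in the context length unless the user overrides it. A minimal sketch of the kind of pre-flight check a caller might add before launching a run (`require_max_seq_length` is a hypothetical helper, not part of the long_context_eval package):

```python
# Hypothetical pre-flight check for the generic ruler tasks: their default
# config has max_seq_length: null, so the caller must supply a value.

def require_max_seq_length(config):
    value = config["params"]["extra"].get("max_seq_length")
    if value is None:
        raise ValueError(
            "ruler-chat/ruler-completions require an explicit max_seq_length")
    return value

# Override the null default, e.g. to evaluate at a 128k context length:
config = {"params": {"extra": {"tokenizer": None, "max_seq_length": None}}}
config["params"]["extra"]["max_seq_length"] = 128000
print(require_max_seq_length(config))
```

Leaving the default in place would raise immediately rather than launching a run with an undefined context length.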