Tasks Catalog#
This page catalogs task identifiers used by eval/model_eval.
NeMo Evaluator Launcher owns the authoritative task list.
Use this page as a quick map, then verify exact names with the installed launcher.
nemo-evaluator-launcher ls tasks
nemo-evaluator-launcher ls task <task-id>
Naming Rule#
Use the exact task id listed by NeMo Evaluator Launcher. Do not prepend a harness name unless the launcher lists that exact dotted id.
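The naming rule can be sketched as a small pre-flight check. This is a minimal illustration, not part of the launcher: `check_task_id` is a hypothetical helper, the ids in `listed` are made-up placeholders, and the set is assumed to have been copied by hand from the launcher's task listing.

```python
def check_task_id(requested: str, listed: set[str]) -> str:
    """Classify a requested task id against ids copied from the launcher listing."""
    if requested in listed:
        return "ok"
    # A dotted id is valid only if the launcher lists that exact dotted form.
    bare = requested.rsplit(".", 1)[-1]
    if bare != requested and bare in listed:
        return f"use '{bare}': the launcher lists the undotted id"
    # Conversely, the launcher may list only a dotted form of this name.
    dotted = sorted(t for t in listed if t.endswith("." + requested))
    if dotted:
        return f"use one of {dotted}: the launcher lists only dotted ids"
    return "not listed"


# Hypothetical ids, for illustration only; real ids come from the launcher.
listed = {"example_chat_task", "someharness.example_logprob_task"}

print(check_task_id("example_chat_task", listed))
print(check_task_id("someharness.example_chat_task", listed))
print(check_task_id("example_logprob_task", listed))
print(check_task_id("unknown_task", listed))
```

Only the first call passes unchanged. The second is rejected because a harness prefix was added to an id the launcher lists without one, and the third because the launcher lists only the dotted form.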
Repository Starting Points#
| Config | Task entries | When to use |
|---|---|---|
| | | Hosted chat smoke test. |
| | | Launcher-managed Megatron checkpoint evaluation. |
Chat And Instruction Tasks#
These tasks use a chat endpoint. The hosted smoke-test config uses this family.
| Identifier | Notes |
|---|---|
| | Chat/instruction smoke task used by |
| | Configured by |
Log-Probability Tasks#
These tasks generally need a completions endpoint with logprobs support and a tokenizer that matches the served model.
| Identifier | Notes |
|---|---|
| | Configured by |
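The endpoint requirements of the two task families can be written down as a lookup and checked before a run. This is a rough sketch under stated assumptions: the family labels (`"chat"`, `"logprob"`), the endpoint labels, and the `endpoint_problems` helper are hypothetical names chosen for illustration, and the log-probability requirements are treated as strict here even though this page only says they generally apply.

```python
# Requirements per task family, paraphrasing this page. The family and
# endpoint labels are hypothetical names chosen for this sketch.
REQUIREMENTS = {
    "chat": {"endpoint_type": "chat", "needs_logprobs": False},
    "logprob": {"endpoint_type": "completions", "needs_logprobs": True},
}


def endpoint_problems(
    family: str,
    endpoint_type: str,
    supports_logprobs: bool,
    tokenizer_set: bool,
) -> list[str]:
    """Return a list of mismatches between a task family and an endpoint."""
    req = REQUIREMENTS[family]
    problems = []
    if endpoint_type != req["endpoint_type"]:
        problems.append(
            f"{family} tasks expect a {req['endpoint_type']} endpoint, "
            f"got {endpoint_type}"
        )
    if req["needs_logprobs"] and not supports_logprobs:
        problems.append("endpoint must return logprobs")
    if req["needs_logprobs"] and not tokenizer_set:
        problems.append("set a tokenizer that matches the served model")
    return problems


# A log-probability task pointed at a chat endpoint with no tokenizer configured.
for problem in endpoint_problems("logprob", "chat", False, False):
    print(problem)
```

An empty list means the endpoint matches the family under these assumptions.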
Configure tokenizer values under:
evaluation.nemo_evaluator_config.config.params.extra.tokenizer
evaluation.nemo_evaluator_config.config.params.extra.tokenizer_backend
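The dotted paths above can be read as nested keys. The sketch below assumes each dot-separated segment is one level of a nested mapping in the run config. `set_by_dotted_path` is a hypothetical helper, and the two values are placeholders to replace with the tokenizer and backend that match the served model.

```python
def set_by_dotted_path(config: dict, dotted_path: str, value) -> dict:
    """Set a value in a nested dict, creating intermediate levels as needed."""
    keys = dotted_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


config = {}
prefix = "evaluation.nemo_evaluator_config.config.params.extra"

# Placeholder values, not real tokenizer names or backends.
set_by_dotted_path(config, prefix + ".tokenizer", "<tokenizer-name-or-path>")
set_by_dotted_path(config, prefix + ".tokenizer_backend", "<tokenizer-backend>")

extra = config["evaluation"]["nemo_evaluator_config"]["config"]["params"]["extra"]
print(extra)
```

Both keys end up as siblings under the same `extra` mapping.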
Choosing Tasks#
Ask three questions before changing the task list:
1. Does the installed launcher list the task id exactly?
2. Does the endpoint type match the task family?
3. Is this a smoke test or a production comparison?
For production comparisons, keep the same task list, endpoint type, tokenizer, and generation parameters across baseline and post-training runs.
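One way to enforce this is to compare the relevant settings of the two runs before trusting the scores. This is a minimal sketch: the field names in `COMPARED_FIELDS`, the `comparison_mismatches` helper, and every value in the example dicts are hypothetical and do not reflect the launcher's actual config schema.

```python
# Hypothetical field names for the settings that should stay fixed.
COMPARED_FIELDS = ("tasks", "endpoint_type", "tokenizer", "generation_params")


def comparison_mismatches(baseline: dict, post_training: dict) -> list[str]:
    """Return the names of fields that differ between the two runs."""
    return [
        field
        for field in COMPARED_FIELDS
        if baseline.get(field) != post_training.get(field)
    ]


# Placeholder settings, for illustration only.
baseline = {
    "tasks": ["example_chat_task"],
    "endpoint_type": "chat",
    "tokenizer": "<tokenizer-name-or-path>",
    "generation_params": {"temperature": 0.0, "max_new_tokens": 256},
}
post_training = {
    **baseline,
    "generation_params": {"temperature": 0.7, "max_new_tokens": 256},
}

print(comparison_mismatches(baseline, post_training))
```

Here the runs differ only in generation parameters, so the check reports that field. Note that task lists are compared as ordered lists, so the same tasks in a different order also count as a mismatch in this sketch.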