Tasks Catalog

This page catalogs task identifiers used by eval/model_eval. NeMo Evaluator Launcher owns the authoritative task list. Use this page as a quick map, then verify exact names with the installed launcher.

nemo-evaluator-launcher ls tasks
nemo-evaluator-launcher ls task <task-id>

Naming Rule

Use the exact task id listed by NeMo Evaluator Launcher. Do not prepend a harness name unless the launcher lists that exact dotted id.
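A minimal sketch of the naming rule, assuming the ids printed by `nemo-evaluator-launcher ls tasks` have already been copied into a Python set. The helper `check_task_id` and the sample ids are hypothetical. Matching is exact, so a harness-prefixed id passes only when the launcher lists that dotted form itself.

```python
def check_task_id(candidate: str, listed_ids: set[str]) -> bool:
    """Return True only if the launcher lists `candidate` verbatim."""
    # No normalization: no case folding, prefix stripping, or whitespace trimming.
    return candidate in listed_ids


# Hypothetical sample; in practice, copy ids from `nemo-evaluator-launcher ls tasks`.
listed = {"mmlu_instruct", "adlr_mmlu", "hellaswag"}

print(check_task_id("hellaswag", listed))               # True
print(check_task_id("some_harness.hellaswag", listed))  # False: dotted id is not listed
```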

Repository Starting Points

| Config | Task entries | When to use |
| --- | --- | --- |
| `tiny_chat.yaml` | `mmlu_instruct` | Hosted chat smoke test. |
| `default.yaml` | `adlr_mmlu`, `hellaswag` | Launcher-managed Megatron checkpoint evaluation. |

Chat And Instruction Tasks

These tasks require a chat endpoint. The hosted smoke-test config, `tiny_chat.yaml`, uses this family.

| Identifier | Notes |
| --- | --- |
| `mmlu_instruct` | Chat/instruction smoke task used by `tiny_chat.yaml`. |
| `adlr_mmlu` | Configured by `default.yaml`; verify endpoint requirements in the installed launcher. |

Log-Probability Tasks

These tasks generally need a completions endpoint with logprobs support and a tokenizer that matches the served model.

| Identifier | Notes |
| --- | --- |
| `hellaswag` | Configured by `default.yaml`; requires endpoint/tokenizer compatibility for meaningful scores. |

Configure tokenizer values under:

evaluation.nemo_evaluator_config.config.params.extra.tokenizer
evaluation.nemo_evaluator_config.config.params.extra.tokenizer_backend
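The two dotted paths above correspond to nested keys in the run config. The sketch below assumes the config has been loaded into a plain Python dict and sets both values with a hypothetical helper, `set_by_path`. The tokenizer and backend values are placeholders, not values confirmed by the launcher.

```python
from typing import Any


def set_by_path(config: dict[str, Any], dotted_path: str, value: Any) -> None:
    """Set a nested key, creating intermediate dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


EXTRA = "evaluation.nemo_evaluator_config.config.params.extra"

config: dict[str, Any] = {}
# Placeholder values: substitute the tokenizer that matches the served model.
set_by_path(config, f"{EXTRA}.tokenizer", "<tokenizer-name-or-path>")
set_by_path(config, f"{EXTRA}.tokenizer_backend", "<backend>")

extra = config["evaluation"]["nemo_evaluator_config"]["config"]["params"]["extra"]
print(extra)
```

Keeping the shared prefix in one constant makes it harder for the two keys to land under different parents by mistake.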

Choosing Tasks

Ask three questions before changing the task list.

1. Does the installed launcher list the task id exactly?
2. Does the endpoint type match the task family?
3. Is this a smoke test or a production comparison?
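The endpoint question can be sketched as a small preflight check. The family labels, endpoint labels, and the helper `endpoint_mismatches` are hypothetical names chosen for this example. The family assignments follow this page, and `adlr_mmlu` is left out on purpose because its endpoint requirements should be confirmed in the installed launcher.

```python
# Endpoint type required by each task family, as described on this page.
REQUIRED_ENDPOINT = {
    "chat": "chat",
    "logprob": "completions",  # must also support logprobs
}

# Hypothetical mapping; confirm each entry with the installed launcher.
TASK_FAMILY = {
    "mmlu_instruct": "chat",
    "hellaswag": "logprob",
}


def endpoint_mismatches(tasks: list[str], endpoint_type: str) -> list[str]:
    """Return one message per task that does not fit `endpoint_type`."""
    problems = []
    for task in tasks:
        family = TASK_FAMILY.get(task)
        if family is None:
            problems.append(f"{task}: unknown family, check the launcher")
        elif REQUIRED_ENDPOINT[family] != endpoint_type:
            problems.append(f"{task}: needs a {REQUIRED_ENDPOINT[family]} endpoint")
    return problems


print(endpoint_mismatches(["mmlu_instruct", "hellaswag"], "chat"))
# ['hellaswag: needs a completions endpoint']
```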

For production comparisons, keep the same task list, endpoint type, tokenizer, and generation parameters across baseline and post-training runs.
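One way to enforce this is to diff the fields that must stay fixed before launching the second run. This is a sketch only: the key names in `FIXED_FIELDS` are hypothetical and should be mapped to wherever your configs actually store these values.

```python
from typing import Any

# Fields to hold constant across baseline and post-training runs.
# Key names are hypothetical, not launcher config keys.
FIXED_FIELDS = ("tasks", "endpoint_type", "tokenizer", "generation_params")


def differing_fields(baseline: dict[str, Any], candidate: dict[str, Any]) -> list[str]:
    """Return the fixed fields whose values differ between the two runs."""
    return [f for f in FIXED_FIELDS if baseline.get(f) != candidate.get(f)]


baseline = {
    "tasks": ["adlr_mmlu", "hellaswag"],
    "endpoint_type": "completions",
    "tokenizer": "<tokenizer-name-or-path>",
    "generation_params": {"temperature": 0.0},
}
# Same setup, except one generation parameter drifted.
post_training = {**baseline, "generation_params": {"temperature": 0.7}}

print(differing_fields(baseline, post_training))  # ['generation_params']
```

An empty list means the two runs are comparable on these four fields. Any other result names what to reconcile first.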