Troubleshooting
This page maps common eval/model_eval failures to the config fields that usually need correction.
Nemotron builds a launcher config and calls NeMo Evaluator Launcher; task execution, endpoint checks, and result writing are owned by the launcher.
Evaluator Extra Missing
Symptom:
Error: nemo-evaluator-launcher is required for evaluation
Install with: uv sync --extra evaluator
Recovery:
uv sync --extra evaluator
Then rerun the same nemotron steps run eval/model_eval command with uv run --no-sync.
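Before rerunning, it can help to confirm that the extra actually landed in the active environment. The following is a minimal sketch, assuming the installed distribution is named `nemo-evaluator-launcher` as in the error message; the helper name `has_distribution` is hypothetical and not part of Nemotron or the launcher.

```python
from importlib.metadata import PackageNotFoundError, version


def has_distribution(name: str) -> bool:
    """Return True if a distribution with this name is installed."""
    try:
        version(name)
    except PackageNotFoundError:
        return False
    return True


if __name__ == "__main__":
    # Distribution name taken from the error message above (assumed to
    # match the installed package metadata).
    print(has_distribution("nemo-evaluator-launcher"))
```

Run the script through uv run --no-sync so that it inspects the same environment the step uses.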
Hosted Endpoint Fails
Most hosted failures come from one of these fields:
| Field | What To Check |
|---|---|
| | Full endpoint URL, including |
| | Exact model id returned by the endpoint’s models API or UI. |
| | Environment variable name, not the secret value. |
| | |
For hosted smoke tests, start with tiny_chat.yaml and target.api_endpoint.type=chat.
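The "environment variable name, not the secret value" check is easy to get wrong, so a small pre-flight test may be useful. This is a sketch under stated assumptions: `check_api_key_setting` is a hypothetical helper, and the name pattern is the conventional shape of an environment variable name, not a rule enforced by the launcher.

```python
import os
import re
from typing import List, Mapping, Optional

# Conventional shape of an environment variable name (an assumption,
# not a launcher rule): letters, digits, underscores, no leading digit.
_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_api_key_setting(
    value: str, environ: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return a list of problems found with an API-key config value."""
    environ = os.environ if environ is None else environ
    problems = []
    if not _ENV_NAME.fullmatch(value):
        # Characters such as "-" or "." suggest a pasted secret.
        problems.append(
            "value does not look like an environment variable name; "
            "it may be the secret itself"
        )
    elif not environ.get(value):
        problems.append(f"environment variable {value} is unset or empty")
    return problems
```

An empty list means the value looks like a variable name and that variable is set in the current shell. The function never prints the value of the variable, only its name.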
Wrong Task For Endpoint Type
Chat tasks need a chat endpoint. Log-probability tasks generally need a completions endpoint with logprobs support and a tokenizer.
If the launcher fails after endpoint setup, check:
tasks
target.api_endpoint.type
evaluation.nemo_evaluator_config.config.params.extra.tokenizer
Use exact task IDs from:
nemo-evaluator-launcher ls tasks
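The pairing rule in this section can be written down as a small check. The sketch below is illustrative only: `check_task_endpoint` is a hypothetical helper, it treats every task as either a chat task or a log-probability task, and it assumes the endpoint type string for a completions endpoint is `completions`. Whether a given task needs log-probabilities has to come from the task documentation; the code does not infer it.

```python
from typing import List, Optional


def check_task_endpoint(
    endpoint_type: str, needs_logprobs: bool, tokenizer: Optional[str]
) -> List[str]:
    """Return mismatches between a task's needs and the endpoint config."""
    problems = []
    if needs_logprobs:
        # "completions" is an assumed value for target.api_endpoint.type.
        if endpoint_type != "completions":
            problems.append(
                "log-probability task paired with a non-completions endpoint"
            )
        if not tokenizer:
            problems.append("log-probability task has no tokenizer configured")
    elif endpoint_type != "chat":
        problems.append("chat task paired with a non-chat endpoint")
    return problems
```

For example, `check_task_endpoint("chat", True, None)` reports both the endpoint mismatch and the missing tokenizer, which correspond to the `target.api_endpoint.type` and tokenizer fields listed above.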
Bad Checkpoint Path
When using default.yaml, point deployment.checkpoint_path at a concrete Megatron Bridge iter_* directory.
Do not point it only at the parent training output directory.
deployment.checkpoint_path=/path/to/run/iter_0001000
For log-probability tasks, also verify:
evaluation.nemo_evaluator_config.config.params.extra.tokenizer=/path/to/run/iter_0001000/tokenizer
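A quick local check can catch the parent-directory mistake before the launcher runs. This is a minimal sketch: `check_checkpoint_path` is a hypothetical helper, and it only verifies the `iter_*` directory naming described above, not the checkpoint contents.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List


def check_checkpoint_path(path: str) -> List[str]:
    """Return problems with a deployment.checkpoint_path value."""
    p = Path(path)
    problems = []
    if not fnmatch(p.name, "iter_*"):
        problems.append(f"{p} is not an iter_* directory")
        # If this is the parent training output directory, list the
        # iter_* children so the right one can be picked.
        if p.is_dir():
            candidates = sorted(c.name for c in p.glob("iter_*") if c.is_dir())
            if candidates:
                problems.append("candidates: " + ", ".join(candidates))
    elif not p.is_dir():
        problems.append(f"{p} does not exist or is not a directory")
    return problems
```

Passing the parent training output directory returns one problem for the path itself and, when `iter_*` children exist, a second entry naming them.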
Launcher Job State
The step prints launcher follow-up commands when the launcher returns an invocation id.
status_command: nemo-evaluator-launcher status <id>
logs_command: nemo-evaluator-launcher logs <id>
Run those commands before changing config. The launcher logs usually distinguish endpoint/authentication failures from task-schema failures.
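If the follow-up commands are run often, they can be wrapped in a short script. The sketch below assumes `nemo-evaluator-launcher` is on `PATH` and uses only the two subcommands printed by the step; the function names are hypothetical.

```python
import subprocess
from typing import Dict, List


def followup_commands(invocation_id: str) -> Dict[str, List[str]]:
    """Build the status and logs commands for one invocation id."""
    return {
        "status": ["nemo-evaluator-launcher", "status", invocation_id],
        "logs": ["nemo-evaluator-launcher", "logs", invocation_id],
    }


def run_followups(invocation_id: str) -> None:
    """Run status, then logs, echoing each command first."""
    for argv in followup_commands(invocation_id).values():
        print("$ " + " ".join(argv))
        # check=False: a failing status call should not hide the logs call.
        # Raises FileNotFoundError if the launcher CLI is not installed.
        subprocess.run(argv, check=False)
```

Call `run_followups` with the invocation id printed by the step, then read the output before editing any config field.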