CLI Reference
This page documents the CLI surface for nemotron steps run eval/model_eval.
The flags are shared by every Nemotron step.
The override examples are specific to the eval/model_eval YAML schema.
Syntax
uv run nemotron steps run eval/model_eval [FLAGS] [HYDRA_OVERRIDES...]
Run the command from the repository root after uv sync --extra evaluator.
Pass the configuration name with -c, per-step overrides as key=value dotlists, and optional execution flags.
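To make the key=value dotlist shape concrete, here is a minimal sketch of a shell check that separates overrides from other arguments. `is_dotlist_override` is a hypothetical helper, not part of the Nemotron CLI, and it deliberately accepts only the plain dotted-key form used on this page; any richer override syntax the CLI may accept is out of scope.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nemotron): succeed when $1 looks like a
# plain dotted key=value override, e.g. target.api_endpoint.type=chat.
is_dotlist_override() {
    case "$1" in
        *=*) ;;                 # must contain an equals sign
        *) return 1 ;;
    esac
    key="${1%%=*}"              # text before the first '='
    case "$key" in
        # reject empty keys, leading/trailing dots, empty segments,
        # and any character outside letters, digits, '_' and '.'
        ""|.*|*.|*..*|*[!A-Za-z0-9_.]*) return 1 ;;
    esac
    return 0
}

# Classify a few arguments taken from the examples on this page.
for arg in output_dir=./output/eval-tiny-chat target.api_endpoint.type=chat tiny_chat; do
    if is_dotlist_override "$arg"; then
        echo "override: $arg"
    else
        echo "not an override: $arg"
    fi
done
```

The check only inspects the key; values are passed through untouched, so quoting them remains the caller's job.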
Flags
| Flag | Long form | Purpose |
|---|---|---|
| `-c` | | Config name inside |
| | | Attached execution by using an environment profile defined in |
| | | Detached execution by using an environment profile defined in |
| `-d` | | Compile the Nemotron job config and exit without dispatching. |
| | | Force re-squash of the container image when the selected backend builds one. |
Invoking the command without -c resolves the runspec default, default.yaml.
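The default-resolution rule can be illustrated with a small wrapper. This is a minimal sketch, not part of the Nemotron CLI: `build_eval_cmd` is a hypothetical helper that only prints the command line it would run, and it makes the default explicit by emitting `-c default` when no config name is given, assuming (as in the Megatron example on this page) that `-c default` selects default.yaml.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nemotron): print the eval/model_eval
# command line with the config choice made explicit.
# $1 = config name (optional); remaining arguments = key=value overrides.
build_eval_cmd() {
    config="${1:-default}"      # no name given -> fall back to "default"
    [ "$#" -gt 0 ] && shift     # drop the config name, keep the overrides
    printf '%s' "uv run --no-sync nemotron steps run eval/model_eval -c $config"
    for override in "$@"; do
        printf ' %s' "$override"   # display only; values are not re-quoted
    done
    printf '\n'
}

build_eval_cmd                          # falls back to -c default
build_eval_cmd tiny_chat dry_run=true   # explicit config plus one override
```

Printing instead of executing keeps the sketch safe to paste into any shell; replacing the `printf` calls with a real invocation is left to the reader.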
Common Overrides
| Override | Purpose |
|---|---|
| `output_dir` | Base output directory. The runtime also writes this into |
| `dry_run=true` | Pass dry-run mode to NeMo Evaluator Launcher. This is different from CLI |
| | Optional subset of configured task names passed to NeMo Evaluator Launcher. |
| `target.api_endpoint.url` | OpenAI-compatible endpoint URL for hosted evaluation when |
| `target.api_endpoint.model_id` | Exact model id advertised by the hosted endpoint. |
| `target.api_endpoint.api_key_name` | Name of the environment variable holding the bearer token. This is the variable name, not the secret. |
| `target.api_endpoint.type=<chat\|completions>` | |
| `evaluation.nemo_evaluator_config.config.params.limit_samples` | Per-task sample cap for smoke tests. |
| | Concurrent requests issued by the evaluator where supported. |
| | Per-request timeout in seconds. |
| | Tokenizer used by log-probability tasks such as HellaSwag. |
| `deployment.checkpoint_path` | Megatron Bridge checkpoint path used by |
| | Container image used by the launcher deployment in |
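Because the API key override carries the name of an environment variable rather than the token itself, a pre-flight check can confirm that the named variable is populated without ever printing it. The sketch below is an assumption-laden illustration: `require_token_var` and `DEMO_TOKEN` are hypothetical names, and the check relies on the common `printenv` utility, which only sees exported variables.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Nemotron).
# $1 is the *name* of the variable that holds the bearer token.
require_token_var() {
    name="$1"
    # printenv looks the variable up by name; the value is captured only for
    # the emptiness test and is never echoed.
    if [ -z "$(printenv "$name")" ]; then
        echo "error: environment variable $name is unset, empty, or not exported" >&2
        return 1
    fi
    # Emit the override with the variable name, not the secret.
    echo "target.api_endpoint.api_key_name=$name"
}

# Demo with a placeholder variable so a real key is never touched.
DEMO_TOKEN="placeholder-value"
export DEMO_TOKEN
require_token_var DEMO_TOKEN
```

The function prints only the override string, so its output can be appended to the command line or written to logs without exposing the token.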
Discovery Commands
uv run --no-sync nemotron steps list --category eval --json
uv run --no-sync nemotron steps show eval/model_eval --json
nemotron steps show eval/model_eval --json prints the full step contract, including consumes, produces, parameters, strategies, and errors.
Examples
Hosted Chat Smoke Test
: "${NVIDIA_API_KEY:?Set NVIDIA_API_KEY}"
: "${NEMO_EVALUATOR_MODEL_URL:?Set the chat-completions endpoint URL}"
: "${NEMO_EVALUATOR_MODEL_ID:?Set the endpoint model id}"
uv run --no-sync nemotron steps run eval/model_eval \
-c tiny_chat \
output_dir=./output/eval-tiny-chat \
target.api_endpoint.url="$NEMO_EVALUATOR_MODEL_URL" \
target.api_endpoint.model_id="$NEMO_EVALUATOR_MODEL_ID" \
target.api_endpoint.api_key_name=NVIDIA_API_KEY \
target.api_endpoint.type=chat \
evaluation.nemo_evaluator_config.config.params.limit_samples=1
Megatron Checkpoint Evaluation Config
Use default.yaml when NeMo Evaluator Launcher should deploy a Megatron Bridge checkpoint and then run the configured tasks.
uv run --no-sync nemotron steps run eval/model_eval \
-c default \
output_dir=./output/eval-megatron \
deployment.checkpoint_path=/path/to/checkpoint/iter_0001000 \
evaluation.nemo_evaluator_config.config.params.limit_samples=1
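A mistyped checkpoint path is cheaper to catch before the job is dispatched. The following is a minimal sketch under one explicit assumption: the checkpoint is a directory visible from the machine that runs the command, which may not hold when the path only exists on a remote cluster. `checkpoint_override` is a hypothetical helper, not part of the Nemotron CLI.

```shell
#!/bin/sh
# Hypothetical guard (not part of Nemotron): build the checkpoint override
# only when the path is an existing directory on this machine.
checkpoint_override() {
    path="$1"
    if [ ! -d "$path" ]; then
        echo "error: checkpoint directory not found: $path" >&2
        return 1
    fi
    echo "deployment.checkpoint_path=$path"
}

# Demo against a throwaway directory instead of a real checkpoint.
demo_dir="$(mktemp -d)"
checkpoint_override "$demo_dir"
checkpoint_override "$demo_dir/iter_missing" || echo "guard rejected the missing path"
rmdir "$demo_dir"
```

The printed override can be passed to the command above in place of the literal `deployment.checkpoint_path=...` argument.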
Compile Without Dispatching
uv run --no-sync nemotron steps run eval/model_eval -d -c tiny_chat \
target.api_endpoint.url="$NEMO_EVALUATOR_MODEL_URL" \
target.api_endpoint.model_id="$NEMO_EVALUATOR_MODEL_ID"
Launcher Dry Run
uv run --no-sync nemotron steps run eval/model_eval -c tiny_chat dry_run=true
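The two dry-run styles on this page are easy to confuse, so the sketch below maps each one to the argument that selects it. `dry_run_arg` and the mode names `compile` and `launcher` are hypothetical labels chosen for illustration; only the `-d` flag and the `dry_run=true` override come from the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nemotron): map a mode label to the
# argument used in the examples on this page.
dry_run_arg() {
    case "$1" in
        compile)  echo "-d" ;;             # CLI flag: compile the job config, then exit
        launcher) echo "dry_run=true" ;;   # override: passed to NeMo Evaluator Launcher
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Show both mappings side by side.
for mode in compile launcher; do
    printf '%s -> %s\n' "$mode" "$(dry_run_arg "$mode")"
done
```

Keeping the two spellings behind one function makes it harder to pass the flag where the override was intended, or the reverse.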