Quickstart#

Get up and running with NeMo Evaluator in minutes. Choose your preferred approach based on your needs and experience level.

Prerequisites#

All paths require:

  • OpenAI-compatible endpoint (hosted or self-deployed)

  • Valid API key for your chosen endpoint
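
If you use NVIDIA-hosted models, as the validation steps below do, export your API key before running any commands. The variable name NGC_API_KEY is the one referenced throughout this guide; adjust it if your endpoint expects a different key.

# Make your API key available to the CLI and curl examples below
export NGC_API_KEY=<your-api-key>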

Choose Your Path#

Select the approach that best matches your workflow and technical requirements:

NeMo Evaluator Launcher (recommended for most users)

Unified CLI experience with automated container management, built-in orchestration, and result export capabilities. A minimal launcher command is sketched below.

NeMo Evaluator Core (for Python developers)

Programmatic control with full adapter features, custom configurations, and direct API access for integration into existing workflows.

Container Direct (for container workflows)

Direct container execution with volume mounting, environment control, and integration into Docker-based CI/CD pipelines.

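If you follow the launcher path, a single command runs an evaluation end to end. The sketch below reuses the example configuration that appears later in this guide and only uses flags shown elsewhere on this page; treat the config name and output directory as placeholders for your own setup.

# Minimal launcher run using an example configuration (illustrative; adjust to your setup)
nv-eval run \
    --config-dir examples \
    --config-name local_llama_3_1_8b_instruct \
    -o execution.output_dir=./results
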
Model Endpoints#

NeMo Evaluator works with any OpenAI-compatible endpoint. You can point it at a hosted endpoint, such as the NVIDIA API used in the validation steps below, or serve a model yourself.

Self-Hosted Options#

If you prefer to host your own models:

# vLLM (recommended for self-hosting)
pip install vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8080

# Or use other serving frameworks
# TRT-LLM, NeMo Framework, etc.

See Serve and Deploy Models for detailed deployment options.

Validation and Troubleshooting#

Quick Validation Steps#

Before running full evaluations, verify your setup:

# 1. Test your endpoint connectivity
curl -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
    -H "Authorization: Bearer $NGC_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 10
    }'
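
# (Optional) For a quick pass/fail signal, print only the HTTP status code;
# 200 means the endpoint accepted the request (illustrative variant of step 1)
curl -s -o /dev/null -w "%{http_code}\n" \
    "https://integrate.api.nvidia.com/v1/chat/completions" \
    -H "Authorization: Bearer $NGC_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 10}'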

# 2. Run a dry-run to validate configuration
nv-eval run \
    --config-dir examples \
    --config-name local_llama_3_1_8b_instruct \
    --dry-run

# 3. Run a minimal test with very few samples
nv-eval run \
    --config-dir examples \
    --config-name local_llama_3_1_8b_instruct \
    -o +config.params.limit_samples=1 \
    -o execution.output_dir=./test_results

Common Issues and Solutions#

# Verify your API key is set correctly
echo $NGC_API_KEY

# Test with a simple curl request (see above)

# Check Docker is running and has GPU access
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi

# Pull the latest container if you have issues
docker pull nvcr.io/nvidia/eval-factory/simple-evals:25.08.1

# Enable debug logging
export NEMO_EVALUATOR_LOG_LEVEL=DEBUG

# Check available evaluation types
nv-eval ls tasks

# Check if results were generated
find ./results -name "*.yml" -type f

# View task results
cat ./results/<invocation_id>/<task_name>/artifacts/results.yml

# Or export and view processed results
nv-eval export <invocation_id> --dest local --format json
cat ./results/<invocation_id>/processed_results.json

Next Steps#

After completing your quickstart:

# List all available tasks
nv-eval ls tasks

# Run with limited samples for quick testing
nv-eval run --config-dir examples --config-name local_limit_samples

# Export to MLflow
nv-eval export <invocation_id> --dest mlflow

# Export to Weights & Biases
nv-eval export <invocation_id> --dest wandb

# Export to Google Sheets
nv-eval export <invocation_id> --dest gsheets

# Export to local files
nv-eval export <invocation_id> --dest local --format json

# Run on Slurm cluster
nv-eval run --config-dir examples --config-name slurm_llama_3_1_8b_instruct

# Run on Lepton AI
nv-eval run --config-dir examples --config-name lepton_vllm_llama_3_1_8b_instruct

Quick Reference#

| Task | Command |
| --- | --- |
| List benchmarks | nv-eval ls tasks |
| Run evaluation | nv-eval run --config-dir examples --config-name <config> |
| Check status | nv-eval status <invocation_id> |
| Export results | nv-eval export <invocation_id> --dest local --format json |
| Dry run | Add --dry-run to any run command |
| Test with limited samples | Add -o +config.params.limit_samples=3 |