Quickstart#
Get up and running with NeMo Evaluator in minutes. Choose your preferred approach based on your needs and experience level.
Prerequisites#
All paths require:
OpenAI-compatible endpoint (hosted or self-deployed)
Valid API key for your chosen endpoint
Quick Reference#
| Task | Command |
|---|---|
| List benchmarks | `nemo-evaluator-launcher ls tasks` |
| Run evaluation | `nemo-evaluator-launcher run --config-dir <dir> --config-name <name>` |
| Check status | `nemo-evaluator-launcher status <invocation_id>` |
| Debug job | Set `export LOG_LEVEL=DEBUG` before running |
| Export results | `nemo-evaluator-launcher export <invocation_id> --dest local --format json` |
| Dry run | Add `--dry-run` |
| Test with limited samples | Add `-o +config.params.limit_samples=1` |
Choose Your Path#
Select the approach that best matches your workflow and technical requirements:
Recommended for most users
Unified CLI experience with automated container management, built-in orchestration, and result export capabilities.
For Python developers
Programmatic control with full adapter features, custom configurations, and direct API access for integration into existing workflows.
For container workflows
Direct container execution with volume mounting, environment control, and integration into Docker-based CI/CD pipelines.
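Whichever path you choose, everything ultimately speaks the same OpenAI-compatible protocol. As an illustration of what the Python path's "direct API access" looks like at the wire level, here is a stdlib sketch that builds a `/v1/chat/completions` request; the base URL and model name are placeholders, and this is not the NeMo Evaluator API itself:

```python
import json
import urllib.request


def chat_request(base: str, model: str, prompt: str,
                 max_tokens: int = 10) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request (no network I/O)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    return urllib.request.Request(
        base.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Send it with urllib.request.urlopen(chat_request(...)) once an endpoint
# and API key (added as an Authorization header) are available.
```

Any endpoint that accepts this request shape will work with the evaluator.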
Model Endpoints#
NeMo Evaluator works with any OpenAI-compatible endpoint. You have several options:
Hosted Endpoints (Recommended)#
NVIDIA Build: build.nvidia.com - Ready-to-use hosted models
OpenAI: Standard OpenAI API endpoints
Other providers: Anthropic, Cohere, or any OpenAI-compatible API
Self-Hosted Options#
If you prefer to host your own models:
# vLLM (recommended for self-hosting)
pip install vllm
export HF_TOKEN=hf_your-token-here
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8080
# Or use other serving frameworks
# TRT-LLM, NeMo Framework, etc.
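Once the server is up, you can confirm it exposes the OpenAI-compatible API by listing its models; a minimal stdlib sketch, assuming the port 8080 used in the serve command above:

```python
import json
import urllib.request


def models_url(base: str) -> str:
    """OpenAI-compatible servers, vLLM included, list models at /v1/models."""
    return base.rstrip("/") + "/v1/models"


def list_models(base: str = "http://localhost:8080",
                timeout: float = 5.0) -> list[str]:
    """Return the model ids the server advertises; raises if unreachable."""
    with urllib.request.urlopen(models_url(base), timeout=timeout) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]


if __name__ == "__main__":
    print(list_models())
```

If the served model id appears in the output, the endpoint is ready to use in your evaluation config.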
Validation and Troubleshooting#
Quick Validation Steps#
Before running full evaluations, verify your setup:
# 1. Test your endpoint connectivity
export NGC_API_KEY=nvapi-...
curl -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
-H "Authorization: Bearer $NGC_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 10
}'
# 2. Run a dry-run to validate configuration
nemo-evaluator-launcher run \
--config-dir packages/nemo-evaluator-launcher/examples \
--config-name local_llama_3_1_8b_instruct \
--dry-run
# 3. Run a minimal test with very few samples
nemo-evaluator-launcher run \
--config-dir packages/nemo-evaluator-launcher/examples \
--config-name local_llama_3_1_8b_instruct \
-o +config.params.limit_samples=1 \
-o execution.output_dir=./test_results
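All three checks above depend on credentials being set correctly, so a pre-flight script can save a failed run. The sketch below is a suggestion, not part of the launcher; the `nvapi-` and `hf_` prefixes follow the examples shown in this guide:

```python
import os


def env_problems(env: dict[str, str]) -> list[str]:
    """Flag missing or suspicious-looking credentials before launching a run."""
    problems = []
    key = env.get("NGC_API_KEY", "")
    if not key:
        problems.append("NGC_API_KEY is not set")
    elif not key.startswith("nvapi-"):
        problems.append("NGC_API_KEY does not start with 'nvapi-'")
    # HF_TOKEN is only needed for gated models, so absence is not an error.
    token = env.get("HF_TOKEN", "")
    if token and not token.startswith("hf_"):
        problems.append("HF_TOKEN does not start with 'hf_'")
    return problems


if __name__ == "__main__":
    for p in env_problems(dict(os.environ)):
        print("WARNING:", p)
```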
Common Issues and Solutions#
# Verify your API key is set correctly
echo $NGC_API_KEY
# Test with a simple curl request (see above)
# Check Docker is running and has GPU access
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
# Pull the latest container if you have issues
docker pull nvcr.io/nvidia/eval-factory/simple-evals:25.09
# Enable debug logging
export LOG_LEVEL=DEBUG
# Check available evaluation types
nemo-evaluator-launcher ls tasks
# Check if results were generated
find ./results -name "*.yml" -type f
# View task results
cat ./results/<invocation_id>/<task_name>/artifacts/results.yml
# Or export and view processed results
nemo-evaluator-launcher export <invocation_id> --dest local --format json
cat ./results/<invocation_id>/processed_results.json
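Results exported with `--dest local --format json` can also be inspected programmatically. The layout assumed below, a mapping of task to metric to value, is illustrative only and not the guaranteed schema:

```python
import json


def flatten_results(results: dict) -> dict[str, float]:
    """Flatten {task: {metric: value}} into {'task/metric': value} pairs."""
    flat = {}
    for task, metrics in results.items():
        for metric, value in metrics.items():
            flat[f"{task}/{metric}"] = value
    return flat


if __name__ == "__main__":
    # Hypothetical path mirroring the export step above.
    with open("./results/<invocation_id>/processed_results.json") as f:
        for name, score in flatten_results(json.load(f)).items():
            print(f"{name}: {score}")
```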
Next Steps#
After completing your quickstart:
# List all available tasks
nemo-evaluator-launcher ls tasks
# Run with limited samples for quick testing
nemo-evaluator-launcher run --config-dir packages/nemo-evaluator-launcher/examples --config-name local_limit_samples
# Export to MLflow
nemo-evaluator-launcher export <invocation_id> --dest mlflow
# Export to Weights & Biases
nemo-evaluator-launcher export <invocation_id> --dest wandb
# Export to Google Sheets
nemo-evaluator-launcher export <invocation_id> --dest gsheets
# Export to local files
nemo-evaluator-launcher export <invocation_id> --dest local --format json
# Run on Slurm cluster
nemo-evaluator-launcher run --config-dir packages/nemo-evaluator-launcher/examples --config-name slurm_llama_3_1_8b_instruct
# Run on Lepton AI
nemo-evaluator-launcher run --config-dir packages/nemo-evaluator-launcher/examples --config-name lepton_vllm_llama_3_1_8b_instruct