Runtime and Execution Issues#
Solutions for problems that occur during evaluation execution, including configuration validation and launcher management.
Common Runtime Problems#
When evaluations fail during execution, start with these diagnostic steps:
# Validate configuration before running
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct --dry-run
# Test minimal configuration
python -c "
from nemo_evaluator import EvaluationConfig, ConfigParams
config = EvaluationConfig(type='mmlu', params=ConfigParams(limit_samples=1))
print('Configuration valid')
"
# Test model endpoint connectivity
import requests
response = requests.post(
    "http://0.0.0.0:8080/v1/completions/",
    json={"prompt": "test", "model": "megatron_model", "max_tokens": 1}
)
print(f"Endpoint status: {response.status_code}")
# Monitor system resources during evaluation
nvidia-smi -l 1 # GPU usage
htop # CPU/Memory usage
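For long evaluations it can help to log GPU usage to a file rather than watching it interactively. The sketch below shells out to nvidia-smi's query mode from Python; the polling interval and output path are arbitrary choices.
# Log GPU utilization and memory to a CSV file during an evaluation
import subprocess, time

with open("gpu_usage.csv", "w") as log:
    for _ in range(60):  # poll once per 10 seconds for ~10 minutes
        snapshot = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=timestamp,utilization.gpu,memory.used,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        log.write(snapshot.stdout)
        log.flush()
        time.sleep(10)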
Runtime Categories#
Choose the category that matches your runtime issue:
Configuration Issues
Config parameter validation, tokenizer setup, and endpoint configuration problems.
Launcher Issues
NeMo Evaluator Launcher-specific problems, including job management and multi-backend execution.