# About NeMo Evaluator SDK
NeMo Evaluator SDK is NVIDIA’s comprehensive platform for AI model evaluation and benchmarking. It consists of two core libraries that work together to enable consistent, scalable, and reproducible evaluation of large language models across diverse capabilities including reasoning, code generation, function calling, and safety.
## System Architecture
NeMo Evaluator SDK consists of two main libraries:
| Component | Key Capabilities |
|---|---|
| `nemo-evaluator` | • Interceptors for request and response processing (conceptually sketched below) |
| `nemo-evaluator-launcher` | • Unified CLI and programmatic entry points |
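The interceptor model is easiest to see in miniature. The sketch below is a conceptual illustration only, not the actual `nemo-evaluator` API: the `Request`, `Response`, and interceptor classes are hypothetical stand-ins that show how a chain of interceptors can inspect or modify each request on its way to the model and each response on the way back.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Request:
    """Simplified evaluation request (hypothetical shape)."""
    prompt: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Response:
    """Simplified model response (hypothetical shape)."""
    text: str
    metadata: dict = field(default_factory=dict)


class Interceptor(Protocol):
    """Anything that hooks into the request/response flow."""
    def intercept_request(self, request: Request) -> Request: ...
    def intercept_response(self, response: Response) -> Response: ...


class SystemPromptInterceptor:
    """Hypothetical interceptor: prepends an instruction to every prompt."""
    def __init__(self, instruction: str) -> None:
        self.instruction = instruction

    def intercept_request(self, request: Request) -> Request:
        request.prompt = f"{self.instruction}\n{request.prompt}"
        return request

    def intercept_response(self, response: Response) -> Response:
        return response  # no change on the way back


class ResponseLoggingInterceptor:
    """Hypothetical interceptor: records every raw model response."""
    def __init__(self) -> None:
        self.log: list[str] = []

    def intercept_request(self, request: Request) -> Request:
        return request

    def intercept_response(self, response: Response) -> Response:
        self.log.append(response.text)
        return response


def run_with_interceptors(
    request: Request,
    interceptors: list[Interceptor],
    call_model: Callable[[Request], Response],
) -> Response:
    """Run the request through each interceptor, call the model, then run
    the response back through the chain in reverse order."""
    for interceptor in interceptors:
        request = interceptor.intercept_request(request)
    response = call_model(request)
    for interceptor in reversed(interceptors):
        response = interceptor.intercept_response(response)
    return response


# Usage: a fake model endpoint keeps the example self-contained.
def fake_model(req: Request) -> Response:
    return Response(text=f"echo: {req.prompt}")


logger = ResponseLoggingInterceptor()
result = run_with_interceptors(
    Request(prompt="What is 2 + 2?"),
    [SystemPromptInterceptor("Answer concisely."), logger],
    fake_model,
)
```

Because request processing runs in declaration order and response processing runs in reverse, each interceptor wraps the ones after it, which is the standard middleware pattern this component builds on.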
## Target Users
| User Type | Key Benefits |
|---|---|
| Researchers | Access 100+ benchmarks across multiple evaluation harnesses with containerized reproducibility. Run evaluations locally or on HPC clusters. |
| ML Engineers | Integrate evaluations into ML pipelines with programmatic APIs. Deploy models and run evaluations across multiple backends. |
| Organizations | Scale evaluation across teams with unified CLI, multi-backend execution, and result tracking. Export results to MLflow, Weights & Biases, or Google Sheets. |
| AI Safety Teams | Conduct safety assessments using specialized containers for security testing and bias evaluation with detailed logging. |
| Model Developers | Evaluate custom models against standard benchmarks using OpenAI-compatible APIs. |
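For model developers, "OpenAI-compatible API" means the custom model is served behind the standard `/v1/chat/completions` (or `/v1/completions`) interface, so an evaluation harness can call it the same way it would call any hosted model. A minimal sketch of checking such an endpoint before pointing an evaluation at it, using the widely used `openai` Python client; the base URL, API key, and model name are placeholders for whatever your deployment exposes:

```python
from openai import OpenAI

# Placeholder endpoint and credentials for a locally deployed model.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # your deployment's OpenAI-compatible endpoint
    api_key="not-needed-for-local",       # or a real key if your server requires one
)

# Send one chat request to confirm the served model responds as expected.
completion = client.chat.completions.create(
    model="my-custom-model",              # the name the server registers for your model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(completion.choices[0].message.content)
```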