Release Notes for NeMo Microservices

Check out the latest release notes for the NeMo microservices.

Tip

If you’ve installed one of the previous releases of the NeMo microservices using Helm and want to upgrade, choose one of the following options:

Release 25.8.0

The following are the key features and known issues for the NeMo microservices 25.8.0 release.

The following features are added in this release.

Added support for the NVIDIA NeMo Retriever Llama3.2 Embedding NIM. You can deploy it using NeMo Deployment Management, fine-tune it using NeMo Customizer, and evaluate it using NeMo Evaluator.
Released version 1.1.0 of the NeMo Microservices Python SDK. Key updates:
- Added support for two new microservices: NeMo Auditor and NeMo Data Designer. These are currently in beta and subject to potential API changes in future releases.
- Dropped support for Python 3.8, making Python 3.9 the new minimum required version.

NeMo Auditor is a new microservice that enables you to assess the safety risk of models or systems. The microservice is released with early access availability and is subject to limited support and potential API changes in future releases. For more information, refer to About Auditing Models.

Added support for Direct Preference Optimization (DPO) fine-tuning, an RL-free alignment algorithm that optimizes models using preference data. DPO enables you to align models with human preferences by raising the probability of preferred responses and lowering the probability of rejected ones, without requiring an explicit reward model. For more information, refer to Start a DPO Customization Job.
Added fine-tuning support for embedding models, specifically the NVIDIA NeMo Retriever Llama3.2 Embedding model. You can now customize embedding models for your specific domain or use case using supervised fine-tuning (SFT) techniques. For more information, refer to Embedding Models.
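
To make the DPO description above concrete, the following is a minimal sketch of the standard DPO loss for a single preference pair, computed from sequence log-probabilities under the model being trained and under a frozen reference model. It illustrates the general algorithm only and is not taken from NeMo Customizer; the function names, the example log-probabilities, and the `beta` value are assumptions chosen for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not a documented default
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response given
    the prompt, under either the trained policy or the frozen reference.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# The loss drops below log(2) once the policy prefers the chosen response
# more strongly than the reference model does.
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
improved = dpo_loss(-4.0, -7.0, -5.0, -5.0)
```

No reward model appears anywhere in the computation: the preference signal comes entirely from the difference between the two log-ratios.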

NeMo Data Designer is a new microservice that enables you to generate high-quality synthetic datasets using AI models, statistical sampling, and configurable data schemas. The microservice is released with early access availability and is subject to limited support and potential API changes in future releases. For more information, refer to About Generating Synthetic Data.

Updated support for RAG and Retriever benchmarks.
Added RAGAS NVIDIA Metrics to the RAG evaluation type.
Added live progress tracking for evaluation jobs through the /v1/evaluation/jobs/{job_id}/status API endpoint.
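
As a rough illustration of how the status endpoint above might be polled, the sketch below uses only the Python standard library. Only the `/v1/evaluation/jobs/{job_id}/status` path comes from these notes. The base URL, the job ID, the assumption that the response is JSON with a `status` field, and the terminal state names are all hypothetical; check the NeMo Evaluator API reference for the actual response schema.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator, Sequence


def status_url(base_url: str, job_id: str) -> str:
    """Build the status URL for an evaluation job."""
    return f"{base_url.rstrip('/')}/v1/evaluation/jobs/{job_id}/status"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def poll_status(
    base_url: str,
    job_id: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval_s: float = 5.0,
    max_polls: int = 120,
    # Assumed terminal state names; the real values may differ.
    terminal: Sequence[str] = ("completed", "failed", "cancelled"),
) -> Iterator[dict]:
    """Yield each status payload until a terminal state or max_polls."""
    for _ in range(max_polls):
        payload = fetch(status_url(base_url, job_id))
        yield payload
        # Assumes the payload carries a top-level "status" field.
        if payload.get("status") in terminal:
            return
        time.sleep(interval_s)
```

A caller could iterate with `for payload in poll_status("http://evaluator.example", "eval-123"): print(payload)`, where both arguments are placeholders. Passing a custom `fetch` callable makes it straightforward to add authentication headers or to test the loop without a running service.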

Added support for parallel execution of input and output rails, as implemented in the Guardrails toolkit v0.15.0. This release introduces the new parameters rails.input.parallel and rails.output.parallel. Set them to true in a guardrail configuration, through the create guardrails config API, to enable parallel execution. For a tutorial on how to use parallel rails, refer to Parallel Execution of Input and Output Rails.
Implemented synchronization of guardrails configuration objects across multiple replicas of NeMo Guardrails.
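
The two parallel-execution parameters can be pictured as nested keys in the guardrail configuration. The following sketch builds such a configuration as a Python dictionary. The helper name and the flow names in the example are illustrative assumptions, and how the dictionary is wrapped in the request body of the create guardrails config API is not shown here; consult the API reference for the exact payload.

```python
import copy


def enable_parallel_rails(config: dict) -> dict:
    """Return a copy of a guardrail config with parallel rails enabled.

    Sets rails.input.parallel and rails.output.parallel to True and
    leaves the original dictionary untouched.
    """
    updated = copy.deepcopy(config)
    rails = updated.setdefault("rails", {})
    for direction in ("input", "output"):
        rails.setdefault(direction, {})["parallel"] = True
    return updated


# Example configuration; the flow names are placeholders.
base_config = {
    "rails": {
        "input": {"flows": ["self check input"]},
        "output": {"flows": ["self check output"]},
    }
}

parallel_config = enable_parallel_rails(base_config)
```

Working on a deep copy keeps the sequential configuration available, which can be useful when comparing the behavior of both modes.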

The following changes are made in this release.

Changed the default database for the NeMo Guardrails microservice to use PostgreSQL. If you have an existing NeMo Guardrails deployment, you can upgrade to use an external PostgreSQL database by following the instructions at PostgreSQL.

Improved job lifecycle management, including monitoring and cleanup of Kubernetes artifacts after job completion or termination.
Removed dependency on Argo Workflows.

The following are the known issues in this release.

For NeMo Guardrails, if you send a PATCH or DELETE request to a /v1/guardrails/configs endpoint, you must restart your nemo-guardrails deployment for the changes to take effect.
When deploying NIM version 1.8 (versions later than 1.8.1) on an NVIDIA H100 Tensor Core GPU, set NIM_MODEL_PROFILE to the value shown below. On H100 GPUs, the prebuilt-throughput engine shows a regression compared to buildable profiles, which results in lower evaluation scores.
```
NIM_MODEL_PROFILE=ac34857f8dcbd174ad524974248f2faf271bd2a0355643b2cf1490d0fe7787c2
```