Release Notes for NeMo Microservices#

Check out the latest release notes for the NeMo microservices.

Tip

If you’ve installed one of the previous releases of the NeMo microservices using Helm and want to upgrade, choose one of the following options:

Release 25.7.0#

The following are the key features and known issues for the NeMo microservices 25.7.0 release.

The following features are added in this release.

Released the NeMo Microservices Python SDK. To install and get started, refer to the latest beginners tutorials and the Python SDK documentation:
Added support for NVIDIA B200 GPUs.

Target and Configuration changes to the ConfigMap no longer require an application restart to propagate.
Enabled passing Weights & Biases project details such as entities, tags, and descriptions at runtime using the new integrations configuration in the Customization jobs endpoint.
Added support for NVIDIA B200 GPUs for customization jobs.

Removed MT Bench LLM-as-a-Judge; use the Custom evaluation type with the LLM-Judge metric.
Added support to control reasoning for OAI-compatible models and Nemotron models as evaluation targets, as well as LLM-as-a-Judge across all benchmarks.
Enabled log downloads for BigCode and LMEvalHarness evaluations via the /v1/evaluation/jobs/{job_id}/logs API endpoint.
Improved benchmark accuracy with updated Evaluator APIs.
Enabled authentication for Custom LLM-Judge and Agentic Judge.

The following issues are fixed in this release.

Fixed the OpenAPI spec for the /v1/guardrail/chat/completions, /v1/guardrail/checks, and /v1/guardrail/completions endpoints to correctly specify that only 200 status codes are returned (removing the 201 option). Also fixed the OpenAPI spec to correctly include the text/event-stream response type for streaming mode.

LLama Nemotron Nano and Super models:
- Do not support LoRA adapters
- Will not properly deploy with Deployment Manager. You can alternatively deploy these models as NIMs using Helm.