# Release Notes for NeMo Microservices
Check out the latest release notes for the NeMo microservices.
## Release 25.6.0

### Features

#### Platform
- Added support for the Llama Stack APIs.
- Enhanced security by updating `llm_as_a_judge`, `rag`, and `bigcode` to version 0.12.20, which addresses vulnerabilities CVE-2025-47273 and CVE-2025-43859.
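If you consume the platform through the Llama Stack APIs, a client call might look like the following minimal sketch with the `llama-stack-client` Python package. The base URL, port, and model ID are illustrative assumptions, not values shipped with this release.

```python
# Minimal sketch of calling the platform through the Llama Stack APIs.
# The base URL and model ID below are placeholders for illustration.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # hypothetical endpoint

# List the models the distribution exposes.
for model in client.models.list():
    print(model.identifier)

# Run a simple chat completion through the Llama Stack inference API.
response = client.inference.chat_completion(
    model_id="meta/llama-3.2-1b-instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.completion_message.content)
```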
#### Customizer

- `config` values for a customization job now require a version, denoted by the string following the `@`. For example, a previous call to POST `/v1/customization/jobs` that had `config: meta/llama-3.2-1b-instruct` must be changed to `config: meta/llama-3.2-1b-instruct@v1.0.0+A100`. Run GET `/v1/customization/configs` to list the available configurations for your installation. A request sketch follows this list.
- You can now configure GPUs for fine-tuning workloads, including support for NVIDIA L40 GPUs.
- Added Customization Targets, so you can register and manage models available for fine-tuning with the Customizer service. Only registered targets work in customization jobs.
- The model catalog now features the Llama Nemotron model family.
- You can now launch Knowledge Distillation jobs to train smaller student models using larger teacher models.
- Customization now works on embedding models, including Llama 3.2 1B and Llama 3.2 3B.
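As a sketch of the new versioned `config` requirement, the following lists available configurations and submits a job with Python's `requests`. The base URL, dataset reference, and hyperparameter fields are hypothetical; only the endpoints and the `@`-versioned config string come from the notes above.

```python
# Minimal sketch of the versioned customization config with Python requests.
# The base URL, dataset reference, and hyperparameters are hypothetical;
# the endpoints and the `@`-versioned config string come from the notes.
import requests

CUSTOMIZER_URL = "http://nemo-customizer:8000"  # placeholder base URL

# List the customization configurations available in this installation.
configs = requests.get(f"{CUSTOMIZER_URL}/v1/customization/configs").json()
print(configs)

# Submit a job. As of 25.6.0, the config value must carry a version
# suffix after the `@`; an unversioned name such as
# "meta/llama-3.2-1b-instruct" is no longer accepted.
job = requests.post(
    f"{CUSTOMIZER_URL}/v1/customization/jobs",
    json={
        "config": "meta/llama-3.2-1b-instruct@v1.0.0+A100",
        "dataset": {"name": "my-training-dataset"},  # hypothetical field
        "hyperparameters": {"epochs": 1},            # hypothetical field
    },
)
job.raise_for_status()
print(job.json())
```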
#### Evaluator

- You can now use parallelism for `custom` evaluation types and BigCode benchmarks (`humaneval`, `mbpp`, `transcode_cpp`, and `transcode_java`). For example:

  ```json
  {
    "target": "<DUMMY_MODEL_TARGET>",
    "config": {
      "type": "custom",
      "params": {
        "parallelism": 4
      }
    }
  }
  ```
- Introduced new evaluation types:
  - BFCL with Tool Calling: Evaluate models that use tool-calling capabilities.
  - Agentic: Assess agent-based workflows and behaviors. We recommend using a strong judge model (at least 70B parameters).
- The `custom` evaluation type now supports LLM-as-a-Judge, so you can automate model assessment using large language models.
- Simplified the standalone installation steps for the Evaluator component. Check out the updated installation guides in the documentation.
- Aggregated results and logs now appear in job artifacts for `custom` evaluations. Use the API endpoint `/v1/evaluation/jobs/{job_id}/download-results` to download results.
- You can now download logs for `custom`, `BFCL`, and `Agentic` evaluations with the API endpoint `/v1/evaluation/jobs/{job_id}/logs`. A retrieval sketch follows this list.
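The artifact endpoints above can be exercised as in the following sketch with Python's `requests`. The base URL and job ID are placeholders, and only the two endpoint paths come from the notes.

```python
# Minimal sketch of retrieving evaluation artifacts with Python requests.
# The base URL and job ID are placeholders; only the two endpoint paths
# come from the release notes.
import requests

EVALUATOR_URL = "http://nemo-evaluator:8000"  # placeholder base URL
job_id = "eval-abc123"                        # placeholder job ID

# Download aggregated results for a `custom` evaluation job.
results = requests.get(
    f"{EVALUATOR_URL}/v1/evaluation/jobs/{job_id}/download-results"
)
results.raise_for_status()
with open("results.zip", "wb") as f:  # assuming a binary archive response
    f.write(results.content)

# Fetch logs for `custom`, `BFCL`, or `Agentic` evaluation jobs.
logs = requests.get(f"{EVALUATOR_URL}/v1/evaluation/jobs/{job_id}/logs")
logs.raise_for_status()
print(logs.text)
```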
#### Guardrails

- Enhanced the `/v1/guardrail/models` endpoint to support access to models on build.nvidia.com when the microservice is deployed individually. The endpoint now supports POST, PATCH, and DELETE. OpenAI API requests include just a model name and an API key, but the microservice needs the model URL to relay LLM requests. When deployed as a platform, NeMo NIM Proxy handles these LLM requests for models hosted on the cluster. The new verbs let standalone deployments implement the same model-name-to-URL lookup in the microservice, providing parity with platform deployments. For more information, refer to Manage NeMo Guardrails Access to Models. A registration sketch follows this list.
- When accessing models on build.nvidia.com, set the `NVIDIA_API_KEY` environment variable when starting the microservice or specify the `X-Model-Authorization` HTTP header. The sample `docker run` commands in the documentation are updated to show setting the environment variable. For more information, refer to Deploying with Docker and Custom HTTP Headers.
- Added support for injection detection based on YARA rules. For more information, refer to Configuring Injection Detection.
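As a sketch of the model-name-to-URL lookup described above, the following registers a build.nvidia.com model with a standalone deployment using Python's `requests`. The base URL and request body fields are hypothetical; the endpoint and the newly supported POST verb come from the notes.

```python
# Minimal sketch of the model-name-to-URL lookup in a standalone Guardrails
# deployment. The base URL and request body fields are hypothetical; the
# endpoint and the newly supported POST verb come from the release notes.
import requests

GUARDRAILS_URL = "http://nemo-guardrails:7331"  # placeholder base URL

# Register a build.nvidia.com model so the microservice knows which URL
# to relay LLM requests to for this model name.
resp = requests.post(
    f"{GUARDRAILS_URL}/v1/guardrail/models",
    json={  # hypothetical request body fields
        "name": "meta/llama-3.1-70b-instruct",
        "url": "https://integrate.api.nvidia.com/v1",
    },
)
resp.raise_for_status()
print(resp.json())

# PATCH and DELETE requests against the same endpoint update or remove a
# registered entry; the exact payloads are documented in
# Manage NeMo Guardrails Access to Models.
```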
### Known Issues

- Agentic Evaluations: When evaluating weak judge models (fewer than 70B parameters) with NeMo Evaluator, the model might not follow instructions well, which can cause evaluation failures if the expected JSON response is not formatted correctly.
- BFCL Evaluations: NeMo Evaluator does not compute metrics for evaluations with inference errors.
- Multi-task Evaluations: In release 25.6.0, NeMo Evaluator does not support multiple tasks defined in a single config for the Agentic, BigCode, BFCL, and LM-Harness evaluation types. Custom evaluations continue to support multiple tasks.