Release Notes for NeMo Microservices#

Check out the latest release notes for the NeMo microservices.

Release 25.6.0#

Features#

Platform#

  • Added support for the Llama Stack APIs.

  • Enhanced security by updating llm_as_a_judge, rag, and bigcode to version 0.12.20, which addresses vulnerabilities CVE-2025-47273 and CVE-2025-43859.

Customizer#

  • Config values for a customization job now require a version, denoted by the string following the @ symbol. For example, a previous call to POST /v1/customization/jobs that used config: meta/llama-3.2-1b-instruct must be changed to config: meta/llama-3.2-1b-instruct@v1.0.0+A100. Run GET /v1/customization/configs to list the available configurations for your installation.
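
    For example, a minimal POST /v1/customization/jobs request body with a versioned config might look like the following sketch. The dataset name and hyperparameter values are placeholders, and every field other than config is illustrative, so check the Customizer API reference for the exact schema:

    {
      "config": "meta/llama-3.2-1b-instruct@v1.0.0+A100",
      "dataset": {
        "name": "my-dataset"
      },
      "hyperparameters": {
        "training_type": "sft",
        "finetuning_type": "lora",
        "epochs": 3
      }
    }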

  • You can now configure GPUs for fine-tuning workloads, including support for NVIDIA L40 GPUs.

  • Added Customization Targets, so you can register and manage models available for fine-tuning with the Customizer service. Only registered targets work in customization jobs.
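
    As an illustrative sketch only, registering a target could look like the following request to a POST /v1/customization/targets endpoint. The endpoint path and all field names here are assumptions rather than the documented schema, so check the Customizer API reference:

    {
      "name": "llama-3.2-1b-instruct",
      "namespace": "meta",
      "model_uri": "ngc://nvidia/nemo/llama-3_2-1b-instruct",
      "num_parameters": 1000000000,
      "precision": "bf16"
    }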

  • The model catalog now features the Llama Nemotron model family.

  • You can now launch Knowledge Distillation jobs to train smaller student models using larger teacher models.
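
    As a hedged sketch, a distillation job might reuse the standard job request with distillation-specific hyperparameters. The training_type value and the teacher-model field names are assumptions, so verify them against the Customizer documentation:

    {
      "config": "meta/llama-3.2-1b-instruct@v1.0.0+A100",
      "dataset": {
        "name": "my-dataset"
      },
      "hyperparameters": {
        "training_type": "distillation",
        "distillation": {
          "teacher_model": "meta/llama-3.1-8b-instruct"
        }
      }
    }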

  • Customization now supports embedding models, including those based on Llama 3.2 1B and Llama 3.2 3B.

Evaluator#

  • You can now use parallelism for custom evaluation types and BigCode benchmarks (humaneval, mbpp, transcode_cpp, and transcode_java).

    {
      "target": "<DUMMY_MODEL_TARGET>",
      "config": {
        "type": "custom",
        "params": {
          "parallelism": 4
        }
      }
    }
    
  • Introduced new evaluation types:

    • BFCL with Tool Calling: Evaluate models that use tool-calling capabilities.

    • Agentic: Assess agent-based workflows and behaviors. We recommend using a strong model judge (at least 70B parameters).
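
    Following the pattern of the custom example above, an agentic evaluation config might look like the following sketch. The exact type string and the judge parameter name are assumptions, so check the Evaluator documentation for the supported values:

    {
      "target": "<DUMMY_MODEL_TARGET>",
      "config": {
        "type": "agentic",
        "params": {
          "judge_model": "<JUDGE_MODEL_TARGET>"
        }
      }
    }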

  • The custom evaluation type now supports LLM-as-a-Judge, so you can automate model assessment using large language models.
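
    For illustration, a custom evaluation that delegates scoring to a judge model might extend the earlier config as follows. The judge-related parameter names are assumptions rather than the documented schema:

    {
      "target": "<DUMMY_MODEL_TARGET>",
      "config": {
        "type": "custom",
        "params": {
          "judge": {
            "model": "<JUDGE_MODEL_TARGET>"
          }
        }
      }
    }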

  • Simplified standalone installation steps for the Evaluator component. Check out the updated installation guides.

  • Aggregated results and logs now appear in job artifacts for custom evaluations. Use the API endpoint /v1/evaluation/jobs/{job_id}/download-results to download results.

  • You can now download logs for custom, BFCL, and Agentic evaluations with the API endpoint /v1/evaluation/jobs/{job_id}/logs.

Guardrails#

  • Enhanced the /v1/guardrail/models endpoint to support access to models on build.nvidia.com when the microservice is deployed individually. The endpoint now supports POST, PATCH, and DELETE. OpenAI API requests include only a model name and API key, but the microservice needs the model URL to relay LLM requests. When deployed as a platform, NeMo NIM Proxy handles these LLM requests for models hosted on the cluster; the new verbs provide parity for standalone deployments by implementing the model-name-to-URL lookup in the microservice itself. For more information, refer to Manage NeMo Guardrails Access to Models.
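
    As a sketch of the standalone lookup, a POST to /v1/guardrail/models might map a model name to its URL as follows. The field names are illustrative assumptions; refer to Manage NeMo Guardrails Access to Models for the actual schema:

    {
      "name": "meta/llama-3.1-8b-instruct",
      "url": "https://integrate.api.nvidia.com/v1"
    }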

  • When accessing models on build.nvidia.com, set the NVIDIA_API_KEY environment variable when starting the microservice or specify the X-Model-Authorization HTTP header. The sample docker run commands in the documentation are updated to show setting the environment variable. For more information, refer to Deploying with Docker and Custom HTTP Headers.

  • Added support for injection detection that is based on YARA rules. For more information, refer to Configuring Injection Detection.

Known Issues#

  • Agentic Evaluations: When you use a weak judge model (fewer than 70B parameters) with NeMo Evaluator, the model might not follow instructions well. This can cause evaluation failures if the expected JSON response is formatted incorrectly.

  • BFCL Evaluations: NeMo Evaluator does not compute metrics for evaluations with inference errors.

  • Multi-task Evaluations: In release 25.6.0, NeMo Evaluator does not support multiple tasks defined in a single configuration for the Agentic, BigCode, BFCL, and LM-Harness types. Custom evaluations continue to support multiple tasks.