Release Notes for NeMo Microservices#
Check out the latest release notes for the NeMo microservices.
Release 25.4.0#
Summary#
This is the first general availability (GA) release of the NeMo microservices.
To learn about the NeMo microservices, start from the following links:
Learn about the NeMo microservices.
Discover the key features of NeMo microservices.
Understand the core concepts of NeMo microservices.
Get started with NeMo microservices on a minikube cluster.
Known Issues#
The following are the known issues for the DGX Cloud Admission Controller microservice.
There is a known vulnerability in the golang crypto library that can expose the system to a DDoS attack if you change the default configuration in the DGX Cloud Admission Controller Helm chart values file. Do not enable runaicontroller and dgxcExport. For more information, refer to the DGX Cloud Admission Controller Helm installation page.
The following are the known issues for the Evaluator API.
The cancel endpoint is not available for evaluation jobs.
The logs endpoint is not available for evaluation jobs. Instead, use the download-results endpoint. For more information, refer to Get Evaluation Results.
The PATCH method is not supported.
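As a rough illustration of the download-results workaround, the following Python sketch builds the request URL and saves the response body to a file. The path /v1/evaluation/jobs/{job_id}/download-results, the base URL, and the helper names are assumptions made for this example and are not stated in these release notes. Confirm the exact route on the Get Evaluation Results page before relying on it.

```python
import urllib.request
from urllib.parse import quote

# ASSUMPTION: this route is illustrative; verify it against the
# Get Evaluation Results documentation for your deployment.
ASSUMED_ROUTE = "/v1/evaluation/jobs/{job_id}/download-results"


def download_results_url(base_url: str, job_id: str) -> str:
    """Build the (assumed) download-results URL for an evaluation job."""
    route = ASSUMED_ROUTE.format(job_id=quote(job_id, safe=""))
    return base_url.rstrip("/") + route


def download_results(base_url: str, job_id: str, dest: str) -> str:
    """Fetch the results for a job and write the raw bytes to dest."""
    url = download_results_url(base_url, job_id)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        out.write(response.read())
    return dest
```

A call such as download_results("http://localhost:8080", "eval-123", "results.out") uses a hypothetical host, job ID, and file name. Substitute the values for your own cluster.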
The following are the known issues for NeMo Evaluator.
For tool-calling evaluation jobs, the nemo-ms-evaluator-about is delayed when there is incomplete type information. Tool calls might take more than 30 seconds if the descriptions for array types lack items specifications, or if the descriptions for object types lack properties specifications. Be sure to include these details in tool descriptions. For more information, refer to Custom Tool Calling Evaluation.
For tool-calling evaluation jobs, the microservice currently does not support functions with more than 8 parameters. Tool calls might freeze the NIM if a tool description includes a function with more than 8 parameters. If this occurs, restart the NIM. For more information, refer to Custom Tool Calling Evaluation.
When you run an LM Evaluation Harness evaluation of type gsm8k or its variants with the chat template flag applied, the results for a subset of model endpoints differ from their corresponding public benchmark results. Prompt tokens from the model server add an extra beginning token.
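The two tool-calling issues above (array types without items, object types without properties, and functions with more than 8 parameters) can be caught before a job is submitted. The following Python sketch is one possible pre-flight check. It assumes tool descriptions follow the common JSON Schema style layout with a function entry that holds a parameters schema. That layout and the helper names find_schema_gaps and check_tool are assumptions for illustration and are not part of the NeMo Evaluator API.

```python
from typing import Any

MAX_PARAMETERS = 8  # limit stated in the known issue above


def find_schema_gaps(schema: dict[str, Any], path: str = "parameters") -> list[str]:
    """Recursively report array schemas without 'items' and
    object schemas without 'properties'."""
    problems: list[str] = []
    schema_type = schema.get("type")
    if schema_type == "array":
        items = schema.get("items")
        if not isinstance(items, dict):
            problems.append(f"{path}: array type lacks 'items'")
        else:
            problems.extend(find_schema_gaps(items, f"{path}.items"))
    elif schema_type == "object":
        properties = schema.get("properties")
        if not isinstance(properties, dict):
            problems.append(f"{path}: object type lacks 'properties'")
        else:
            for name, sub_schema in properties.items():
                if isinstance(sub_schema, dict):
                    problems.extend(find_schema_gaps(sub_schema, f"{path}.{name}"))
    return problems


def check_tool(tool: dict[str, Any]) -> list[str]:
    """Check one tool description (ASSUMED layout: tool['function']['parameters'])."""
    function = tool.get("function", tool)
    parameters = function.get("parameters") or {}
    problems = find_schema_gaps(parameters)
    count = len(parameters.get("properties") or {})
    if count > MAX_PARAMETERS:
        problems.append(
            f"function has {count} parameters (limit is {MAX_PARAMETERS})"
        )
    return problems


# Hypothetical tool description with both kinds of missing type details.
example_tool = {
    "type": "function",
    "function": {
        "name": "search",
        "parameters": {
            "type": "object",
            "properties": {
                "tags": {"type": "array"},
                "filters": {"type": "object"},
            },
        },
    },
}
example_problems = check_tool(example_tool)
```

An empty list from check_tool means the description passed these particular checks. It does not guarantee that the evaluation job avoids the delay or freeze described above.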