Release Notes#
v3.0.2#
Features#
Added validation webhooks for NIM Operator custom resources to enforce schema validation using the dry-run option and immutability checks. This feature is enabled by default on OpenShift and disabled by default on other Kubernetes platforms. Enable validation webhooks by installing cert-manager and installing or upgrading via Helm with the operator.admissionController.enabled=true value.
Validation webhooks use self-signed certificates by default. For production environments, consider configuring custom certificates via Helm values if your organization requires certificates signed by a trusted certificate authority.
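For example, enabling the admission controller in a Helm values file might look like the following sketch; the nested structure simply mirrors the operator.admissionController.enabled=true value described above, and all other values are omitted:

operator:
  admissionController:
    # Equivalent to passing --set operator.admissionController.enabled=true
    enabled: true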
Added support for NeMo Guardrails v25.10 and guardrail event tracing with OpenTelemetry. You can enable distributed tracing and observability for guardrails services by configuring spec.otel.enabled and spec.otel.exporterOtlpEndpoint in your NeMo Guardrails custom resource, with support for logs, traces, and metrics exporters.
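As a rough illustration, the OpenTelemetry fields described above could be set in the NeMo Guardrails custom resource as in the following sketch; only spec.otel.enabled and spec.otel.exporterOtlpEndpoint come from these notes, while the apiVersion, kind name, and collector endpoint are assumptions to verify against the Guardrails documentation:

apiVersion: apps.nvidia.com/v1alpha1    # assumed; check the CRD for your Operator version
kind: NemoGuardrail                      # assumed kind name
metadata:
  name: guardrails-sample
spec:
  otel:
    enabled: true
    # Hypothetical OTLP collector endpoint inside the cluster
    exporterOtlpEndpoint: http://otel-collector.observability.svc.cluster.local:4317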
Enhanced GPU filtering support when caching model profiles to include vLLM and SGLang engines in addition to TensorRT-LLM. Specify nimcache.spec.source.ngc.model.engine as vllm or sglang along with nimcache.spec.source.ngc.model.tensorParallelism to filter and cache only the profiles matching your target engine and tensor parallelism configuration.
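For instance, a NIM cache that filters profiles to the vLLM engine with a given tensor parallelism might look like the following fragment; the engine and tensorParallelism fields come from these notes, while the container image and the exact value format are assumptions for illustration:

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llm-vllm-cache
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest   # hypothetical image
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: vllm              # or sglang
        tensorParallelism: "2"    # assumed string format; match your target GPU count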
Added support for ephemeral storage using emptyDir volumes in NIM Service custom resources with spec.storage.emptyDir. This enables fast temporary storage for testing, demos, or stateless workloads without requiring persistent volume claims, with optional size limits in spec.storage.emptyDir.sizeLimit.
When using ephemeral storage with emptyDir in NIM Service, data is not retained after pod restarts. This storage type is intended for temporary workloads only and should not be used for production deployments requiring data persistence.
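A minimal NIM Service storage fragment using the fields above might look like the following sketch; only spec.storage.emptyDir and sizeLimit come from these notes, and the rest of the resource is omitted:

# Fragment of a NIMService spec (other required fields omitted)
spec:
  storage:
    emptyDir:
      sizeLimit: 50Gi   # optional; omit to create the volume with no size limit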
Added support for hostPath volumes in NIM Service custom resources using spec.storage.hostPath. This enables a NIM Service to store models to a local node directory in single-node environments or proof-of-concept deployments where centralized storage is unavailable.
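As a sketch only, the hostPath storage could be configured as in the fragment below; the directory is a placeholder, and whether spec.storage.hostPath takes a nested path field or a plain string is an assumption to verify against the NIM Service documentation:

# Fragment of a NIMService spec (other required fields omitted)
spec:
  storage:
    hostPath:
      path: /opt/nim/model-store   # hypothetical node-local directory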
Added support for pod affinity, pod anti-affinity, and node affinity configurations in NIM Service and NeMo custom resources using the spec.affinity field. You can now control pod placement and co-location behavior to optimize resource utilization and meet application topology requirements. (Issue 635)
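The spec.affinity field presumably follows the standard Kubernetes affinity schema, so a fragment such as the following sketch could be used; the label key and value are placeholders:

# Fragment of a NIMService spec (other required fields omitted)
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu.product     # placeholder label key
            operator: In
            values:
            - NVIDIA-H100-80GB-HBM3         # placeholder value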
Bug Fixes#
Fixed an issue where spec.replicas and spec.scale.enabled could both be set in NIM Service custom resources. Now, when horizontal pod autoscaling (HPA) is enabled with spec.scale.enabled: true, you cannot set spec.replicas, as HPA manages replica count automatically using spec.scale.hpa.minReplicas and spec.scale.hpa.maxReplicas.
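For reference, an autoscaled NIM Service therefore sets the scale block instead of replicas, roughly as in this fragment; only the field names mentioned above are taken from these notes, and the replica counts are placeholders:

# Fragment of a NIMService spec (other required fields omitted)
spec:
  # Do not set spec.replicas when autoscaling is enabled
  scale:
    enabled: true
    hpa:
      minReplicas: 1
      maxReplicas: 4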
Fixed an issue where GPU filtering was not applied to vLLM and SGLang model profiles during caching, causing unnecessary profile downloads when nimcache.spec.source.ngc.model.engine was set to vllm or sglang.
Fixed an issue to support custom command and arguments in NIM Service custom resources using spec.command and spec.args fields. This allows you to override the default container entrypoint and arguments for specialized deployment scenarios or debugging purposes.
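A hedged fragment that overrides the entrypoint for debugging might look like the following; spec.command and spec.args come from these notes, and the values shown are placeholders:

# Fragment of a NIMService spec (other required fields omitted)
spec:
  command: ["/bin/sh", "-c"]   # placeholder entrypoint override
  args: ["sleep infinity"]     # placeholder arguments for debugging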
Fixed an issue to allow a Multi-LLM NIM Service to run without pre-cached models. Use on-demand caching by specifying your model and PVC requirements in the NIM Service.
Fixed an issue to update CA certificates only when a custom ConfigMap is provided.
Fixed NIM Service status handling when scaling deployments to zero replicas to prevent excessive log output and reconciliation loops. The NIM Service status is now set to NotReady and logging verbosity is set to -v=4 by default. (Issue #710)
Fixed an issue where proxy settings were not applied when NIM Cache or NIM Service spec.proxy.certConfigMap was not specified. (Issue #700)
Updated Bitnami images in the NeMo Microservices dependency Ansible deployment playbooks.
v3.0.1#
Features#
NVIDIA NIM Operator is now government ready, NVIDIA’s designation for software that meets applicable security requirements for deployment in your FedRAMP High or equivalent sovereign use case. NVIDIA AI Enterprise customers can start using this feature by deploying the government-ready NIM Operator, NVIDIA GPU Operator, and NIM components.
For more information on NVIDIA’s government ready support, refer to the white paper AI Software for Regulated Environments.
v3.0.0#
Features#
Added support for the Multi-LLM NIM container, which enables you to deploy a wide variety of models from a single container. The NIM Operator can now cache and serve LLM-Specific and Multi-LLM NIM from NVIDIA NGC, NVIDIA NeMo Data Store, or the Hugging Face Hub.
Learn more about when to use a Multi-LLM NIM container or LLM-Specific container from the Overview of NVIDIA NIM for LLMs: NIM Options documentation. For details on Multi-LLM NIM container supported architectures, refer to the Supported Architectures for NVIDIA NIM for LLMs documentation.
Added NIM Build as a new custom resource to support building and caching TensorRT-LLM model engines from built-in LLM buildable profiles.
Added support for using Kubernetes Dynamic Resource Allocation (DRA) for GPU allocation to NIM Operator deployments.
Added support for deploying large NIM that require multi-node GPUs using LeaderWorkerSets.
Added support for the deployment and management of NIM through KServe, including both raw and serverless deployments.
NeMo Customizer, NeMo Data Store, NeMo Entity Store, and NeMo Guardrails have been updated to v25.8.0.
Note
NeMo Evaluator remains at v25.6.0 due to an architecture change in how evaluation jobs are managed, but it will be updated in a subsequent release.
Known Issues#
For large models (49+ billion parameters), you may encounter the "Too many open files" download error during caching. To avoid this error, you must update the max open file limit for the container runtime. For example, if you are using containerd as the container runtime, run the following commands:

$ sudo mkdir -p /etc/systemd/system/containerd.service.d
$ echo "[Service]" | sudo tee /etc/systemd/system/containerd.service.d/override.conf
$ echo "LimitNOFILE=65536" | sudo tee -a /etc/systemd/system/containerd.service.d/override.conf
$ sudo systemctl daemon-reload
$ sudo systemctl restart containerd
$ sudo systemctl restart kubelet
There is a known issue with NVIDIA NeMo Retriever models where filtering is not working correctly when nimcache.spec.source.ngc.model.engine is set to "tensorrt". Caching of NeMo Retriever embedding and reranking models will fail when nimcache.spec.source.ngc.model.engine is specified as tensorrt. NVIDIA recommends caching all model profiles without filtering for these models.
When downloading models from Hugging Face for Multi-LLM NIM, log files may show permission denied warnings that can be safely ignored.
v2.0.2#
Bug Fixes#
Fixed an issue where model caching was failing with some older NIM containers due to errors when parsing v1 model_manifest.yaml files. For example, with Mistral NIM nvcr.io/nim/nv-mistralai/mistral-nemo-12b-instruct:1.2.2.
Added support for the --revision flag when pulling models and datasets with HF-compatible data sources, such as NeMo Data Store.
Fixed an issue where deletions of Operator custom resources, such as NIMService, were stuck with foreground deletion.
Increased the startup probe timeout to 20 minutes for NIMService pods. This helps to avoid restarts when the model is not pre-cached and takes a long time to download and build.
Known Issues#
Permission denied errors during caching using NIMCache of some older NIMs, such as with Mistral NIM nvcr.io/nim/nv-mistralai/mistral-nemo-12b-instruct:1.2.2. This happens due to the container not honoring the NIM_CACHE_PATH env variable and instead caching into internal directories. One workaround is to explicitly specify the HF_HOME and NGC_HOME variables in the NIMCache instance, such as in the example below:

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3.1-8b-instruct
spec:
  env:
  - name: NGC_HOME
    value: /model-store/ngc
  - name: HF_HOME
    value: /model-store/huggingface
  source:
    ngc:
      modelPuller: nvcr.io/nim/mistralai/mixtral-8x22b-instruct-v01:latest
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        profiles: [28a820fc752f0964280f0e4eba7e0490b95b790b5e3c4fff218ef9d2b12b9f76]
  storage:
    pvc:
      create: true
      storageClass: local-path
      size: "100Gi"
      volumeAccessMode: ReadWriteOnce
v2.0.1#
Features#
Added support for NeMo microservices v25.6.0. Updated the following NeMo microservices custom resources to support NeMo microservices v25.6.0:
NeMo Evaluator custom resource now supports updated evaluationImages for evaluation jobs.
NeMo Customizer custom resource now supports Customization Targets to represent a model that can be customized (fine-tuned) using the Customizer service. Use data.customizationTargets to define your model targets in your Customizer model ConfigMap instead of data.models, which was used in previous versions. Refer to the NeMo Customization Targets documentation for more details.
Refer to the NeMo microservices release notes for a full list of changes in this release.
Added the spec.proxy parameter to NIM cache and NIM service custom resources. This improves support for clusters operating behind HTTP proxies by allowing you to specify your proxy configuration using the spec.proxy parameter. This must be configured in both your NIM cache and NIM service custom resources. This parameter replaces the spec.certConfig parameter, which is now deprecated.
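A hedged sketch of the spec.proxy block is shown below; spec.proxy itself and the certConfigMap field appear elsewhere in these notes, while the remaining field names and values are assumptions to verify against the Proxy Support documentation:

# Fragment shared by NIMCache and NIMService specs (other fields omitted)
spec:
  proxy:
    httpProxy: http://proxy.example.com:3128      # assumed field name; placeholder value
    httpsProxy: http://proxy.example.com:3128     # assumed field name; placeholder value
    noProxy: localhost,127.0.0.1,.cluster.local   # assumed field name; placeholder value
    certConfigMap: ca-cert-configmap              # ConfigMap holding custom CA certificates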
Added support for specifying gRPC and metrics ports in the NIM service custom resource for non-LLM NIM running a Triton Inference Server. (PR#490)
Added support for specifying a custom scheduler in a NIM service custom resource. Use spec.schedulerName to specify the name of the scheduler to use for NIM jobs. If no custom scheduler name is set, your default Kubernetes scheduler is used. (PR#489)
Added support for size limits for an emptyDir volume created for NIMService Deployments and NeMoCustomizer training jobs. Specify either .spec.storage.sharedMemorySizeLimit in the NIM service custom resource or .spec.training.sharedMemorySizeLimit in the NeMo Customizer custom resource to set a shared memory limit. By default, an emptyDir volume is created with no size limit. (PR#492)
Added support for setting annotations for NIM Operator created PVCs. (PR#508)
Added support for pulling models and datasets from Hugging Face Hub and NeMo Data Store into your NIM cache using the spec.source.hf and spec.source.dataStore parameters.
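As an illustration only, a NIM cache source using the Hugging Face Hub might resemble the fragment below; spec.source.hf is taken from these notes, but every sub-field shown is an assumption and should be verified against the NIM Cache documentation:

# Fragment of a NIMCache spec (other fields omitted)
spec:
  source:
    hf:
      endpoint: https://huggingface.co                 # assumed field
      modelName: org/model-name                        # assumed field; placeholder value
      authSecret: hf-api-secret                        # assumed field; secret with an HF token
      modelPuller: nvcr.io/nim/example/puller:latest   # assumed field; placeholder image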
Bug Fixes#
Updated the NIM service status to include model information for non-LLM NIM. (PR#498)
Fixed an issue where the resource field was missing in Helm upgrade hooks that could prevent the upgrade CRD jobs from running properly.
Fixed a bug where the NIM pipeline and NIM service status was incorrectly being marked as Failed when the NIM cache was still in progress. (PR#504)
Fixed an issue where the NIM cache would fail to find non-LLM models when no profile was specified in the NIM cache custom resource. (PR#513)
Fixed an issue in the Data Flywheel with Jupyter notebook tutorial where the NIM Operator was not creating PVCs when the default storage class was something other than the "local-path" provisioner.
Removals and Deprecations#
Deprecated the NIM cache spec.certConfig parameter. If you were using the spec.certConfig parameter to specify custom CA certificates in previous versions, you should update your NIM cache resources to use spec.proxy and add your proxy configuration to the NIM service. Refer to Proxy Support for details.
Known Issues#
There is a known issue with NVIDIA NeMo Retriever models where filtering is not working correctly when nimcache.spec.source.ngc.model.engine is set to "tensorrt". This issue is due to a swap in the product-name string between the node label and the model manifest.
v2.0.0#
Features#
Added support for deploying the NVIDIA NeMo microservices as custom resources with the NVIDIA NIM Operator. NVIDIA NeMo microservices are a modular set of tools that you can use to customize, evaluate, and secure large language models (LLMs) while optimizing AI applications across on-premises or cloud-based Kubernetes clusters. Deploying these microservices with the NIM Operator provides the ability to manage your AI workflows across your Kubernetes cluster.
The NIM Operator supports deploying the following NeMo microservices to your cluster as custom resources:
NeMo core microservices
NeMo Customizer
NeMo Evaluator
NeMo Guardrails
NeMo platform microservices
NeMo Data Store
NeMo Entity Store
Get started with the NeMo microservices.
Refer to the NeMo microservices documentation for details on using these microservices.
Improved NIMService status to detail model information including the cluster endpoint, external endpoint, and name of the model the service is connected to.
Updated required fields for NIMService and NIMPipeline custom resources including:
nimservice.spec.image.repository
nimservice.spec.image.tag
nimpipeline.spec.image.repository
nimpipeline.spec.image.tag
nimpipeline.spec.expose
Enable caching of pre-built TRT-LLM engines and optimized artifacts for non-LLM NIM, such as Riva and BioNeMo NIM.
Add support for configuring annotations and security contexts for the NIM Operator deployment using Helm values, fixing issue #333.
Decouple NIM Operator upgrades from NIMService upgrades. NIMService pods are now only restarted when their corresponding CR specifications are updated.
Improve support for clusters operating behind HTTP proxies, including injection of proxy environment variables and custom CA certificates.
Bug Fixes#
Fixed an issue where an empty storage class was set in the PVCs created by the NIM Operator for caching NIM.
v1.0.1#
Breaking Changes#
Renamed the spec.gpuSelectors field in the NIM cache custom resource to spec.nodeSelector. The purpose of the field remains the same: to specify the node selector labels for scheduling the caching job. Refer to Air-Gapped Environments.
Changed the Operator pod metrics from HTTPS protocol on port 8443 to HTTP protocol on port 8080.
Features#
Added a spec.env field to the NIM cache custom resource to support environment variables. One use of the field is to specify variables such as HTTPS_PROXY for air-gapped and specialized networks. Refer to Air-Gapped Environments.
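For example, a proxy variable could be passed through this field roughly as in the fragment below; the proxy URL is a placeholder, and the env entry format mirrors the NIMCache example shown earlier in these notes:

# Fragment of a NIMCache spec (other fields omitted)
spec:
  env:
  - name: HTTPS_PROXY
    value: http://proxy.example.com:3128   # placeholder proxy URL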
Updated the spec.expose.service.type field in the NIM service custom resource to support common service types, such as LoadBalancer.
Added a spec.runtimeClassName field to the NIM service custom resource to support setting the runtime class on a NIM service deployment.
Removed the kube-rbac-proxy container from the Operator pod. This change improves the security posture of the Operator. Previously, you might need to provide TLS certificates when you configured Prometheus. With this release, you no longer need to provide the certificates.
Certified the Operator for use with Red Hat OpenShift Container Platform.
v1.0.0#
Features#
NVIDIA NIM Operator is new.
Known Issues#
The container versions for the NeMo Retriever Text Embedding NIM and NeMo Retriever Text Reranking NIM are not publicly available and result in an image pull backoff error. The Operator and documentation were developed with release candidate versions of these microservices.
The Operator does not support configuring NIM microservices in a multi-node deployment.
For VMware vSphere with Tanzu clusters using vGPU software, to use an inference model that requires more than one GPU, the NVIDIA A100 or H100 GPUs must be connected with NVLink or NVLink Switch. These clusters also do not support multi-GPU models with L40S GPUs and vGPU software.
The Operator is not verified in an air-gapped network environment.
The sample RAG application cannot be deployed on Red Hat OpenShift Container Platform.
The Operator has a transitive dependency on go.uber.org/zap v1.26.0. Findings indicate Cross-Site Scripting (XSS) vulnerabilities in the Zap package.