# Deploying AI workloads with the NIM Operator on Red Hat OpenShift
The NVIDIA NIM Operator enables Kubernetes cluster administrators to operate the software components and services necessary to deploy NVIDIA NIM and NVIDIA NeMo microservices in Kubernetes.
Deploying via the NIM Operator provides optimizations that streamline developer workflows:

- **Model caching:** Large language models (LLMs) often exceed 100 GB, leading to long startup times. The NIM Operator solves this with a dedicated caching controller that pre-downloads and stores model weights on Persistent Volumes (PVs), ensuring that when a service scales, pods pull from local storage rather than the remote registry.
- **Observability:** Prometheus metrics track the NIM caches, services, and pipelines deployed to your cluster. The Operator also provides several common Kubernetes operator metrics.
- **Autoscaling:** Scale on different metrics, either NVIDIA DCGM GPU metrics or NIM-specific metrics from the service handling requests to your cached models.
- **Dynamic Resource Allocation (DRA):** The Operator communicates directly with the underlying NVIDIA GPU hardware to ensure optimal scheduling. It leverages DRA to match a model's specific requirements (such as GPU memory or compute capability) to the available hardware without manual environment variable tuning.
- **Government ready:** The NVIDIA NIM Operator is government ready, NVIDIA's designation for software that meets applicable security requirements for deployment in FedRAMP High or equivalent sovereign use cases.
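As a concrete illustration of the caching workflow, a `NIMCache` resource pre-pulls model weights onto a PV before any service starts. The sketch below is based on the `NIMCache` schema; the model image, secret names, storage class, and sizes are placeholders for your environment, and field names should be verified against your installed Operator version.

```yaml
# Sketch of a NIMCache resource: pre-downloads model weights to a
# PersistentVolume so that serving pods start from local storage.
# Image tag, secret names, and sizes below are illustrative.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      # Placeholder NIM container image; substitute the model you deploy.
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
      pullSecret: ngc-secret        # image pull secret for nvcr.io (assumed name)
      authSecret: ngc-api-secret    # secret holding the NGC API key (assumed name)
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: ""              # empty string selects the cluster default
      size: "50Gi"
      volumeAccessMode: ReadWriteMany
```

Once the cache reports a `Ready` status, any service that references it mounts the weights from the PV instead of downloading them at pod startup.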
## Standalone Deployments
The NIM Operator manages standalone NIM deployments through Custom Resource Definitions (CRDs). By using the NIMService CRD, platform engineers can treat generative AI models as standard Kubernetes objects, integrating them directly into existing production DevOps workflows. Multiple NIM service resources can be deployed together with the NIMPipeline CRD.
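A minimal `NIMService` might look like the following sketch. It assumes a `NIMCache` named `meta-llama3-8b-instruct` has already been created for the model, and the image repository, tag, and secret names are placeholders; check the field names against the NIMService schema shipped with your Operator version.

```yaml
# Sketch of a standalone NIMService: deploys a cached NIM model as a
# standard Kubernetes object. Names and versions are illustrative.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.0"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret                    # assumed pull secret for nvcr.io
  authSecret: ngc-api-secret          # assumed NGC API key secret
  storage:
    nimCache:
      name: meta-llama3-8b-instruct   # assumes this NIMCache already exists
      profile: ""                     # empty string lets the Operator pick a profile
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Because the model is just another custom resource, it can be versioned in Git, templated with Helm or Kustomize, and rolled out through the same GitOps pipelines as the rest of the cluster.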
## Deploy with KServe
As an alternative to standalone NIMService deployments, the NVIDIA NIM Operator supports both raw and serverless deployment of NIM through KServe on Kubernetes clusters, including Red Hat OpenShift Container Platform. A purpose-built Kubernetes controller automates the deployment, scaling, and management of NIM microservices.
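In recent Operator releases, selecting KServe as the serving backend is expressed on the NIMService spec itself; the sketch below uses the `inferencePlatform` field for this, which should be verified against the NIMService schema of your installed Operator version. The image and names remain placeholders.

```yaml
# Sketch: routing a NIMService through KServe instead of a standalone
# Deployment. Verify the inferencePlatform field against your Operator
# version; image repository and tag are illustrative.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  inferencePlatform: kserve   # default is standalone
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.0"
```

With this selector, the Operator creates the corresponding KServe resources, so scaling behavior (including scale-to-zero in serverless mode) is governed by KServe rather than by the Operator's own deployment logic.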
Regardless of the deployment option, the Operator can manage the lifecycle of the following microservices and the models they use:

- NVIDIA NIM models, such as:
  - Reasoning LLMs
  - Retrieval: embedding, reranking, and other functions
  - Speech
  - Biology
- NeMo core microservices:
  - NeMo Customizer
  - NeMo Evaluator
  - NeMo Guardrails
- NeMo platform component microservices:
  - NeMo Data Store
  - NeMo Entity Store
Refer to the NIM Operator documentation for steps on deploying with the NIM Operator on Red Hat OpenShift.