NVIDIA NIM Operator
About the Operator
NVIDIA NIM Operator enables cluster administrators to operate the software components and services that are necessary to run LLM, embedding, and other models with NVIDIA NIM microservices in Kubernetes.
The Operator manages the life cycle of the following microservices and the models they use:
NVIDIA NIM for LLMs
NeMo Retriever Text Embedding NIM
NeMo Retriever Text Reranking NIM
NVIDIA provides a sample multi-turn RAG pipeline. The pipeline deploys a chat bot web application and a chain server. The chain server communicates with the NIM microservices and a vector database.
Benefits of Using the Operator
Using the NIM Operator simplifies the operation and lifecycle management of NIM microservices at scale and at the cluster level. Custom resources simplify the deployment and lifecycle management of multiple AI inference pipelines, such as retrieval-augmented generation (RAG) pipelines and deployments of multiple LLMs. Additionally, the NIM Operator supports caching models to reduce initial inference latency and to enable auto-scaling.
The Operator uses the following custom resources:
nimcaches.apps.nvidia.com
This custom resource enables downloading models from NVIDIA NGC and persisting them on network storage. One advantage of caching a model is that when multiple instances of the same NIM microservice start, they all use the single cached model. Caching is optional, however: without caching, each NIM microservice instance downloads its own copy of the model when it starts.
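The following manifest is a minimal sketch of a NIM cache resource that downloads a model from NGC and stores it on a shared persistent volume claim. The values, such as the model puller image, the secret names, and the storage class and size, are illustrative placeholders, and field names can vary between Operator versions; refer to the NIM cache API reference for the exact schema.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
  namespace: nim-service
spec:
  source:
    ngc:
      # Image that pulls the model profiles from NVIDIA NGC (illustrative tag).
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
  storage:
    # Persist the model on shared network storage so that multiple
    # NIM service replicas can mount the same cache.
    pvc:
      create: true
      storageClass: nfs-client
      size: "50Gi"
      volumeAccessMode: ReadWriteMany
```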
nimservices.apps.nvidia.com
This custom resource represents a NIM microservice. Adding or updating a NIM service resource creates a Kubernetes deployment for the microservice in a namespace.
The custom resource supports using a model from an existing NIM cache resource or a persistent volume claim.
The custom resource also supports creating a horizontal pod autoscaler, ingress, and service monitor to simplify cluster administration.
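The following manifest is a minimal sketch of a NIM service resource that serves the model from the NIM cache resource shown earlier. The repository, tag, secret names, and resource requests are placeholder assumptions; consult the NIM service API reference for the fields that your Operator version supports.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    # Serve the model from the existing NIM cache resource instead of
    # downloading a copy at startup.
    nimCache:
      name: meta-llama3-8b-instruct
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```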
nimpipelines.apps.nvidia.com
This custom resource represents a group of NIM service custom resources.
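The following manifest is a minimal sketch of a NIM pipeline resource that groups two NIM service specifications, one for an LLM and one for a text embedding model. The service names, images, and field layout are illustrative assumptions and can differ between Operator versions.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: rag-pipeline
  namespace: nim-service
spec:
  # Each entry embeds a NIM service specification; setting enabled to
  # false removes that service without deleting the pipeline.
  services:
    - name: meta-llama3-8b-instruct
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-8b-instruct
          tag: "1.0.0"
          pullSecrets:
            - ngc-secret
        authSecret: ngc-api-secret
        storage:
          nimCache:
            name: meta-llama3-8b-instruct
        replicas: 1
        expose:
          service:
            type: ClusterIP
            port: 8000
    - name: nv-embedqa-e5-v5
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-embedqa-e5-v5
          tag: "1.0.0"
          pullSecrets:
            - ngc-secret
        authSecret: ngc-api-secret
        replicas: 1
        expose:
          service:
            type: ClusterIP
            port: 8000
```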
Limitations
The Operator has the following limitations:
If all the GPUs in your cluster are allocated and you change a custom resource in a way that requires starting a new pod that needs access to a GPU device, the new pod becomes stuck in the Pending state because Kubernetes cannot schedule it to a node with an allocatable GPU resource.
Licenses
The following table identifies the licenses for the software components related to the Operator.
| Component | Artifact Type | Artifact Licenses | Source Code License |
|---|---|---|---|
| NVIDIA NIM Operator | Helm Chart | | |
| NVIDIA NIM Operator | Image | | |
| NVIDIA NIM for LLMs | Container | | None |
| NVIDIA NeMo Retriever Text Embedding NIM | Container | | None |
| NVIDIA NeMo Retriever Text Reranking NIM | Container | | None |
The Operator source code is accessible to early-access participants only.
Third Party Software
The Chain Server that you can deploy with the sample pipeline uses third party software.
You can download the Third Party Licenses.