NVIDIA NIM Operator

About the Operator

NVIDIA NIM Operator enables cluster administrators to operate the software components and services that are necessary to run LLM, embedding, and other models with NVIDIA NIM microservices in Kubernetes.

The Operator manages the life cycle of the following microservices and the models they use:

  • NVIDIA NIM for LLMs

  • NeMo Retriever Text Embedding NIM

  • NeMo Retriever Text Reranking NIM

NVIDIA provides a sample multi-turn RAG pipeline. The pipeline deploys a chat bot web application and a chain server. The chain server communicates with the NIM microservices and a vector database.

Benefits of Using the Operator

Using the NIM Operator simplifies the operation and lifecycle management of NIM microservices at scale and at the cluster level. Custom resources simplify the deployment and lifecycle management of multiple AI inference pipelines, such as RAG and multiple LLM inferences. Additionally, the NIM Operator supports caching models to reduce the initial inference latency and enable auto-scaling.

The Operator uses the following custom resources:

  • nimcaches.apps.nvidia.com

    This custom resource enables downloading models from NVIDIA NGC and persisting them on network storage. One advantage of caching a model is that when multiple instances of the same NIM microservice start, the microservices share the single cached model. However, caching is optional. Without caching, each NIM microservice instance downloads its own copy of the model when it starts.
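A NIM cache manifest can look like the following sketch. The field names and values below are illustrative only; the exact schema, API version, model image, and secret names depend on your Operator release and NGC account, so check the CRD specification shipped with your version before applying it.

```yaml
# Illustrative NIMCache manifest (field names are assumptions; verify
# against the nimcaches.apps.nvidia.com CRD for your Operator version).
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      # Container image that pulls the model from NVIDIA NGC.
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
      pullSecret: ngc-secret        # image pull secret for nvcr.io
      authSecret: ngc-api-secret    # secret holding the NGC API key
  storage:
    pvc:
      # Persist the downloaded model on network storage so multiple
      # NIM microservice instances can share it.
      create: true
      storageClass: ""
      size: "50Gi"
      volumeAccessMode: ReadWriteMany
```

A `ReadWriteMany` access mode is what lets several pods mount the same cached model concurrently; with `ReadWriteOnce` storage the cache can typically serve only pods on a single node.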

  • nimservices.apps.nvidia.com

    This custom resource represents a NIM microservice. Adding or updating a NIM service resource creates or updates a Kubernetes deployment for the microservice in a namespace.

    The custom resource supports using a model from an existing NIM cache resource or a persistent volume claim.

    The custom resource also supports creating a horizontal pod autoscaler, ingress, and service monitor to simplify cluster administration.
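A NIM service manifest referencing a cache can look like the following sketch. As with the cache example, the field names here are assumptions for illustration; consult the `nimservices.apps.nvidia.com` CRD for your Operator version for the authoritative schema.

```yaml
# Illustrative NIMService manifest (field names are assumptions; verify
# against the nimservices.apps.nvidia.com CRD for your Operator version).
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    # Reuse the model downloaded by an existing NIM cache resource
    # instead of downloading a copy on startup.
    nimCache:
      name: meta-llama3-8b-instruct
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1     # each replica requires one allocatable GPU
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Because the Operator reconciles this resource into a standard Kubernetes deployment, routine operations such as scaling replicas or rolling out a new image tag are done by editing the custom resource rather than the deployment directly.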

  • nimpipelines.apps.nvidia.com

    This custom resource represents a group of NIM service custom resources.
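A pipeline manifest groups several NIM service specifications so they can be managed as one unit, for example the LLM and embedding microservices behind a RAG application. The structure below is an illustrative sketch, not the definitive schema; field names are assumptions to be checked against the `nimpipelines.apps.nvidia.com` CRD for your Operator version.

```yaml
# Illustrative NIMPipeline manifest (field names are assumptions; verify
# against the nimpipelines.apps.nvidia.com CRD for your Operator version).
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: rag-pipeline
spec:
  services:
    # Each entry embeds a NIM service specification; disabling an entry
    # removes that microservice without deleting the pipeline.
    - name: meta-llama3-8b-instruct
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-8b-instruct
          tag: "1.0.0"
        replicas: 1
    - name: nv-embedqa-e5-v5
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-embedqa-e5-v5
          tag: "1.0.0"
        replicas: 1
```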

Limitations

The Operator has the following limitations:

  • If all the GPUs in your cluster are allocated and you change a custom resource in a way that starts a new pod that requires a GPU, the new pod becomes stuck in the Pending state because Kubernetes cannot schedule it to a node with an allocatable GPU resource.

Licenses

The following table identifies the licenses for the software components related to the Operator.

| Component                                | Artifact Type | Artifact Licenses                               | Source Code License |
|------------------------------------------|---------------|-------------------------------------------------|---------------------|
| NVIDIA NIM Operator                      | Helm Chart    | NVIDIA AI Enterprise Software License Agreement | Apache 2            |
| NVIDIA NIM Operator                      | Image         | NVIDIA AI Enterprise Software License Agreement | Apache 2            |
| NVIDIA NIM for LLMs                      | Container     | NVIDIA AI Enterprise Software License Agreement | None                |
| NVIDIA NeMo Retriever Text Embedding NIM | Container     | NVIDIA AI Enterprise Software License Agreement | None                |
| NVIDIA NeMo Retriever Text Reranking NIM | Container     | NVIDIA AI Enterprise Software License Agreement | None                |

The source code is accessible to early-access participants only.

Third Party Software

The Chain Server that you can deploy with the sample pipeline uses third-party software. You can download the Third Party Licenses.