NVIDIA NIM Operator#

About the Operator#

The NVIDIA NIM Operator enables Kubernetes cluster administrators to operate the software components and services necessary to run NVIDIA NIMs in various domains such as reasoning, retrieval, speech, and biology. Additionally, it allows the use of NeMo Microservices to fine-tune, evaluate, or apply guardrails to your models.

The Operator manages the life cycle of the following microservices and the models they use:

  • NVIDIA NIM models, such as:

    • Reasoning LLMs

    • Retrieval, including embedding and reranking

    • Speech

    • Biology

  • NeMo core microservices:

    • NeMo Customizer

    • NeMo Evaluator

    • NeMo Guardrails

  • NeMo platform component microservices:

    • NeMo Data Store

    • NeMo Entity Store

Benefits of Using the Operator#

Using the NIM Operator simplifies the operation and lifecycle management of NIM and NeMo microservices at scale, across the cluster. Custom resources simplify deploying and managing multiple AI inference pipelines, such as RAG pipelines that combine several LLM and retrieval microservices. Additionally, the NIM Operator supports caching models to reduce initial inference latency and speed up auto-scaling.

The Operator uses the following custom resources:

  • nimcaches.apps.nvidia.com

    This custom resource enables downloading models from NVIDIA NGC and persisting them on network storage. One advantage to caching a model is that when multiple instances of the same NIM microservice start, the microservices use the single cached model. However, caching is optional. Without caching, each NIM microservice instance downloads a copy of the model when it starts.
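    A NIM cache resource might look like the following sketch. The field names and values here are illustrative assumptions (model image, secret names, storage sizes vary by deployment); consult the NIMCache API reference for the authoritative schema.

    ```yaml
    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMCache
    metadata:
      name: meta-llama3-8b-instruct          # hypothetical name
      namespace: nim-service
    spec:
      source:
        ngc:
          # Container image used to pull the model from NVIDIA NGC (assumed tag)
          modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
          pullSecret: ngc-secret             # image pull secret (assumed name)
          authSecret: ngc-api-secret         # NGC API key secret (assumed name)
      storage:
        pvc:
          create: true
          storageClass: ""                   # default storage class
          size: "50Gi"
          volumeAccessMode: ReadWriteMany    # shared by multiple NIM replicas
    ```

    The `ReadWriteMany` access mode is what lets several NIM microservice replicas mount the same cached model.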

  • nimservices.apps.nvidia.com

    This custom resource represents a NIM microservice. Adding and updating a NIM service resource creates a Kubernetes deployment for the microservice in a namespace.

    The custom resource supports using a model from an existing NIM cache resource or a persistent volume claim.

    The custom resource also supports creating a horizontal pod autoscaler, ingress, and service monitor to simplify cluster administration.
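    A minimal NIM service resource that references the cache above might look like this sketch. Field names, the image tag, and secret names are illustrative assumptions; refer to the NIMService API reference for the exact schema.

    ```yaml
    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama3-8b-instruct          # hypothetical name
      namespace: nim-service
    spec:
      image:
        repository: nvcr.io/nim/meta/llama3-8b-instruct
        tag: "1.0.0"                         # assumed tag
        pullSecrets:
          - ngc-secret                       # assumed secret name
      authSecret: ngc-api-secret             # assumed secret name
      storage:
        nimCache:
          name: meta-llama3-8b-instruct      # existing NIMCache resource
      replicas: 1
      resources:
        limits:
          nvidia.com/gpu: 1
      expose:
        service:
          type: ClusterIP
          port: 8000
    ```

    Applying this manifest causes the Operator to create a Kubernetes deployment and service for the microservice in the `nim-service` namespace.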

  • nimpipelines.apps.nvidia.com

    This custom resource represents a group of NIM service custom resources.
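    A pipeline resource groups several NIM service specifications under one object, as in the following sketch. The service names and the nested spec fields are illustrative assumptions; see the NIMPipeline API reference for the exact schema.

    ```yaml
    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMPipeline
    metadata:
      name: rag-pipeline                     # hypothetical name
      namespace: nim-service
    spec:
      services:
        - name: embedding                    # hypothetical retrieval service
          enabled: true
          spec:
            # ... NIMService spec fields, as in a standalone NIM service ...
        - name: llm                          # hypothetical reasoning service
          enabled: true
          spec:
            # ... NIMService spec fields ...
    ```

    Toggling `enabled` on an individual entry lets you start or stop one microservice without deleting the rest of the pipeline.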

  • nemodatastores.apps.nvidia.com, nemoentitystores.apps.nvidia.com, nemocustomizers.apps.nvidia.com, nemoevaluators.apps.nvidia.com, nemoguardrails.apps.nvidia.com

    These custom resources represent the NeMo platform microservices, which provide a flexible foundation for building AI workflows on your Kubernetes cluster, on-premises or in the cloud.

Sample Applications#

NVIDIA provides sample applications and tutorials for you to explore the NIM Operator and its supported workflows.

Licenses#

The following table identifies the licenses for the software components related to the Operator.

| Component                                 | Artifact Type | Artifact Licenses                               | Source Code License |
|-------------------------------------------|---------------|-------------------------------------------------|---------------------|
| NVIDIA NIM Operator                       | Helm Chart    | NVIDIA AI Enterprise Software License Agreement | Apache 2            |
| NVIDIA NIM Operator                       | Container     | NVIDIA AI Enterprise Software License Agreement | Apache 2            |
| NVIDIA NIM                                | Container     | NVIDIA AI Enterprise Software License Agreement | None                |
| NVIDIA NeMo Retriever Text Embedding NIM  | Container     | NVIDIA AI Enterprise Software License Agreement | None                |
| NVIDIA NeMo Data Store                    | Container     | NVIDIA AI Enterprise Software License Agreement | None                |
| NVIDIA NeMo Entity Store                  | Container     | NVIDIA AI Enterprise Software License Agreement | None                |
| NVIDIA NeMo Guardrails                    | Container     | NVIDIA AI Enterprise Software License Agreement | None                |
| NVIDIA NeMo Evaluator                     | Container     | NVIDIA AI Enterprise Software License Agreement | None                |
| NVIDIA NeMo Customizer                    | Container     | NVIDIA AI Enterprise Software License Agreement | None                |

Third Party Software#

The Chain Server that you can deploy with the sample pipeline uses third-party software. You can download the Third Party Licenses.