Title: NIM Operator Support — UCS Tools Documentation

URL Source: https://docs.nvidia.com/ucf/2.10.0/text/NIM_Operator.html

Published Time: Thu, 30 Oct 2025 07:23:03 GMT

NIM Operator Support
------------------------------------------------------------------------------------------------------------------------------

Note: Support for the NIM Operator is available in UCS Tools starting with beta release 2.11.0-rc1.

The [NIM-Operator](https://docs.nvidia.com/nim-operator/latest/index.html.md) deploys NIMs to Kubernetes. You apply custom resources, whose types are defined by the operator's Custom Resource Definitions (CRDs), and the operator creates the deployments and services for the given NIMs. UCS Tools supports the NIM Operator in blueprint (application) definitions. Behind the scenes, UCS Tools generates a NIMPipeline custom resource that the operator uses to deploy one or more NIMs.

Consider the [VSS blueprint](https://gitlab-master.nvidia.com/via/via-nim-blueprint/-/blob/main/via-blueprint/via-blueprint.yaml?ref_type=heads) as an example (see [VSS documentation](https://via.gitlab-master-pages.nvidia.com/via-docs/content/run_via.html#deploy-using-helm) for more details). The application YAML configuration is as follows:

```yaml
specVersion: '2.5.0'
version: 2.3.0
doc: README.md
name: nvidia-blueprint-vss
description: Video Search and Summarization Agent Blueprint

dependencies:
- ucf.svc.vss:2.3.0
- ucf.svc.etcd:2.1.0
- ucf.svc.minio:2.1.0
- ucf.svc.milvus:2.1.0
- ucf.svc.neo4j:2.1.0
- ucf.svc.riva:2.3.0

components:
- name: vss
  type: ucf.svc.vss
  parameters:
    vlmModelType: vila-1.5
    vlmModelPath: ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8
    llmModel: meta/llama-3.3-70b-instruct
    llmModelChat: meta/llama-3.3-70b-instruct
    imagePullSecrets:
    - name: ngc-docker-reg-secret
    resources:
      limits:
        nvidia.com/gpu: 2

  secrets:
    # openai-api-key: openai-api-key
    #nvidia-api-key: nvidia-api-key
    ngc-api-key: ngc-api-key
    graph-db-username: graph-db-username
    graph-db-password: graph-db-password

- name: etcd
  type: ucf.svc.etcd
- name: minio
  type: ucf.svc.minio
- name: milvus
  type: ucf.svc.milvus

- name: neo4j
  type: ucf.svc.neo4j
  secrets:
    db-username: graph-db-username
    db-password: graph-db-password

- name: riva
  type: ucf.svc.riva
  parameters:
    enabled: false
    imagePullSecrets:
    - name: ngc-docker-reg-secret

- name: rag
  type: nim-operator
  parameters:
    services:
    - name: llm-nim
      enabled: true
      spec:
        metrics:
          enabled: true
          serviceMonitor:
            interval: 15s
            scrapeTimeout: 6s
        image:
          repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
          tag: 1.3.3
          pullPolicy: IfNotPresent
          pullSecrets:
          - ngc-docker-reg-secret
        authSecret: ngc-api-key-secret
        storage:
          pvc:
            create: true
            storageClass: microk8s-hostpath
            name: meta-llama
            volumeAccessMode: ReadWriteMany
            size: 10Gi
        replicas: 1
        resources:
          limits:
            nvidia.com/gpu: 2
        expose:
          service:
            type: ClusterIP
            port: 8000
        scale:
          enabled: true
          hpa:
            maxReplicas: 2
            minReplicas: 1
            metrics:
            - type: Object
              object:
                metric:
                  name: gpu_cache_usage_perc
                describedObject:
                  apiVersion: v1
                  kind: Service
                  name: llm-nim
                target:
                  type: Value
                  value: '0.3'
    - name: nemo-embedding
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2
          tag: 1.3.1
          pullPolicy: IfNotPresent
          pullSecrets:
          - ngc-docker-reg-secret
        authSecret: ngc-api-key-secret
        storage:
          pvc:
            create: true
            storageClass: microk8s-hostpath
            name: nemo-embedding
            volumeAccessMode: ReadWriteMany
            size: 10Gi
        replicas: 1
        resources:
          limits:
            nvidia.com/gpu: 1
        expose:
          service:
            type: ClusterIP
            port: 8000
    - name: nemo-rerank
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2
          tag: 1.3.1
          pullPolicy: IfNotPresent
          pullSecrets:
          - ngc-docker-reg-secret
        authSecret: ngc-api-key-secret
        storage:
          pvc:
            create: true
            storageClass: microk8s-hostpath
            name: nemo-rerank
            volumeAccessMode: ReadWriteMany
            size: 10Gi
        replicas: 1
        resources:
          limits:
            nvidia.com/gpu: 1
        expose:
          service:
            type: ClusterIP
            port: 8000

connections:
  milvus/etcd: etcd/http-api
  milvus/minio: minio/http-api
  vss/milvus: milvus/http-api1 # port 19530
  vss/neo4j-bolt: neo4j/bolt
  vss/llm-openai-api: rag/llm-nim
  vss/nemo-embed: rag/nemo-embedding
  vss/nemo-rerank: rag/nemo-rerank
  vss/riva-api: riva/http-api

secrets:
  # openai-api-key:
  #   k8sSecret:
  #     secretName: openai-api-key-secret
  #     key: OPENAI_API_KEY
  #nvidia-api-key:
  #  k8sSecret:
  #    secretName: nvidia-api-key-secret
  #    key: NVIDIA_API_KEY
  ngc-api-key:
    k8sSecret:
      secretName: ngc-api-key-secret
      key: NGC_API_KEY
  graph-db-username:
    k8sSecret:
      secretName: graph-db-creds-secret
      key: username
  graph-db-password:
    k8sSecret:
      secretName: graph-db-creds-secret
      key: password
```

The NIM Operator is represented by the built-in component type `nim-operator`, which the application above instantiates under the name `rag`. In this example, the NIM Operator is configured to deploy three NIMs. Each entry under `services` in the NIMPipeline definition is represented by a NIMService resource (another custom resource type provided by the NIM Operator):

1. NeMo Embedding (`nemo-embedding`)
2. NeMo Reranking (`nemo-rerank`)
3. LLM NIM (`llm-nim`)

Each of these NIMService objects has an associated Kubernetes service defined in the `spec.expose.service` field. The service name will be the same as the name of the associated NIMService.

In the `connections` section of the configuration, the VSS client application makes three connections to the NIMService objects defined by the NIM Operator:

1. `vss/llm-openai-api: rag/llm-nim`
2. `vss/nemo-embed: rag/nemo-embedding`
3. `vss/nemo-rerank: rag/nemo-rerank`

To make a connection to a NIM Operator service, use the name of the operator component in the application (“rag” in this case) and the Kubernetes service name.
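As a small illustration of this naming rule, suppose the operator component were renamed from `rag` to the hypothetical name `nims`. Under that assumption, only the component part of each connection target would change, while the Kubernetes service names stay the same. This is a sketch of the convention, not a configuration taken from the VSS blueprint:

```yaml
components:
- name: nims              # hypothetical component name replacing "rag"
  type: nim-operator
  # parameters.services unchanged: llm-nim, nemo-embedding, nemo-rerank

connections:
  # <client component>/<client endpoint>: <operator component>/<Kubernetes service name>
  vss/llm-openai-api: nims/llm-nim
  vss/nemo-embed: nims/nemo-embedding
  vss/nemo-rerank: nims/nemo-rerank
```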

Build the application with the following command:

```bash
ucf_app_builder_cli app build via-nim-blueprint/via-blueprint/via-blueprint.yaml
```

This generates the blueprint Helm chart named `nvidia-blueprint-vss-2.3.0`. The NIMPipeline manifest for the NIM Operator is written to the chart's `templates` subdirectory at `nvidia-blueprint-vss-2.3.0/templates/nvidia-nim-operator-pipeline.yaml`.
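The generated file is not reproduced in this document. For orientation, the following is a rough sketch of what a NIMPipeline resource for the `llm-nim` service could look like, assuming the NIM Operator's `apps.nvidia.com/v1alpha1` API. The `metadata.name` value is a placeholder, and the manifest that UCS Tools actually emits may differ in naming and detail:

```yaml
# Illustrative only: not the actual output of ucf_app_builder_cli.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: rag-pipeline          # hypothetical name
spec:
  services:
  - name: llm-nim
    enabled: true
    spec:                     # NIMService spec, values taken from the blueprint above
      image:
        repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
        tag: "1.3.3"
        pullPolicy: IfNotPresent
        pullSecrets:
        - ngc-docker-reg-secret
      authSecret: ngc-api-key-secret
      storage:
        pvc:
          create: true
          storageClass: microk8s-hostpath
          name: meta-llama
          volumeAccessMode: ReadWriteMany
          size: 10Gi
      replicas: 1
      resources:
        limits:
          nvidia.com/gpu: 2
      expose:
        service:
          type: ClusterIP
          port: 8000
  # nemo-embedding and nemo-rerank entries follow the same pattern
```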

Prerequisites for Deploying Your Blueprint
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### GPU Resources

You’ll need at least 4 GPUs, although the current configuration is designed for 6. This configuration has been tested on A100 GPUs and will also work with H100 GPUs.

The llm-nim NIM is configured to use 2 GPUs, but you can reduce this to 1. Similarly, the VSS component is configured for 2 GPUs, but 1 should be sufficient for non-intensive workloads. The NeMo Rerank and NeMo Embedding NIMs are each configured with 1 GPU. That totals 2 + 2 + 1 + 1 = 6 GPUs with the default settings, or 1 + 1 + 1 + 1 = 4 GPUs at the minimum.

For HPA scaling to work properly, you need spare GPU capacity for a second llm-nim replica, because the HPA is configured with minReplicas equal to 1 and maxReplicas equal to 2, and each replica requests the same number of GPUs as the first. Therefore, deploy on a system with at least 5 GPUs if llm-nim and VSS are both set to use 1 GPU (4 + 1), or at least 8 GPUs if using the default GPU counts (6 + 2).
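Reducing llm-nim and VSS to 1 GPU each, as described above, would mean changing two values in the blueprint. The fragment below is a sketch that shows only the affected fields. All other fields from the full configuration are assumed to stay as they are:

```yaml
components:
- name: vss
  type: ucf.svc.vss
  parameters:
    resources:
      limits:
        nvidia.com/gpu: 1     # reduced from 2
    # other vss parameters omitted

- name: rag
  type: nim-operator
  parameters:
    services:
    - name: llm-nim
      spec:
        resources:
          limits:
            nvidia.com/gpu: 1 # reduced from 2; applies to each llm-nim replica
        # other llm-nim spec fields omitted
```

With these values, the baseline request is 4 GPUs (1 for VSS, 1 for llm-nim, 1 for each NeMo NIM).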

### NIM Operator

Before using Helm to install the blueprint, ensure that the NIM Operator is running in your Kubernetes cluster. See the [NIM-Operator documentation](https://docs.nvidia.com/nim-operator/latest/index.html.md) for installation instructions.

### Prometheus Kubernetes Stack

The [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) Helm chart is available in the prometheus-community GitHub repository. This includes Prometheus, the Prometheus Operator (which provides CRDs such as ServiceMonitor, PodMonitor, etc.), and Grafana.

If you are using a single-node Kubernetes setup with MicroK8s, you can run `microk8s enable observability`, which will deploy this stack for you.

### Updating the Prometheus Configuration

After installing the Prometheus Kubernetes stack, update the `Prometheus` custom resource, which contains the configuration for Prometheus. We recommend setting its `serviceMonitorSelector` field to `{}`. By default, at least for MicroK8s, it is set to:

```yaml
serviceMonitorSelector:
  matchLabels:
    release: kube-prometheus-stack
```

Changing it to:

```yaml
serviceMonitorSelector: {}
```

means that Prometheus selects every ServiceMonitor resource, not just those with the label `release: kube-prometheus-stack`. You might also want to verify that ServiceMonitors are selected in all namespaces:

```yaml
serviceMonitorNamespaceSelector: {}
```

You can update the settings similarly for the `podMonitorNamespaceSelector` and `podMonitorSelector` fields.
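If you installed kube-prometheus-stack directly with Helm rather than through the MicroK8s add-on, a similar result can usually be reached through chart values instead of editing the resource by hand. The sketch below assumes a chart version that provides the `*SelectorNilUsesHelmValues` settings. Check the values file for your chart version before relying on it:

```yaml
# Hypothetical values override for the kube-prometheus-stack chart.
prometheus:
  prometheusSpec:
    # When false, unset selectors are not filled in with the Helm release label,
    # so Prometheus selects all ServiceMonitors and PodMonitors.
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    # Empty namespace selectors match every namespace.
    serviceMonitorNamespaceSelector: {}
    podMonitorNamespaceSelector: {}
```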

### Prometheus Adapter

The Prometheus Adapter is used to provide the custom metrics API. The HPA configured earlier for the llm-nim service in the NIM Operator manifest will query the custom metrics API server to determine when to scale the llm-nim pods.
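To make the relationship concrete, the `scale.hpa` settings in the blueprint correspond roughly to a standard `autoscaling/v2` HorizontalPodAutoscaler like the one sketched here. The object name and the `scaleTargetRef` (a Deployment named `llm-nim`) are assumptions for illustration. The object the NIM Operator actually creates may differ:

```yaml
# Illustrative sketch, not the operator-generated object.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-nim               # assumed name
spec:
  scaleTargetRef:             # assumed target workload
    apiVersion: apps/v1
    kind: Deployment
    name: llm-nim
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Object
    object:
      metric:
        name: gpu_cache_usage_perc   # served by the custom metrics API
      describedObject:
        apiVersion: v1
        kind: Service
        name: llm-nim
      target:
        type: Value
        value: '0.3'
```

Because the metric type is `Object`, the HPA asks the custom metrics API for `gpu_cache_usage_perc` on the `llm-nim` Service, which is why the adapter must be able to associate that metric with a Service.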

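Install the prometheus-adapter Helm chart from the prometheus-community Helm repository (add the repository first if you have not already done so):

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter --set-literal=prometheus.url=http://<prometheus-service-name>.<prometheus-namespace>.svc
```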

Be sure to override the chart's `prometheus.url` value. The default is `http://prometheus.default.svc`, as indicated in the chart’s [values.yaml](https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus-adapter/values.yaml) file. If `prometheus.url` does not point at your Prometheus service, the adapter cannot read Prometheus metrics and therefore cannot expose them through the custom metrics API. In that case, the HPA resource for llm-nim in the example above will never be able to determine the current value of the metric it is configured to monitor.
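The same setting can also be kept in a values file and passed to `helm install` with `-f`. The sketch below additionally shows one possible custom rule that exposes `gpu_cache_usage_perc` per Service. The rule is an assumption for illustration: it presumes the scraped series carries `namespace` and `service` labels, and it has not been verified against this blueprint:

```yaml
# Hypothetical values file for the prometheus-adapter chart.
prometheus:
  url: http://<prometheus-service-name>.<prometheus-namespace>.svc
  port: 9090

rules:
  custom:
  # Expose gpu_cache_usage_perc through the custom metrics API,
  # associated with the namespace and Service it was scraped from.
  - seriesQuery: 'gpu_cache_usage_perc{namespace!="",service!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        service: {resource: "service"}
    name:
      matches: "gpu_cache_usage_perc"
      as: "gpu_cache_usage_perc"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```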

Deploying the VSS Blueprint
--------------------------------------------------------------------------------------------------------------------------------------------

Building the blueprint with UCS Tools earlier generated the Helm chart folder named `nvidia-blueprint-vss-2.3.0`. You can deploy this chart in your Kubernetes environment as follows. The target namespace must already exist, unless you add Helm's `--create-namespace` flag:

```bash
helm install nvidia-blueprint-vss nvidia-blueprint-vss-2.3.0 --namespace nvidia-blueprint-vss
```
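The blueprint references the Kubernetes secrets `ngc-api-key-secret` and `graph-db-creds-secret` by name. Assuming the chart does not create them for you, they need to exist in the target namespace before the pods can start. A minimal sketch of the two manifests follows, with placeholder values that you must replace:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ngc-api-key-secret
  namespace: nvidia-blueprint-vss
type: Opaque
stringData:
  NGC_API_KEY: "<your-ngc-api-key>"        # placeholder
---
apiVersion: v1
kind: Secret
metadata:
  name: graph-db-creds-secret
  namespace: nvidia-blueprint-vss
type: Opaque
stringData:
  username: "<graph-db-username>"          # placeholder
  password: "<graph-db-password>"          # placeholder
```

The image pull secret `ngc-docker-reg-secret` is also referenced by the blueprint. It is a registry credential (type `kubernetes.io/dockerconfigjson`) and is not shown here.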

### Interacting with the VSS Blueprint via the VIA Python Client CLI

Use the [VIA Python client CLI](https://via.gitlab-master-pages.nvidia.com/via-docs/content/python_client.html) to upload images or videos and make requests to summarize them.

Prometheus Dashboards
--------------------------------------------------------------------------------------------------------------------------------

Grafana is installed as part of the kube-prometheus-stack set up earlier, so there are several dashboards you can explore.

### NIM Example Dashboard

See [this section](https://docs.nvidia.com/nim/large-language-models/latest/observability.html.md#grafana) of the NIM LLM documentation for instructions on accessing and installing the NIM Dashboard JSON file.

### DCGM Dashboard

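The NVIDIA Data Center GPU Manager Exporter (DCGM-Exporter) is installed as part of the GPU Operator. Its Kubernetes service exports GPU metrics on the `/metrics` endpoint. The example below queries that endpoint manually in a MicroK8s environment. Because `kubectl port-forward` blocks, run `curl` from a second terminal. On the cluster node itself, you can instead query the service's ClusterIP directly, for example `curl 10.152.183.130:9400/metrics` (the IP address is specific to your cluster).

```bash
# Terminal 1: forward the DCGM-Exporter service port to localhost (runs until interrupted)
kubectl port-forward service/nvidia-dcgm-exporter 9400:9400 -n gpu-operator-resources
# Terminal 2: query the forwarded port
curl localhost:9400/metrics
```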

With the Prometheus configuration described earlier, Prometheus scrapes the DCGM-Exporter (there is a ServiceMonitor resource named nvidia-dcgm-exporter in the gpu-operator-resources namespace). To visualize these metrics in Grafana, install the [DCGM-Exporter dashboard](https://grafana.com/grafana/dashboards/12239-nvidia-dcgm-exporter-dashboard/).
