NeMo Deployment Management Setup Guide

This guide covers the prerequisites for NeMo Deployment Management and explains how to configure the key parts of the values.yaml file in the NeMo Deployment Management Helm chart.

Prerequisites

Install the following before you install NeMo Deployment Management.

NVIDIA NIM Operator for deploying NIMs.

Basic Configuration

The following code snippet shows the essential settings to include in the values file for the NeMo Deployment Management Helm chart.

Note

Due to a known issue in NIM Operator, you must set the deployments.defaultStorageClass property to the storage class that you want the deployments to use.

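# Secret used to pull the NeMo Deployment Management image.
imagePullSecrets:
  - name: nvcrimagepullsecret
image:
  repository: nvidia/nemo-microservices/nemo-deployment-management
  tag: 25.9.0
deployments:
  # Must be set explicitly because of the known NIM Operator issue.
  defaultStorageClass: <STORAGE_CLASS>
  # Image pull secrets to inject into NIMs.
  nimImagePullSecrets: ["nvcrimagepullsecret"]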

Use the deployments.nimImagePullSecrets property to specify the image pull secrets that you want to inject into NIMs. In the preceding example, this property contains the general image pull secret nvcrimagepullsecret, which you create in Prerequisites for Installing Helm Charts from NGC Catalog. You can also add your own image pull secrets to this property.
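As a minimal sketch of adding your own secret, the following values fragment lists a second entry alongside the general one. The name my-registry-secret is hypothetical; it stands for an image pull secret that you have already created in the namespace where the NIMs run.

```yaml
deployments:
  defaultStorageClass: <STORAGE_CLASS>
  nimImagePullSecrets:
    - nvcrimagepullsecret   # general NGC image pull secret from the example above
    - my-registry-secret    # hypothetical: your own image pull secret
```

The block-style list is equivalent to the inline ["..."] form used earlier; it is only easier to read once the list has more than one entry.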

Additional Configuration Tips

The following are additional configuration tips for the NeMo Deployment Management Helm chart.

Enable Metrics Collection

The NeMo Deployment Management Helm chart includes metrics collection configuration for NIMs. Metrics collection is turned off by default.

To enable metrics collection from model endpoints, set the deployments.metrics.enabled property to true.

deployment-management:
  deployments:
    metrics:
      enabled: true

When metrics collection is enabled, the NeMo Deployment Management microservice deploys NIMs with metrics collection turned on. This creates a ServiceMonitor custom resource, which requires the Prometheus Operator. You can also collect metrics in other ways, such as a different Prometheus setup or a PodMonitor.

The kube-prometheus-stack Helm chart installs and configures the Prometheus Operator, although other methods exist to set up Prometheus monitoring. NIM also supports OpenTelemetry.
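For readers who prefer the PodMonitor route mentioned above, the following manifest is a rough sketch of what one could look like. It uses the Prometheus Operator's monitoring.coreos.com/v1 API. The resource name is hypothetical, and every value in angle brackets is a placeholder to fill in from your own NIM pods; none of them come from this chart.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: nim-metrics                # hypothetical name
  namespace: <NAMESPACE>           # namespace where the NIM pods run
spec:
  selector:
    matchLabels:
      <LABEL_KEY>: <LABEL_VALUE>   # assumption: a label present on your NIM pods
  podMetricsEndpoints:
    - port: <METRICS_PORT_NAME>    # assumption: named container port that serves metrics
      path: <METRICS_PATH>         # assumption: metrics path exposed by your NIM
      interval: 30s                # illustrative scrape interval
```

Depending on how your Prometheus instance selects monitors, the PodMonitor might also need labels that match the instance's podMonitorSelector.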

Observability for NIM for LLMs Microservices

To visualize the metrics collected from the NIM for LLMs microservices using Prometheus and Grafana, refer to the following references:

Horizontal Pod Autoscaling of NIM Microservices

The NeMo Deployment Management microservice includes configuration for horizontal pod autoscaling of NIM microservices. By default, autoscaling is turned off.

The manifest snippet in this section shows the default values of the deployments.autoscaling.spec object, which holds the horizontal pod autoscaler configuration for NIM microservices deployed through the NeMo Deployment Management microservice.

To enable horizontal pod autoscaling, set deployments.autoscaling.enabled to true.

The default autoscaling spec scales on gpu_cache_usage_perc, which is a metric provided by the GPU operator. To enable the autoscaler with the default gpu_cache_usage_perc metric, you need to install the GPU operator. Additionally, you need Prometheus, the Prometheus Operator, and scraping of your NIM microservices. If you prefer to rely on the default resource metrics instead, CPU usage can serve as a rough approximation for very small models.

You can override the autoscaling.spec values in the values file that you use to install the NeMo Deployment Management chart.

deployment-management:
  deployments:
    autoscaling:
      enabled: true
      spec:
        maxReplicas: 5
        metrics:
        - pods:
            metric:
              name: gpu_cache_usage_perc
            target:
              averageValue: 750m
              type: AverageValue
          type: Pods
        minReplicas: 1

This configuration sets spec.scale.enabled to true and overrides the spec.scale.hpa object in the NIM Service custom resources.
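To illustrate the CPU-based alternative mentioned earlier, the following override is a sketch that replaces the Pods metric with a standard Kubernetes Resource metric. It assumes that autoscaling.spec accepts the usual autoscaling/v2 metric shapes, as the Pods example suggests. The replica counts and the 80 percent threshold are illustrative values, not recommendations.

```yaml
deployment-management:
  deployments:
    autoscaling:
      enabled: true
      spec:
        minReplicas: 1
        maxReplicas: 3                   # illustrative upper bound
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80     # illustrative threshold
```

A Utilization target is computed relative to the CPU requests of the pods, so this approach only works when the NIM containers declare CPU requests and the cluster serves resource metrics, for example through metrics-server.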

For more information about the scaling parameters of the NIM for LLMs microservices, refer to the horizontal pod autoscaler parameters section in the NIM for LLMs documentation.