NeMo Deployment Management Setup Guide#

Find more information on prerequisites and configuring the key parts of the values.yaml file in the NeMo Deployment Management Helm chart.

Prerequisites#

The following are the prerequisites for installing NeMo Deployment Management.

NVIDIA NIM Operator for deploying NIMs.

Helm Values File#

The following code snippet shows the essential part you should include in the default values file for the NeMo Deployment Management Helm chart.

Note

Due to a known issue in NIM Operator, you must specify the storage class to the deployments.defaultStorageClass property.

imagePullSecrets:
   - name: nvcrimagepullsecret
image:
   repository: nvidia/nemo-microservices/nemo-deployment-management
   tag: 25.7.0
deployments:
   defaultStorageClass: <STORAGE_CLASS>
   nimImagePullSecrets: ["nvcrimagepullsecret"]

Use the deployments.nimImagePullSecrets property to specify the image pull secrets that you want to inject into NIMs. In the preceding example, this property includes the general image pull secret nvcrimagepullsecret created from Prerequisites for Installing Helm Charts from NGC Catalog. You can also add your own secrets for image pull to this property.

Additional Configuration Tips#

The following are additional configuration tips for the NeMo Deployment Management Helm chart.

Enable Metrics Collection#

The NeMo Deployment Management Helm chart includes metrics collection configuration for NIMs. This is turned off by default.

To enable metrics collection from model endpoints, set the metrics.enabled property to true.

deployments:
  metrics:
    enabled: false

When enabled, the NeMo Deployment Management microservice deploys NIMs with metrics collection. This creates a serviceMonitor custom resource, which requires the Prometheus operator. You can collect metrics through other methods, including using Prometheus differently or configuring a podMonitor.

The kube-prometheus-stack Helm chart configures the Prometheus operator. Multiple methods exist to configure Prometheus monitoring. NIM supports using Open Telemetry.

Horizontal Pod Autoscaling#

The NeMo Deployment Management microservice includes configuration for horizontal pod autoscaling of NIMs. By default, autoscaling is turned off.

The following manifest snippet shows the default values of the deployments object for horizontal pod autoscaler configuration.

To enable horizontal pod autoscaling, set deployments.autoscaling.enabled to true.

The default autoscaling spec is gpu_cache_usage_perc, which is a metric that the GPU operator provides. Therefore, to enable the autoscaler with the default gpu_cache_usage_perc metric, you also need to install GPU operator.

You can override the autoscaling.spec values with values you want through the values file for NeMo Deployment Management chart installation.

deployments:
  autoscaling:
    enabled: true
    spec:
      maxReplicas: 5
      metrics:
      - pods:
          metric:
            name: gpu_cache_usage_perc
          target:
            averageValue: 750m
            type: AverageValue
        type: Pods
      minReplicas: 1

This sets spec.scale.enabled to true and overrides the scale.hpa object in the NIM Services CR.