NeMo Deployment Management Setup Guide#
This guide describes the prerequisites for NeMo Deployment Management and how to configure the key parts of the values.yaml file in its Helm chart.
Prerequisites#
The following are the prerequisites for installing NeMo Deployment Management.
NVIDIA NIM Operator for deploying NIMs.
Helm Values File#
The following code snippet shows the essential settings to include in the values file for the NeMo Deployment Management Helm chart.
Note
Due to a known issue in NIM Operator, you must specify a storage class in the deployments.defaultStorageClass property.
imagePullSecrets:
  - name: nvcrimagepullsecret
image:
  repository: nvidia/nemo-microservices/nemo-deployment-management
  tag: 25.6.0
deployments:
  defaultStorageClass: <STORAGE_CLASS>
  nimImagePullSecrets: ["nvcrimagepullsecret"]
Use the deployments.nimImagePullSecrets property to specify the image pull secrets to inject into NIMs. In the preceding example, this property includes the general image pull secret nvcrimagepullsecret, created in Prerequisites for Installing Helm Charts from NGC Catalog. You can also add your own image pull secrets to this property.
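As a minimal sketch of adding your own secret, the following values fragment lists a second image pull secret alongside the general one. The name my-registry-secret is a hypothetical placeholder for a secret that you have already created in the namespace where the NIMs run; the chart does not define it.

```yaml
deployments:
  defaultStorageClass: <STORAGE_CLASS>
  nimImagePullSecrets:
    - nvcrimagepullsecret   # general NGC image pull secret from the preceding example
    - my-registry-secret    # hypothetical: your own secret for a private registry
```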
Additional Configuration Tips#
The following are additional configuration tips for the NeMo Deployment Management Helm chart.
Enable Metrics Collection#
The NeMo Deployment Management Helm chart includes metrics collection configuration for NIMs. This is turned off by default.
To enable metrics collection from model endpoints, set the deployments.metrics.enabled property to true. The following snippet shows the default value.
deployments:
  metrics:
    enabled: false
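For comparison, a values fragment that turns metrics collection on might look like the following sketch. It changes only the one property and assumes the rest of your values file stays the same.

```yaml
deployments:
  metrics:
    enabled: true   # the default is false
```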
When enabled, the NeMo Deployment Management microservice deploys NIMs with metrics collection. This creates a ServiceMonitor custom resource, which requires the Prometheus Operator. You can also collect metrics in other ways, such as a different Prometheus setup or a PodMonitor.
The kube-prometheus-stack Helm chart installs and configures the Prometheus Operator, but it is only one of several ways to set up Prometheus monitoring. NIM also supports OpenTelemetry.
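If you prefer a PodMonitor to the ServiceMonitor, a manifest along the following lines is one possible starting point. This is a sketch and not part of the chart: the resource name is hypothetical, and the label selector, port name, and metrics path are placeholders that you must replace with the values your NIM pods actually use. Like ServiceMonitor, the PodMonitor resource is provided by the Prometheus Operator.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: nim-pod-monitor            # hypothetical name
spec:
  selector:
    matchLabels:
      <NIM_POD_LABEL_KEY>: <NIM_POD_LABEL_VALUE>   # placeholder: a label set on your NIM pods
  podMetricsEndpoints:
    - port: <METRICS_PORT_NAME>    # placeholder: named container port that serves metrics
      path: <METRICS_PATH>         # placeholder: metrics endpoint path
      interval: 30s                # assumed scrape interval
```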
Horizontal Pod Autoscaling#
The NeMo Deployment Management microservice includes configuration for horizontal pod autoscaling of NIMs. By default, autoscaling is turned off.
The manifest snippet below shows the deployments object for horizontal pod autoscaler configuration, with autoscaling enabled and the default spec values.
To enable horizontal pod autoscaling, set deployments.autoscaling.enabled to true.
The default autoscaling spec uses gpu_cache_usage_perc, a metric that the GPU Operator provides. Therefore, to enable the autoscaler with the default gpu_cache_usage_perc metric, you must also install the GPU Operator.
You can override the deployments.autoscaling.spec values in the values file that you use to install the NeMo Deployment Management chart.
deployments:
  autoscaling:
    enabled: true
    spec:
      maxReplicas: 5
      metrics:
        - pods:
            metric:
              name: gpu_cache_usage_perc
            target:
              averageValue: 750m
              type: AverageValue
          type: Pods
      minReplicas: 1
This sets spec.scale.enabled to true and overrides the spec.scale.hpa object in the NIMService CR.
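As a sketch of overriding the autoscaling spec, the following values fragment keeps the gpu_cache_usage_perc metric but changes the replica bounds and the target. The values 2, 8, and 500m are illustrative assumptions, not recommended settings. This guide does not state whether the chart merges a partial spec with the defaults, so the sketch repeats the full spec.

```yaml
deployments:
  autoscaling:
    enabled: true
    spec:
      minReplicas: 2          # illustrative: keep at least two replicas
      maxReplicas: 8          # illustrative upper bound
      metrics:
        - type: Pods
          pods:
            metric:
              name: gpu_cache_usage_perc
            target:
              type: AverageValue
              averageValue: 500m   # illustrative: a lower per-pod average than 750m
```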