NeMo Deployment Management Setup Guide
This guide covers the prerequisites for NeMo Deployment Management and explains how to configure the key parts of the values.yaml file for the NeMo Deployment Management Helm chart.
Prerequisites
The following are the prerequisites for installing NeMo Deployment Management:

- NVIDIA NIM Operator, for deploying NIMs.
Basic Configuration
The following snippet shows the essential settings to include in your values file for the NeMo Deployment Management Helm chart.
Note
Due to a known issue in the NIM Operator, you must specify a storage class in the deployments.defaultStorageClass property.
```yaml
imagePullSecrets:
  - name: nvcrimagepullsecret
image:
  repository: nvidia/nemo-microservices/nemo-deployment-management
  tag: 25.9.0
deployments:
  defaultStorageClass: <STORAGE_CLASS>
  nimImagePullSecrets: ["nvcrimagepullsecret"]
```
Use the deployments.nimImagePullSecrets property to specify the image pull secrets to inject into NIMs. In the preceding example, this property includes the general image pull secret nvcrimagepullsecret, which you created by following Prerequisites for Installing Helm Charts from NGC Catalog. You can also add your own image pull secrets to this property.
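For example, suppose NIMs also need to pull images from an additional private registry. You might then append a second secret to the list. In this sketch, my-registry-secret is a hypothetical name for a docker-registry secret that you have already created. Kubernetes resolves image pull secrets in the pod's own namespace, so the secret must exist in the namespace where the NIMs run.

```yaml
deployments:
  defaultStorageClass: <STORAGE_CLASS>
  # The general NGC pull secret from the prerequisites, plus a
  # hypothetical secret for an additional private registry.
  nimImagePullSecrets: ["nvcrimagepullsecret", "my-registry-secret"]
```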
Additional Configuration Tips
The following are additional configuration tips for the NeMo Deployment Management Helm chart.
Enable Metrics Collection
The NeMo Deployment Management Helm chart includes a metrics collection configuration for NIMs, which is turned off by default. To enable metrics collection from model endpoints, set the deployments.metrics.enabled property to true.
```yaml
deployment-management:
  deployments:
    metrics:
      enabled: true
```
When this property is enabled, the NeMo Deployment Management microservice deploys NIMs with metrics collection turned on. This creates a ServiceMonitor custom resource, which requires the Prometheus Operator. You can also collect metrics through other methods, such as a different Prometheus setup or a PodMonitor resource.
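If you use a PodMonitor instead, a minimal sketch might look like the following. It uses the Prometheus Operator's monitoring.coreos.com/v1 PodMonitor kind. Several values are assumptions: the resource name, the namespace, the pod label, the port name (api), and the metrics path (/v1/metrics). Check them against the labels and container ports on your NIM pods and the metrics endpoint documented for your NIM.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-nim-podmonitor        # hypothetical name
  namespace: my-nim-namespace    # hypothetical: namespace where the NIM pods run
spec:
  selector:
    matchLabels:
      app: my-nim                # hypothetical: a label carried by your NIM pods
  podMetricsEndpoints:
    - port: api                  # assumption: name of the container port serving HTTP
      path: /v1/metrics          # assumption: NIM metrics path; verify for your NIM
      interval: 30s              # how often Prometheus scrapes the endpoint
```

Your Prometheus instance must also be configured to select this PodMonitor, for example through its podMonitorSelector.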
The kube-prometheus-stack Helm chart installs and configures the Prometheus Operator, but there are multiple ways to set up Prometheus monitoring. NIM also supports OpenTelemetry.
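By default, a Prometheus instance installed by kube-prometheus-stack selects only ServiceMonitor and PodMonitor resources that carry the chart's release label. As a result, it might ignore the ServiceMonitor resources created for NIMs. One common approach is to let Prometheus select monitors without that label, sketched below as kube-prometheus-stack values. Whether this fits your cluster is an assumption to verify against the kube-prometheus-stack documentation.

```yaml
prometheus:
  prometheusSpec:
    # Select ServiceMonitors and PodMonitors without requiring
    # the kube-prometheus-stack release label.
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
```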
Observability for NIM for LLMs Microservices
To visualize the metrics collected from NIM for LLMs microservices with Prometheus and Grafana, refer to the following resources:
Horizontal Pod Autoscaling of NIM Microservices
The NeMo Deployment Management microservice includes configuration for horizontal pod autoscaling of NIM microservices. By default, autoscaling is turned off.
The deployments.autoscaling object configures the horizontal pod autoscaler for the NIM microservices that the NeMo Deployment Management microservice deploys.
To enable horizontal pod autoscaling, set deployments.autoscaling.enabled to true.
The default autoscaling spec uses the gpu_cache_usage_perc metric, which is provided by the GPU Operator. To enable the autoscaler with this default metric, you must install the GPU Operator. You also need Prometheus, the Prometheus Operator, and a scrape configuration for your NIM microservices. If you prefer to use built-in resource metrics instead, CPU usage can serve as a rough approximation for very small models.
You can override the autoscaling.spec values in the values file that you use to install the NeMo Deployment Management chart, as in the following example:
```yaml
deployment-management:
  deployments:
    autoscaling:
      enabled: true
      spec:
        maxReplicas: 5
        metrics:
          - pods:
              metric:
                name: gpu_cache_usage_perc
              target:
                averageValue: 750m
                type: AverageValue
            type: Pods
        minReplicas: 1
```
This configuration sets spec.scale.enabled to true and overrides the spec.scale.hpa object in the NIMService custom resources.
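To illustrate, the autoscaling values in the preceding example would correspond roughly to a NIMService fragment like the following. This sketch shows only the relevant fields and assumes the NIM Operator's apps.nvidia.com/v1alpha1 NIMService API. The resource name is hypothetical, and the resource that NeMo Deployment Management actually generates contains many more fields.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: my-llm-nim               # hypothetical name
spec:
  # Other NIMService fields (image, storage, resources) are omitted.
  scale:
    enabled: true                # set from deployments.autoscaling.enabled
    hpa:                         # populated from deployments.autoscaling.spec
      minReplicas: 1
      maxReplicas: 5
      metrics:
        - type: Pods
          pods:
            metric:
              name: gpu_cache_usage_perc
            target:
              type: AverageValue
              averageValue: 750m
```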
For more information about the scaling parameters of the NIM for LLMs microservices, refer to the horizontal pod autoscaler parameters section in the NIM for LLMs documentation.
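As noted earlier, CPU usage can serve as a rough approximation for very small models. A minimal sketch of such an override follows. It assumes that autoscaling.spec accepts standard Kubernetes autoscaling/v2 metric specs, as the gpu_cache_usage_perc example suggests. The 80 percent target is a hypothetical value to tune for your workload. Utilization targets are computed relative to the pods' CPU requests and rely on a resource metrics provider such as the Kubernetes Metrics Server.

```yaml
deployment-management:
  deployments:
    autoscaling:
      enabled: true
      spec:
        minReplicas: 1
        maxReplicas: 5
        metrics:
          # Scale on average CPU utilization across NIM pods
          # instead of the default gpu_cache_usage_perc metric.
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 80   # hypothetical target; tune for your model
```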