# Deploying with Helm

NIMs are intended to run on systems with NVIDIA GPUs, with the type and number of GPUs depending on the model. To use Helm, you must have a Kubernetes cluster with the appropriate GPU nodes and the GPU Operator installed.
## Prerequisites

If you have not set up your NGC API key, or do not know exactly which NIM you want to download and deploy, refer to Getting Started for more information.

Once you have set your NGC API key, go to the NGC Catalog and select the riva-nim Helm chart to pick a version. In most cases, you should select the latest version.

Use the following command to download the Helm chart:

```shell
helm fetch https://helm.ngc.nvidia.com/nim/charts/riva-nim-<version_number>.tgz --username='$oauthtoken' --password=$NGC_API_KEY
```

This downloads the chart as a file to your local machine.
## Configuring Helm

The following Helm options are the most important to configure when deploying a NIM using Kubernetes:

- `image.repository`: Specifies the container/NIM to deploy.
- `image.tag`: Defines the version of that container/NIM.
- Storage options: Configured based on the environment and cluster in use.
- `model.ngcAPISecret` and `imagePullSecrets`: Needed to communicate with NGC.
- `envVars`: An array of environment variables provided to the container. Use this if advanced configuration is needed.

Note: Do not set the following environment variables using the `env` value. Instead, use the Helm options specified in the following table:

| Environment Variable | Helm Value |
| --- | --- |
| `NIM_CACHE_PATH` | `model.nimCache` |
| `NGC_API_KEY` | `model.ngcAPISecret` |
| `NIM_SERVER_PORT` | `model.openaiPort` |
| `NIM_JSONL_LOGGING` | `model.jsonLogging` |
| `NIM_LOG_LEVEL` | `model.logLevel` |

In these cases, set the Helm values directly instead of relying on the environment variable values. You can add other environment variables to the `envVars` section of a values file.
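For instance, rather than exporting `NIM_SERVER_PORT` or `NIM_LOG_LEVEL` into the container, a values file could set the corresponding Helm values directly. This is a sketch; the cache path shown is a placeholder:

```yaml
model:
  nimCache: /opt/nim/.cache   # placeholder path; sets what NIM_CACHE_PATH would
  openaiPort: 9000            # instead of NIM_SERVER_PORT
  jsonLogging: true           # instead of NIM_JSONL_LOGGING
  logLevel: INFO              # instead of NIM_LOG_LEVEL
```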
To adapt the chart's deployment behavior to your cluster's needs, refer to the Helm chart's README, which lists and describes the configuration options. This README is available from the Helm command line, but the output is bare Markdown. Write it to a file and open it with a Markdown renderer, or use a command-line tool such as glow to render it in the terminal.
The following Helm command displays the chart README and renders it in the terminal using `glow`:

```shell
helm show readme riva-nim-<version_number>.tgz | glow -p -
```
To examine all default values, run the following command:

```shell
helm show values riva-nim-<version_number>.tgz
```
## Minimal Example

This example requires that you have already created certain Kubernetes secrets in the deployment namespace before proceeding. The rest of this document assumes the default namespace.
To download the NIM container image, you must set an image pull secret, which is `ngc-secret` in the following example. To download model engines or weights from NGC, the chart requires a generic secret with an NGC API key stored in a key named `NGC_API_KEY`. The following commands create these two secrets:

```shell
kubectl create secret docker-registry ngc-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY
kubectl create secret generic ngc-api --from-literal=NGC_API_KEY=$NGC_API_KEY
```
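Before installing the chart, you can sanity-check that both secrets exist in the target namespace (the secret names match the example above):

```shell
# Both secrets should be listed; an error here means a creation step failed
kubectl get secret ngc-secret ngc-api
# Optionally confirm the API key is stored under the expected key name
kubectl get secret ngc-api -o jsonpath='{.data.NGC_API_KEY}' | base64 -d | head -c 8; echo
```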
Create the file `custom-values.yaml` with the following entries. Once the secrets above have been created, these values will work in most clusters. The following values deploy the Magpie TTS Multilingual model.
```yaml
image:
  repository: nvcr.io/nim/nvidia/magpie-tts-multilingual
  pullPolicy: IfNotPresent
  # Tag overrides the image tag whose default is the chart appVersion.
  tag: latest
nim:
  ngcAPISecret: ngc-api # name of a secret in the cluster that includes a key named NGC_API_KEY and is an NGC API key
imagePullSecrets:
  - name: ngc-secret # name of a secret used to pull nvcr.io images, see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
envVars:
  NIM_TAGS_SELECTOR: name=magpie-tts-multilingual # select required profile from available profiles
```
You can adapt the previous configuration to deploy other models, such as Fastpitch HifiGAN en-US, by updating `image` and `envVars` appropriately. For example:
```yaml
image:
  repository: nvcr.io/nim/nvidia/riva-tts # container location -- changed for the different model
  pullPolicy: IfNotPresent
  tag: latest
nim:
  ngcAPISecret: ngc-api
imagePullSecrets:
  - name: ngc-secret
envVars:
  NIM_TAGS_SELECTOR: name=fastpitch-hifigan-en-us # select required profile from available profiles
```
## Launching NIM in Kubernetes

You are now ready to launch the chart:

```shell
helm install riva-nim riva-nim-<version_number>.tgz -f path/to/your/custom-values.yaml
```
Wait until the pod status changes to `READY`. You can check the pod's status using the following command:

```shell
kubectl get pod
```
Once the pod is ready, the command shows output similar to the following:

```
NAME                        READY   STATUS    RESTARTS   AGE
dnsutils                    1/1     Running   0          35h
riva-nim-566d4d4d7c-5f24x   1/1     Running   0          7m44s
```
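Instead of polling, you can block until the pod reports Ready. The label selector below assumes the chart applies the standard `app.kubernetes.io/instance` label with the release name; verify this with `kubectl get pod --show-labels`:

```shell
# Wait up to 15 minutes for the riva-nim pod to become Ready
kubectl wait --for=condition=Ready pod \
  -l app.kubernetes.io/instance=riva-nim \
  --timeout=15m
```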
## Running Inference

The Running Inference section shows how to run sample inference using the Riva NIM service. The sample inference commands assume that the Riva NIM service is available on the local machine and use the default IP of `0.0.0.0`. To use the `riva-nim` service locally, forward local ports to the Riva NIM service ports:

```shell
kubectl port-forward services/riva-nim 50051:50051 9000:9000
```
Note: `kubectl port-forward` does not return. To run inference with sample clients, open another terminal.
Alternatively, you can query the Riva NIM service IP with the `kubectl get service riva-nim` command and replace `0.0.0.0` with the `riva-nim` service IP in the client commands.
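With the port-forward running, a quick health check over the forwarded HTTP port confirms the service is reachable. The `/v1/health/ready` endpoint is typical of NIM microservices, but verify it against your NIM's API reference:

```shell
curl -s http://localhost:9000/v1/health/ready
```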
## Storage

Running out of storage space is always a concern when setting up NIMs, and downloading models can delay scaling in a cluster. Models can be quite large, and a cluster operator can quickly fill disk space when downloading them. Be sure to mount some type of persistent storage for the model cache on your pod. You have the following mutually exclusive options when storing objects outside of the default `emptyDir`:
- Persistent Volume Claims (enabled with `persistence.enabled`): Used when `persistence.accessMode` is set to `ReadWriteMany`, where several pods can share one PVC. If `statefulSet.enabled` is set to `false` (default is `true`), this will create a PVC with a Deployment, but unless the access mode is `ReadWriteMany`, such as with an NFS provisioner, scaling beyond one pod will likely fail.
- Persistent Volume Claim templates (enabled with `persistence.enabled` and leaving `statefulSet.enabled` at its default): Useful for scaling: scale the StatefulSet up to the maximum desired replicas so the model is downloaded to each PVC created, then scale down again, leaving those PVCs in place to allow fast scaling up later.
- Direct NFS (enabled with `nfs.enabled`): Kubernetes does not allow setting mount options on direct NFS, so some special cluster setup may be required.
- hostPath (enabled with `hostPath.enabled`): Know the security implications of using hostPath, and understand that this will also tie pods to one node.
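As an illustration, the following values fragment enables a PVC-backed model cache with the default StatefulSet behavior. The storage class name is a placeholder for one available in your cluster:

```yaml
persistence:
  enabled: true
  size: 40Gi                     # large enough to hold the model cache
  storageClass: <storage-class>  # placeholder: a class available in your cluster
  accessMode: ReadWriteOnce      # one PVC per replica via StatefulSet templates
```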
## Security

NIM can optionally be deployed with SSL/TLS certificates by adding the following parameters to `custom-values.yaml`. You can generate SSL certificates for secure communication from any trusted source. Refer to Configuration for information about each of the environment variables mentioned below.
```yaml
hostPath:
  enabled: true
  path: <local-path-to-ssl-key>
extraVolumeMounts:
  ssl-mount: # Define a unique mount name
    mountPath: /opt/nim/crt
    readOnly: true
extraVolumes:
  ssl-mount: # Use the same name as in extraVolumeMounts
    hostPath:
      path: <local-path-to-ssl-certificates>
env:
  - name: NIM_SSL_MODE
    value: "tls"
  - name: NIM_SSL_CA_PATH
    value: "/opt/nim/crt/ssl_ca_cert.pem"
  - name: NIM_SSL_CERT_PATH
    value: "/opt/nim/crt/ssl_cert_server.pem"
  - name: NIM_SSL_KEY_PATH
    value: "/opt/nim/crt/ssl_key_server.pem"
resources:
  limits:
    nvidia.com/gpu: 1
```
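If you do not already have certificates, a self-signed set for testing can be generated with OpenSSL. This is a sketch for non-production use; the file names match the paths in the example above:

```shell
# Generate a self-signed CA certificate and key (testing only)
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout ssl_ca_key.pem -out ssl_ca_cert.pem -subj "/CN=test-ca"
# Generate the server key and a certificate signing request
openssl req -newkey rsa:4096 -nodes \
  -keyout ssl_key_server.pem -out server.csr -subj "/CN=riva-nim"
# Sign the server certificate with the CA
openssl x509 -req -in server.csr -days 365 \
  -CA ssl_ca_cert.pem -CAkey ssl_ca_key.pem -CAcreateserial \
  -out ssl_cert_server.pem
```

Place the resulting `.pem` files in the directory referenced by `extraVolumes` so they appear under `/opt/nim/crt` in the container.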
## Troubleshooting FAQ

Q: What should I do if my pod is stuck in a "Pending" state?

A: Try running `kubectl describe pod <pod name>` and check the `Events` section to see what the scheduler is waiting for. Node taints that may need to be tolerated, insufficient GPUs, and storage mount issues are all common reasons.
Q: I tried to scale or upgrade a deployment using `statefulSet.enabled: false` and `persistence.enabled: true`. Why are pods never starting?
A: To scale or upgrade without using `StatefulSet` PVC templates (which are not very efficient in either time or storage), you must use a `ReadWriteMany` storage class so that the volume can be mounted on separate nodes, manually cloned `ReadOnlyMany` volumes, or something like direct NFS storage. Without persistence, every starting pod must download its model to an `emptyDir` volume. A `ReadWriteMany` storage class such as an NFS PVC provisioner or CephFS provisioner is ideal.
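For example, assuming the cluster provides an NFS-backed `ReadWriteMany` storage class (the class name below is hypothetical), the following values let a plain Deployment share one model cache across replicas:

```yaml
statefulSet:
  enabled: false
persistence:
  enabled: true
  accessMode: ReadWriteMany
  storageClass: nfs-client   # hypothetical: substitute your RWX-capable class
  size: 40Gi
```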
## Additional Information

The Helm chart's internal README includes the following parameters. NVIDIA recommends that you refer to the README within the downloaded chart, as it has the most correct and up-to-date version of these parameters for that chart version.
### Parameters

#### Deployment parameters
| Name | Description | Value |
| --- | --- | --- |
|  | [default: {}] Affinity settings for deployment. |  |
|  | Sets privilege and access control settings for container (only affects the main container, not pod-level). |  |
|  | Overrides command line options sent to the NIM with the array listed here. |  |
|  | Overrides command line arguments of the NIM container with the array listed here. |  |
|  | Specifies the egress endpoints for the service, which are added as an env var with the prefix `egress_`. |  |
|  | Adds arbitrary environment variables to the main container using key-value pairs, for example `NAME: value`. |  |
|  | Adds arbitrary additional volumes to the deployment set definition. |  |
|  | Specify volume mounts to the main container from |  |
|  | NIM Image Repository |  |
|  | Image tag or version |  |
|  | Image pull policy |  |
|  | Specify list of secret names that are needed for the main container and any init containers. |  |
|  | Specifies the ingress endpoints for the service, which are added as an env var with the prefix `ingress_`. |  |
|  | Specify init containers, if needed. |  |
|  | Sets node selectors for the NIM -- for example |  |
|  | Specifies the params mentioned in a separate config file inside the nim-workspace dir, which will be added as env vars. |  |
|  | Sets additional annotations on the main deployment pods. |  |
|  | Specify privilege and access control settings for pod. |  |
|  | Specify user UID for pod. |  |
|  | Specify group ID for pod. |  |
|  | Specify file system owner group ID. |  |
|  | Specify static replica count for deployment. |  |
|  | Specify resources limits and requests for the running service. |  |
|  | Specify number of GPUs to present to the running service. |  |
|  | Specifies whether a service account should be created. |  |
|  | Sets annotations to be added to the service account. |  |
|  | Specifies the name of the service account to use. If it is not set and create is |  |
|  | Enables |  |
|  | Specify tolerations for pod assignment. Allows the scheduler to schedule pods with matching taints. |  |
#### Autoscaling parameters

Values used for creating a Horizontal Pod Autoscaler. If autoscaling is not enabled, the rest are ignored.
NVIDIA recommends usage of the custom metrics API, commonly implemented with the prometheus-adapter.
Standard metrics of CPU and memory are of limited use in scaling NIM.
| Name | Description | Value |
| --- | --- | --- |
|  | Enables horizontal pod autoscaler. |  |
|  | Specify minimum replicas for autoscaling. |  |
|  | Specify maximum replicas for autoscaling. |  |
|  | Array of metrics for autoscaling. |  |
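As a sketch, enabling the autoscaler with a custom metric served by prometheus-adapter might look like the following. The metric name and target are placeholders, and the exact value keys should be confirmed against the chart README:

```yaml
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: <your_custom_metric>   # placeholder: exposed via prometheus-adapter
        target:
          type: AverageValue
          averageValue: "10"
```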
#### Ingress parameters

| Name | Description | Value |
| --- | --- | --- |
|  | Enables ingress. |  |
|  | Specify class name for Ingress. |  |
|  | Specify additional annotations for ingress. |  |
|  | Specify list of hosts, each containing lists of paths. |  |
|  | Specify name of host. |  |
|  | Specify ingress path. |  |
|  | Specify path type. |  |
|  | Specify list of pairs of TLS |  |
#### Probe parameters

| Name | Description | Value |
| --- | --- | --- |
|  | Enables `livenessProbe`. |  |
|  | `livenessProbe` endpoint path. |  |
|  | Initial delay seconds for `livenessProbe`. |  |
|  | Timeout seconds for `livenessProbe`. |  |
|  | Period seconds for `livenessProbe`. |  |
|  | Success threshold for `livenessProbe`. |  |
|  | Failure threshold for `livenessProbe`. |  |
|  | Enables `readinessProbe`. |  |
|  | Readiness endpoint path. |  |
|  | Initial delay seconds for `readinessProbe`. |  |
|  | Timeout seconds for `readinessProbe`. |  |
|  | Period seconds for `readinessProbe`. |  |
|  | Success threshold for `readinessProbe`. |  |
|  | Failure threshold for `readinessProbe`. |  |
|  | Enables `startupProbe`. |  |
|  | `startupProbe` endpoint path. |  |
|  | Initial delay seconds for `startupProbe`. |  |
|  | Timeout seconds for `startupProbe`. |  |
|  | Period seconds for `startupProbe`. |  |
|  | Success threshold for `startupProbe`. |  |
|  | Failure threshold for `startupProbe`. |  |
#### Metrics parameters

| Name | Description | Value |
| --- | --- | --- |
|  | For NIMs with a separate metrics port, this opens that port on the container. |  |
|  | Options for |  |
|  | Enables |  |
|  | Specify additional labels for ServiceMonitor. |  |
#### NIM parameters

| Name | Description | Value |
| --- | --- | --- |
|  | Path to mount writeable storage or a pre-filled model cache for the NIM. |  |
|  | Optionally specifies the name of the model in the API. This can be used in Helm tests. |  |
|  | Name of a pre-existing secret with a key named `NGC_API_KEY`. |  |
|  | NGC API key literal to use as the API secret and image pull secret when set. |  |
|  | Specify Server Port. |  |
|  | Specify HTTP Port. |  |
|  | Specify GRPC Port. |  |
|  | Specify extra labels to be added to deployed pods. |  |
|  | Whether to enable JSON lines logging. Defaults to true. |  |
|  | Log level of NIM service. Possible values are TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL. |  |
#### Storage parameters

| Name | Description | Value |
| --- | --- | --- |
|  | Specify settings to modify the path |  |
|  | Enables the use of persistent volumes. |  |
|  | Specifies an existing persistent volume claim. If using |  |
|  | Specifies the persistent volume storage class. If set to |  |
|  | Specify |  |
|  | Specifies the persistent volume claim retention policy when deleted. Only used with StatefulSet volume templates. |  |
|  | Specifies the persistent volume claim retention policy when scaled. Only used with StatefulSet volume templates. |  |
|  | Specifies the size of the persistent volume claim (for example 40Gi). |  |
|  | Adds annotations to the persistent volume claim. |  |
|  | Configures the model cache on local disk on the nodes using `hostPath`. |  |
|  | Enable `hostPath`. |  |
|  | Specifies the path on the node used as a `hostPath` volume. |  |
|  | Configures the model cache to sit on shared direct-mounted NFS. NOTE: you cannot set mount options using a direct NFS mount to pods without a node-installed nfsmount.conf. An NFS-based |  |
|  | Enable direct pod NFS mount. |  |
|  | Specify the path on the NFS server to mount. |  |
|  | Specify the NFS server address. |  |
|  | Set to true to mount as read-only. |  |
#### Service parameters

| Name | Description | Value |
| --- | --- | --- |
|  | Specifies the service type for the deployment. |  |
|  | Overrides the default service name. |  |
|  | Specifies the Server Port for the service. |  |
|  | Specifies the HTTP Port for the service. |  |
|  | Specifies the GRPC Port for the service. |  |
|  | Specifies the metrics port on the main service object. Some NIMs do not use a separate port. |  |
|  | Specifies additional annotations to be added to the service. |  |
|  | Specifies additional labels to be added to the service. |  |