Deploying on Kubernetes
The Helm chart simplifies NIM deployment on Kubernetes. The chart supports deployment with a variety of cluster, GPU, and storage configurations. It downloads the model and starts the service.
NIMs are intended to run on systems with NVIDIA GPUs; the type and number of GPUs depend on the model. To use Helm, you must have a Kubernetes cluster with appropriate GPU nodes and the NVIDIA GPU Operator installed.
Benefits of Helm Chart Deployment
Using a Helm chart:
Enables using Kubernetes Nodes and horizontally scaling the service
Encapsulates the complexity of running Docker commands directly
Enables monitoring metrics from the NIM
Setting Up the Environment
If you haven’t set up your NGC API key or don’t know which NIM you want to download and deploy, see the information in the User Guide.
This Helm chart requires two secrets: an image pull secret (named ngc-secret below) that holds your NGC API key in dockerconfigjson format for downloading the private container image, and an Opaque secret (named nim-secrets below) that holds the credentials for downloading the models. See Creating Secrets below.
These instructions assume that you have exported your NGC_API_KEY in the environment.
export NGC_API_KEY="<YOUR NGC API KEY>"
Get the credentials to download the models from Hive and export them as well.
export NIM_REPOSITORY_OVERRIDE="s3://..."
export AWS_REGION="..."
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
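Before moving on, it can help to confirm that every variable the later steps rely on is actually set. The following is a minimal POSIX shell sketch; the require_env function name is hypothetical and is not part of the chart or any NIM tooling.

```shell
#!/bin/sh
# Hypothetical helper: report every named variable that is unset or empty,
# and return non-zero if any of them are missing.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running `require_env NGC_API_KEY NIM_REPOSITORY_OVERRIDE AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY` after the exports above lists any variable you still need to set.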
Fetching the Helm Chart
Fetch the Helm chart from NGC with the following command:
helm fetch https://helm.ngc.nvidia.com/nim/nvidia/charts/multimodal-safety-nim-1.0.0.tgz --username='$oauthtoken' --password=$NGC_API_KEY
Namespace
You can deploy to whichever namespace is appropriate; for documentation purposes, these instructions use a namespace named nvidia-nims.
kubectl create namespace nvidia-nims
Creating Secrets
Use the following script to create the secrets this Helm chart expects.
# Docker config for nvcr.io; $oauthtoken is a literal user name, so it stays inside single quotes
DOCKER_CONFIG='{"auths":{"nvcr.io":{"username":"$oauthtoken", "password":"'${NGC_API_KEY}'" }}}'
# Optional: print the encoded value
echo -n "$DOCKER_CONFIG" | base64 -w0
NGC_REGISTRY_PASSWORD=$(echo -n "$DOCKER_CONFIG" | base64 -w0)
# Image pull secret (the YAML indentation is required)
cat <<EOF > imagepull.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ngc-secret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ${NGC_REGISTRY_PASSWORD}
EOF
kubectl apply -n nvidia-nims -f imagepull.yaml
# Opaque secret with the model download credentials
kubectl create -n nvidia-nims secret generic nim-secrets \
  --from-literal=NIM_REPOSITORY_OVERRIDE="${NIM_REPOSITORY_OVERRIDE}" \
  --from-literal=AWS_REGION="${AWS_REGION}" \
  --from-literal=AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" \
  --from-literal=AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}"
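If pods later report image pull errors, one thing worth checking is that the encoded .dockerconfigjson value decodes to what you expect. The sketch below assumes GNU coreutils base64, as the script above does; the verify_registry_secret name is hypothetical. It decodes the value and checks that it still contains an nvcr.io entry whose user name is the literal $oauthtoken.

```shell
#!/bin/sh
# Hypothetical helper: decode a base64-encoded Docker config and check that
# it has an nvcr.io entry whose username is the literal string "$oauthtoken".
verify_registry_secret() {
  local decoded
  decoded=$(printf '%s' "$1" | base64 -d) || return 1
  case "$decoded" in
    *'"nvcr.io"'*'"username":"$oauthtoken"'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `verify_registry_secret "$NGC_REGISTRY_PASSWORD" && echo "ngc-secret payload looks well-formed"`. As an alternative to hand-building the JSON, `kubectl create secret docker-registry ngc-secret -n nvidia-nims --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password="$NGC_API_KEY"` creates a secret of the same kubernetes.io/dockerconfigjson type.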
Configuration Considerations
By default, the following deployment commands create a single deployment with one replica using the ai-generated-image-detection model. The following options can be used to modify this behavior. See Parameters for a description of the Helm parameters.
image.repository – The container to deploy (for example, ai-generated-image-detection)
image.tag – The version of that container (1.0.0)
Storage options, based on the environment and cluster in use
resources – Use this option when a model requires more than the default of one GPU. See the support matrix for resource requirements.
env – An array of environment variables presented to the container, if advanced configuration is needed
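To illustrate the resources and env options, the sketch below writes an overlay values file that can be layered on top of custom-values.yaml; Helm merges multiple -f files, with later files taking precedence. It assumes the chart forwards resources and env to the container using the standard Kubernetes container schema. The GPU count of 2 and the EXAMPLE_SETTING variable are hypothetical placeholders, so check Parameters for the keys your chart version accepts.

```shell
#!/bin/sh
# Write an overlay values file. Assumes the chart passes `resources` and
# `env` through to the NIM container using the standard Kubernetes schema.
cat <<'EOF' > advanced-values.yaml
resources:
  limits:
    nvidia.com/gpu: 2        # hypothetical: only if the model needs more than one GPU
env:
  - name: EXAMPLE_SETTING    # hypothetical variable name
    value: "example"
EOF
```

To apply it, add `-f advanced-values.yaml` after `-f custom-values.yaml` in the `helm upgrade --install` command.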
Storage
This NIM uses persistent storage for storing downloaded models. These instructions require that you have a local-nfs storage class provisioner installed in your cluster.
helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
helm install nfs-server nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner --set storageClass.name=local-nfs
Advanced Storage Configuration
Storage is a particular concern when setting up NIMs. Models can be quite large, and downloading them to emptyDir volumes or other locations inside the pod can fill a disk. We recommend that you mount persistent storage of some kind on your pod.
This chart supports two general categories:
Persistent Volume Claims (enabled with persistence.enabled)
hostPath (enabled with persistence.hostPath)
By default, the chart uses the standard storage class and creates a PersistentVolume and a PersistentVolumeClaim.
If you do not have a Storage Class Provisioner that creates PersistentVolumes automatically, set persistence.createPV=true. This is also necessary when you use persistence.hostPath on minikube.
If you have an existing PersistentVolumeClaim where you’d like the models to be stored, pass its name in persistence.existingClaimName.
See the Helm options in Parameters.
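For example, on minikube with hostPath storage, the persistence options above might be combined into an overlay values file like the following. This is a sketch under assumptions: that persistence.hostPath takes a directory path on the node and can be combined with persistence.enabled and persistence.createPV as shown. The /data/nim-models directory is hypothetical; confirm the exact keys in Parameters.

```shell
#!/bin/sh
# Write a storage overlay for a single-node minikube cluster using hostPath.
cat <<'EOF' > storage-values.yaml
persistence:
  enabled: true
  class: standard              # minikube's default storage class
  createPV: true               # needed with hostPath on minikube
  hostPath: /data/nim-models   # hypothetical directory on the node
EOF
```

Pass it with an extra `-f storage-values.yaml` in the Helm command; because it sets persistence.class, drop the `--set persistence.class` flag, which would otherwise override it.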
Deploying
Create the file custom-values.yaml with the following entries, based on the model you want to deploy. The sample values work in most clusters.
For ai-generated-image-detection:
image:
  repository: nvcr.io/nim/hive/ai-generated-image-detection # container location
  tag: 1.0.0 # NIM version you want to deploy
imagePullSecrets:
  - name: ngc-secret # name of a secret used to pull nvcr.io images
For deepfake-image-detection:
image:
  repository: nvcr.io/nim/hive/deepfake-image-detection # container location
  tag: 1.0.0 # NIM version you want to deploy
imagePullSecrets:
  - name: ngc-secret # name of a secret used to pull nvcr.io images
For GPU-specific optimized model profiles, refer to Models.
Use the following Helm command to create a basic deployment.
helm upgrade --install \
--namespace nvidia-nims \
-f custom-values.yaml \
multimodal-safety-nim \
--set persistence.class="local-nfs" \
multimodal-safety-nim-1.0.0.tgz
After deploying, check the pods to ensure that the NIM is running. The initial image pull and model download can take upwards of 15 minutes.
kubectl get pods -n nvidia-nims
Wait until the pod is in the running state.
NAME READY STATUS RESTARTS AGE
multimodal-safety-nim-0 1/1 Running 0 8m44s
Check events for failures:
kubectl get events -n nvidia-nims
Recommended Configuration for Minikube
On Minikube, this chart creates a hostPath-based PV and PVC by default. We recommend that you add the following to your helm commands:
--set persistence.class=standard
Running Inference
In the previous example, the API endpoint is exposed on port 8003 through a Kubernetes service of the default type (ClusterIP) with no ingress, because the NIM itself does not handle authentication. The example requests below cover the ai-generated-image-detection and deepfake-image-detection models; run the one that matches the model you deployed.
Use the following command to port-forward the service to your local machine to perform inference.
kubectl port-forward -n nvidia-nims service/multimodal-safety-nim 8003:8003
Then try a request:
For ai-generated-image-detection:
invoke_url="http://localhost:8003/v1/infer"
input_image_path="input.jpg"
# download an example image
curl https://assets.ngc.nvidia.com/products/api-catalog/sdxl/sdxl1.jpg > "$input_image_path"
# -w0 keeps the base64 output on a single line
image_b64=$(base64 -w0 "$input_image_path")
echo '{ "input": ["data:image/png;base64,'"${image_b64}"'"] }' > payload.json
curl "$invoke_url" \
  -H "Content-Type: application/json" \
  -d @payload.json
For deepfake-image-detection:
invoke_url="http://localhost:8003/v1/infer"
input_image_path="input.jpg"
# download an example image
curl https://assets.ngc.nvidia.com/products/api-catalog/deepfake-image-detection/input/deepfake.jpg > "$input_image_path"
# -w0 keeps the base64 output on a single line
image_b64=$(base64 -w0 "$input_image_path")
echo '{ "input": ["data:image/png;base64,'"${image_b64}"'"] }' > payload.json
curl "$invoke_url" \
  -H "Content-Type: application/json" \
  -d @payload.json
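To send several images without repeating these steps, the payload construction can be wrapped in a small function. This is a sketch with a hypothetical build_payload name; it relies on GNU base64 -w0 so the encoded image stays on a single line inside the JSON string, and it keeps the data:image/png prefix used in the requests for both models.

```shell
#!/bin/sh
# Hypothetical helper: write a /v1/infer request body for one image file.
# base64 -w0 disables line wrapping so the JSON string stays on one line.
build_payload() {
  local image_path="$1" out_path="$2" image_b64
  image_b64=$(base64 -w0 "$image_path") || return 1
  printf '{ "input": ["data:image/png;base64,%s"] }\n' "$image_b64" > "$out_path"
}
```

Usage: `for img in *.jpg; do build_payload "$img" payload.json && curl http://localhost:8003/v1/infer -H "Content-Type: application/json" -d @payload.json; done`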
Viewing Log Messages
Use the following command to view the container's log messages.
kubectl logs -f -n nvidia-nims multimodal-safety-nim-0