
Deploying on Kubernetes

The Helm chart simplifies Text Embedding NIM deployment on Kubernetes. It aims to support deployment with a variety of possible cluster, GPU, and storage configurations. The Helm chart downloads the model and spins up the service.

NIMs are intended to run on systems with NVIDIA GPUs, with the type and number of GPUs depending on the model. To use Helm, you must have a Kubernetes cluster with appropriate GPU nodes and the GPU Operator installed.
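Before installing the chart, you can confirm that your nodes advertise GPUs to the scheduler. This is a generic Kubernetes check, not something specific to this chart; the nvidia.com/gpu resource is exposed by the GPU Operator's device plugin.

# List allocatable GPUs per node (requires the GPU Operator's device plugin)
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'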

Using a Helm chart:

  • Enables using Kubernetes Nodes and horizontally scaling the service

  • Encapsulates the complexity of running Docker commands directly

  • Enables monitoring metrics from the NIM

If you haven’t set up your NGC API key and do not know exactly which NIM you want to download and deploy, see the information in the User Guide.

This Helm chart requires that you have a secret with your NGC API key configured for downloading private images, and one with your NGC API key (named ngc-api below). These will likely contain the same key, but they have different formats (dockerconfigjson vs. Opaque). See Creating Secrets below.

These instructions will assume that you have your NGC_API_KEY exported in the environment.


export NGC_API_KEY="<YOUR NGC API KEY>"

In the event that model assets must be pre-fetched (e.g. in an air-gapped system), the NIM container supports downloading these assets to the NIM cache without starting the server.


# Choose a container name for bookkeeping
export NIM_MODEL_NAME=nvidia/nv-embedqa-e5-v5
export CONTAINER_NAME=$(basename $NIM_MODEL_NAME)

# Choose a NIM image from NGC
export IMG_NAME="nvcr.io/nim/$NIM_MODEL_NAME:1.0.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# Download the model to the NIM cache without starting the server
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME download-to-cache
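After the download completes, you can verify that the model assets landed in the cache directory:

# The cache should now contain the downloaded model assets
ls -lR "$LOCAL_NIM_CACHE"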

You can fetch the helm chart from NGC by executing the following command:


helm fetch https://helm.ngc.nvidia.com/nim/nvidia/charts/text-embedding-nim-1.0.0.tgz --username='$oauthtoken' --password=$NGC_API_KEY
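Before installing, you can inspect the chart's default values to see which parameters are available:

# Print the chart's default configuration values
helm show values text-embedding-nim-1.0.0.tgz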

You can use OpenTelemetry for monitoring your container. See OpenTelemetry parameters for details.
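As a minimal sketch, the following hypothetical values-file snippet points the container at an OpenTelemetry collector through the chart's env array. The OTEL_* variable names follow standard OpenTelemetry conventions and the collector endpoint is a placeholder; verify the exact names supported by this chart in the OpenTelemetry parameters documentation.

# values-otel.yaml -- hypothetical sketch; verify variable names against
# the chart's documented OpenTelemetry parameters
env:
  - name: OTEL_SERVICE_NAME
    value: "text-embedding-nim"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    # Placeholder address; point this at your own collector
    value: "http://otel-collector.observability.svc.cluster.local:4317"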

You can choose to deploy to whichever namespace is appropriate, but for documentation purposes we will deploy to a namespace named embedding-nim.


kubectl create namespace embedding-nim

Use the following script to create the expected secrets for this Helm chart.


# Create the image pull secret for nvcr.io
DOCKER_CONFIG='{"auths":{"nvcr.io":{"username":"$oauthtoken", "password":"'${NGC_API_KEY}'" }}}'
echo -n $DOCKER_CONFIG | base64 -w0
NGC_REGISTRY_PASSWORD=$(echo -n $DOCKER_CONFIG | base64 -w0)

cat <<EOF > imagepull.yaml
apiVersion: v1
kind: Secret
metadata:
  name: nvcrimagepullsecret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ${NGC_REGISTRY_PASSWORD}
EOF

kubectl apply -n embedding-nim -f imagepull.yaml

# Create the generic secret holding the NGC API key
kubectl create -n embedding-nim secret generic ngc-api --from-literal=NGC_CLI_API_KEY=${NGC_API_KEY}
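You can verify that both secrets exist in the namespace before installing the chart:

# Expect nvcrimagepullsecret (kubernetes.io/dockerconfigjson) and ngc-api (Opaque)
kubectl get secrets -n embedding-nim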

By default, the following deployment commands create a single deployment with one replica of the NV-EmbedQA-E5-V5 model. The options below modify that behavior. See Parameters for a description of the Helm parameters.

  • image.repository – The container (Text Embedding NIM) to deploy

  • image.tag – The version of that container (Text Embedding NIM)

  • Storage options, based on the environment and cluster in use

  • resources – Use this option when a model requires more than the default of one GPU. See below for support matrix and resource requirements.

  • env – An array of environment variables presented to the container, if advanced configuration is needed (see the example after this list)
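As an illustration of the env option, the following hypothetical values-file snippet passes an extra environment variable to the container. NIM_LOG_LEVEL is shown only as an example; verify supported variable names in the NIM configuration documentation.

# values-env.yaml -- hypothetical example of the env array
env:
  - name: NIM_LOG_LEVEL
    value: "INFO"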

This NIM uses persistent storage for storing downloaded models. These instructions require that you have a local-nfs storage class provisioner installed in your cluster.


helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
helm install nfs-server nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner --set storageClass.name=local-nfs
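You can confirm that the provisioner registered the storage class:

# The local-nfs storage class should now be listed
kubectl get storageclass local-nfs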

Advanced Storage Configuration

Storage is a particular concern when setting up NIMs. Models can be quite large, and you can fill a disk downloading things to emptyDirs or other locations around your pod image. We recommend that you mount persistent storage of some kind on your pod.

This chart supports two general categories:

  1. Persistent Volume Claims (enabled with persistence.enabled)

  2. hostPath (enabled with persistence.hostPath)

By default, the chart uses the standard storage class and creates a PersistentVolume and a PersistentVolumeClaim.

If you do not have a Storage Class Provisioner that creates PersistentVolumes automatically, set the value persistence.createPV=true. This is also necessary when you use persistence.hostPath on minikube.

If you have an existing PersistentVolumeClaim where you’d like the models to be stored, pass its name in persistence.existingClaimName.
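For example, a minimal sketch of deploying against an existing claim; my-models-pvc is a placeholder name used here only for illustration.

# Reuse an existing PersistentVolumeClaim (placeholder name my-models-pvc)
helm upgrade --install \
  --namespace embedding-nim \
  --set persistence.enabled=true \
  --set persistence.existingClaimName=my-models-pvc \
  nemo-embedder \
  text-embedding-nim-1.0.0.tgz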

See the Helm options in Parameters.

Use the following bash command to create a basic deployment.


helm upgrade --install \
  --namespace embedding-nim \
  nemo-embedder \
  --set persistence.class="local-nfs" \
  text-embedding-nim-1.0.0.tgz

You can also change the version of the model in use by adding the following after the --namespace flag:


--set image.tag=1.0.0 \

After deploying, check the pods to ensure that the service is running; the initial image pull and model download can take upwards of 15 minutes.


kubectl get pods -n embedding-nim

The pod should eventually end up in the running state.


NAME                   READY   STATUS    RESTARTS   AGE
text-embedding-nim-0   1/1     Running   0          8m44s

Check events for failures:


kubectl get events --field-selector involvedObject.name=text-embedding-nim -n embedding-nim

Minikube will create a hostPath-based PV and PVC by default with this chart. We recommend that you add the following to your helm commands:


--set persistence.class=standard

To deploy a different model, such as the Snowflake arctic-embed-l model used here, run the helm command with the following parameters and update the version in image.tag:


helm upgrade --install \
  --namespace embedding-nim \
  --set image.repository=nvcr.io/nim/snowflake/arctic-embed-l \
  --set image.tag=1.0.0 \
  --set persistence.class="local-nfs" \
  nemo-embedder \
  text-embedding-nim-1.0.0.tgz

Create a values file for the resource requirements of the 7B model (NV-EmbedQA-Mistral-7B-V2):


# values-mistral.yaml
resources:
  limits:
    ephemeral-storage: 28Gi
    nvidia.com/gpu: 1
    memory: 32Gi
    cpu: "16000m"
  requests:
    ephemeral-storage: 28Gi
    nvidia.com/gpu: 1
    memory: 16Gi
    cpu: "4000m"

Then deploy the model:


helm upgrade --install \
  --namespace embedding-nim \
  -f values-mistral.yaml \
  --set image.repository=nvcr.io/nim/nvidia/nv-embedqa-mistral-7b-v2 \
  --set image.tag=1.0.0 \
  --set persistence.class="local-nfs" \
  nemo-embedder \
  text-embedding-nim-1.0.0.tgz

In the previous example, the API endpoint is exposed on port 8080 through a Kubernetes service of the default type, with no ingress, since the NIM itself does not handle authentication. The following commands assume the NV-EmbedQA-E5-V5 model was deployed.

Adjust the “model” value in the request JSON body to use a different model.

Use the following command to port-forward the service to your local machine to test inference.


kubectl port-forward -n embedding-nim service/text-embedding-nim 8080:8080

Then try a request:


curl -X 'POST' \
  'http://localhost:8080/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "hello world",
    "model": "nvidia/nv-embedqa-e5-v5",
    "input_type": "passage"
  }'
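Assuming the response follows the OpenAI-compatible embeddings schema (a data array of objects with an embedding field) and you have jq installed, you can confirm the embedding dimension directly:

# Extract the embedding vector length from the response (requires jq)
curl -s -X 'POST' 'http://localhost:8080/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"input": "hello world", "model": "nvidia/nv-embedqa-e5-v5", "input_type": "passage"}' \
  | jq '.data[0].embedding | length'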

Use the following command to view the container log messages if you ran the container directly with Docker, as in the download-to-cache step above.


docker logs $CONTAINER_NAME -f
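For the Kubernetes deployment itself, read the pod logs instead; the pod name below matches the one shown in the kubectl get pods output above.

# Follow logs from the deployed pod
kubectl logs -f -n embedding-nim text-embedding-nim-0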
