Deploying on Kubernetes with Helm Chart

You can deploy PaddleOCR NIM with a Helm chart. The Helm chart simplifies PaddleOCR NIM deployment on Kubernetes. It supports deployment with optional cluster, GPU, and storage configurations.

The Helm chart downloads the model and starts the service so that it can begin serving inference requests.

NIMs are designed to run on a system with NVIDIA GPUs; the type and number of GPUs depend on the model. To use the Helm chart, you must have a Kubernetes cluster with appropriate GPU nodes and the GPU Operator installed.

Benefits of Helm Chart Deployment

Using a Helm chart to deploy on Kubernetes has the following benefits compared to manual deployment:

  • Enables using Kubernetes Nodes and horizontally scaling the service

  • Encapsulates the complexity of running Docker commands directly

  • Enables monitoring metrics from the NIM

Setting Up the Environment

If you haven’t set up your NGC API key, or do not know exactly which NIM you want to download and deploy, refer to the User Guide.

The Helm chart requires two Kubernetes secrets that both contain your NGC API key: an image pull secret for downloading private images, named nvcrimagepullsecret in the following sections, and a generic secret, named ngc-api in the following sections. The secrets hold the same key in different formats (kubernetes.io/dockerconfigjson versus Opaque). Refer to the following Creating Secrets section for details.

These instructions require that you have exported your NGC_API_KEY to the environment. Use the following command to export your key.

export NGC_API_KEY="<YOUR NGC API KEY>"

Fetching the Helm Chart

You can download the Helm chart from NGC by executing the following command:

helm fetch https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants/charts/paddleocr-nim-0.2.0.tgz --username='$oauthtoken' --password=$NGC_API_KEY

Namespace

You can choose to deploy to whichever namespace is appropriate, but this document uses the namespace paddleocr-nim. Use the following command to create that namespace.

kubectl create namespace paddleocr-nim

Creating Secrets

Use the following script to create the required secrets for the Helm chart.

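# "$oauthtoken" is a literal user name; the single quotes keep the shell from expanding it
DOCKER_CONFIG='{"auths":{"nvcr.io":{"username":"$oauthtoken", "password":"'${NGC_API_KEY}'" }}}'

# Run only ONE of the following two commands, depending on your operating system.

# [Linux] Encode nvcr registry config as base64
NGC_REGISTRY_PASSWORD=$(echo -n "$DOCKER_CONFIG" | base64 -w0)

# [MacOS] Encode nvcr registry config as base64
NGC_REGISTRY_PASSWORD=$(echo -n "$DOCKER_CONFIG" | base64 -b0)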

# Create image pull secret
cat <<EOF > imagepull.yaml
apiVersion: v1
kind: Secret
metadata:
  name: nvcrimagepullsecret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ${NGC_REGISTRY_PASSWORD}
EOF

kubectl apply -n paddleocr-nim -f imagepull.yaml
kubectl create -n paddleocr-nim secret generic ngc-api \
  --from-literal=NGC_API_KEY=${NGC_API_KEY} \
  --from-literal=NGC_CLI_API_KEY=${NGC_API_KEY}
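The two base64 commands above differ only in the flag that disables line wrapping. As a portable alternative, the same value can be produced with the Python standard library. The following is a minimal sketch, not part of the chart: the function name build_dockerconfigjson is hypothetical, and the JSON layout is copied from the DOCKER_CONFIG variable above.

```python
import base64
import json


def build_dockerconfigjson(api_key: str) -> str:
    """Return the base64-encoded registry config for the image pull secret.

    Hypothetical helper: mirrors the DOCKER_CONFIG JSON from the shell script.
    """
    config = {
        "auths": {
            "nvcr.io": {
                # "$oauthtoken" is a literal user name, not a shell variable.
                "username": "$oauthtoken",
                "password": api_key,
            }
        }
    }
    raw = json.dumps(config).encode("utf-8")
    # b64encode never inserts line breaks, so no -w0 / -b0 flag is needed.
    return base64.b64encode(raw).decode("ascii")
```

Calling build_dockerconfigjson(os.environ["NGC_API_KEY"]) after importing os should yield a string that can be used as the .dockerconfigjson value in imagepull.yaml.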

Configuration Considerations

By default, the following deployment commands create a single deployment with one replica using the paddleocr model. Use the following options to modify how the model behaves. Refer to Parameters for information about parameters.

  • image.repository – The container (PaddleOCR NIM) to deploy

  • image.tag – The version of that container (PaddleOCR NIM)

  • Storage options, based on the environment and cluster in use

  • resources – Use this option when a model requires more than the default of one GPU. Refer to the support matrix and resource requirements.

  • env – An array of environment variables presented to the container, if advanced configuration is needed

Storage

This NIM uses persistent storage for storing downloaded models, and sample commands in this guide require the local-nfs storage class. Use the following commands to install the local-nfs storage class and provisioner in your Kubernetes cluster.

helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
helm install nfs-server nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner --set storageClass.name=local-nfs

Advanced Storage Configuration

Storage is a particular concern when setting up NIMs. Models can be quite large, and you can fill a disk downloading models to emptyDir volumes. We recommend that you mount persistent storage of some kind on your pod.

This chart supports two general categories:

  • Persistent Volume Claims (enabled with persistence.enabled)

  • hostPath (enabled with persistence.hostPath)

By default, the chart uses the standard storage class and creates a PersistentVolume and a PersistentVolumeClaim.

If you do not have a Storage Class Provisioner that creates PersistentVolumes automatically, set the value persistence.createPV=true. This is also necessary when you use persistence.hostPath on minikube.

If you have an existing PersistentVolumeClaim where you’d like the models to be stored, pass its name in persistence.existingClaimName.

Refer to the Helm options in Parameters.

Deploying

Basic deployment

helm upgrade --install \
  --namespace paddleocr-nim \
  paddleocr-nim \
  --set persistence.class="local-nfs" \
  paddleocr-nim-0.2.0.tgz

You can also change the version of the paddleocr model in use by adding the following line after the --namespace line:

--set image.tag=0.2.0 \
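When several overrides are involved, it can help to assemble the command programmatically. The following is a minimal sketch under stated assumptions: helm_deploy_command is a hypothetical helper, not part of the chart, and it only reproduces the flags shown in this section. Values that contain commas need Helm's own escaping, which this sketch does not handle.

```python
import shlex


def helm_deploy_command(chart, release="paddleocr-nim",
                        namespace="paddleocr-nim", overrides=None):
    """Build the `helm upgrade --install` argument list used in this guide."""
    cmd = ["helm", "upgrade", "--install", "--namespace", namespace, release]
    # Each override becomes one `--set key=value` pair.
    for key, value in (overrides or {}).items():
        cmd += ["--set", f"{key}={value}"]
    cmd.append(chart)
    return cmd


cmd = helm_deploy_command(
    "paddleocr-nim-0.2.0.tgz",
    overrides={"persistence.class": "local-nfs", "image.tag": "0.2.0"},
)
# Print a copy-pasteable command line.
print(shlex.join(cmd))
```

To run the command instead of printing it, pass the list to subprocess.run(cmd, check=True).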

After deploying, use the following command to check whether the pod is running, as the initial image pull and model download can take upwards of 15 minutes.

kubectl get pods -n paddleocr-nim

This command should eventually return something similar to the following when the pod is running.

NAME              READY   STATUS    RESTARTS   AGE
paddleocr-nim-0   1/1     Running   0          8m44s
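Because startup can take a while, the check can also be automated. The following is a minimal sketch that polls kubectl get pods with JSON output and inspects the pod phase and container readiness. The function names, the 30-minute timeout, and the 15-second interval are arbitrary assumptions, not values from the chart.

```python
import json
import subprocess
import time


def pods_ready(pod_list: dict) -> bool:
    """Return True when every pod in a `kubectl get pods -o json` result is Running and ready."""
    items = pod_list.get("items", [])
    if not items:
        return False
    for pod in items:
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        if status.get("phase") != "Running" or not containers:
            return False
        if not all(c.get("ready") for c in containers):
            return False
    return True


def wait_for_pods(namespace="paddleocr-nim", timeout_s=1800, interval_s=15) -> bool:
    """Poll kubectl until the pods are ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        if pods_ready(json.loads(out)):
            return True
        time.sleep(interval_s)
    return False
```

Calling wait_for_pods() returns True once all pods in the namespace report ready, or False if the timeout is reached first.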

You can use the following command to check events for failures:

kubectl get events -n paddleocr-nim --sort-by='.lastTimestamp'

Running Inference

In the previous example, the API endpoint is exposed on port 8000 through a Kubernetes service of the default type. No ingress is configured, because the NIM does not handle authentication itself. The following commands assume that the nvidia/paddleocr model has been deployed.

If required, change the “model” value in the request JSON body to use a different model.

Use the following command to port-forward the service to your local machine to test inference.

kubectl port-forward -n paddleocr-nim service/paddleocr-nim 8000:8000
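Before sending requests, it can be useful to confirm that the forwarded port is accepting connections. The following is a minimal sketch that uses only a TCP connection attempt; the function names and retry defaults are hypothetical. It confirms only that the port-forward is listening locally, not that the model has finished loading.

```python
import socket
import time


def port_is_open(host="localhost", port=8000, timeout_s=1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_port(host="localhost", port=8000, attempts=30, delay_s=1.0) -> bool:
    """Retry the connection check a fixed number of times."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay_s)
    return False
```

Run wait_for_port() in a second terminal while the port-forward command is active.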

Create a directory named data/structured-imgs and copy some PNG images into it, so that the directory looks like the following:

$ mkdir -p data/structured-imgs

$ ls -1 data/structured-imgs
sample1.png
sample2.png
sample3.png
sample4.png

Send an inference request by running the following commands and Python 3.11 script.

# Create a virtual env (venv) for this test to isolate the dependencies
python3 -m venv paddleocr_venv
source paddleocr_venv/bin/activate

# Install the pillow and requests libraries into the virtual environment
pip3 install requests pillow

Save the following script as paddleocr_inference_test.py.

# paddleocr_inference_test.py
import base64
import json
import time
from io import BytesIO
from pathlib import Path

import requests
from PIL import Image

# Collect the PNG files in a stable order
image_paths = sorted(Path("data/structured-imgs").glob("*.png"))

images = []
for image_path in image_paths:
    # Re-encode each image as PNG and embed it as a base64 data URL
    with Image.open(image_path) as image:
        buffered = BytesIO()
        image.save(buffered, format="PNG")
    base64_image = base64.b64encode(buffered.getvalue()).decode("utf-8")
    image_url = f"data:image/png;base64,{base64_image}"
    images.append({"type": "image_url", "image_url": {"url": image_url}})

# All images are sent in a single message
message = {"content": images}
payload = {"messages": [message]}

start = time.time()
response = requests.post("http://localhost:8000/v1/infer", json=payload)
response.raise_for_status()  # Stop with an error on a non-2xx response
print(json.dumps(response.json()))
print(f"{len(image_paths)} images completed in {time.time() - start:.2f} seconds")

# Run inference on the .png files in the data/structured-imgs directory
python3 paddleocr_inference_test.py

Logging

Use the following command to view the container log messages.

kubectl logs --selector=app.kubernetes.io/name=paddleocr-nim -n paddleocr-nim