Oracle#

This guide covers deploying NIM LLM on Oracle Cloud Infrastructure (OCI) using Container Engine for Kubernetes (OKE), Oracle's managed Kubernetes service for running containerized workloads on OCI.

OKE Deployment#

Create an OKE cluster with GPU-capable worker nodes and prepare the OCI environment for NIM LLM workloads. This section covers prerequisites, cluster creation, and initial setup.

Prerequisites#

Install the following tools before proceeding: the OCI CLI (oci), kubectl, and Helm.

You also need an NGC API key with access to NIM LLM container images and Helm charts.
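A quick sanity check (a convenience sketch, not part of the official setup) can catch an obviously malformed key before you create any secrets: newer personal NGC API keys start with the nvapi- prefix, while older legacy keys use a different format, so treat a mismatch as a warning rather than a hard failure.

```shell
# Sanity-check sketch: newer personal NGC API keys begin with "nvapi-".
# Legacy keys do not, so a mismatch is only a warning, not an error.
NGC_API_KEY="nvapi-example-placeholder"   # placeholder value for illustration

case "$NGC_API_KEY" in
  nvapi-*) KEY_CHECK="ok" ;;
  *)       KEY_CHECK="legacy-or-invalid" ;;
esac
echo "NGC key format: $KEY_CHECK"
```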

Note

Your OCI tenancy must have GPU quota available in the target region. Check Availability by Region and ensure the compartment has a configured VCN, subnets, internet gateway, route table, and security lists.

Tip

If you use a non-default OCI CLI profile, pass --profile ${PROFILE_NAME} or set --auth instance_principal on all oci commands. Alternatively, export OCI_CLI_PROFILE=${PROFILE_NAME}.

Create an OKE Cluster through OCI#

Set environment variables used throughout this guide:

export CLUSTER_OCID="${YOUR_CLUSTER_OCID}"
export NAMESPACE="nim-llm"
export NIM_LLM_CHART_VERSION="${CHART_VERSION}"
export OCI_REGION="${YOUR_OCI_REGION}"
export RELEASE_NAME="my-nim"
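Because later oci, kubectl, and helm commands silently misbehave when these variables are empty, a small pre-flight guard (a hypothetical convenience, not part of the official guide) can report any that are still unset:

```shell
# Pre-flight sketch: list required variables that are unset or empty so that
# later commands fail early rather than mid-deployment.
missing=""
for var in CLUSTER_OCID NAMESPACE NIM_LLM_CHART_VERSION OCI_REGION RELEASE_NAME; do
  eval "val=\${$var:-}"
  [ -z "$val" ] && missing="$missing $var"
done

if [ -n "$missing" ]; then
  echo "unset variables:$missing"
else
  echo "all required variables are set"
fi
```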

Create an OKE cluster with GPU-capable worker nodes. Refer to Oracle's documentation on creating OKE clusters and node pools for detailed instructions.

Note

Set the boot volume size to at least 200 GB during node pool creation to avoid disk pressure when pulling large NIM container images. For production workloads, use an Enhanced cluster (includes a financially backed SLA).

After the cluster is created, configure kubectl access by following the Setting Up Cluster Access guide. Verify connectivity by running:

kubectl get nodes

Expand the Boot Volume#

OKE worker nodes do not automatically use the full allocated boot volume, which can cause disk pressure when pulling large NIM container images. To expand the root filesystem, refer to Oracle's documentation on extending the partition for a boot volume.
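One common approach is a custom cloud-init script on the node pool: run OKE's standard bootstrap first, then grow the root filesystem with Oracle Linux's oci-growfs utility. The script below is a sketch of that pattern, used as the node pool's user data (a config fragment); verify the commands against your node image before use.

```shell
#!/bin/bash
# Node-pool custom cloud-init (sketch). Runs the default OKE bootstrap that
# Oracle publishes via the instance metadata service, then expands the root
# filesystem to use the full boot volume (Oracle Linux images).
curl --fail -H "Authorization: Bearer Oracle" -L0 \
  http://169.254.169.254/opc/v2/instance/metadata/oke_init_script \
  | base64 --decode > /var/run/oke-init.sh
bash /var/run/oke-init.sh

# oci-growfs ships with Oracle Linux and grows the root partition/filesystem.
/usr/libexec/oci-growfs -y
```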

Install the NVIDIA GPU Operator#

The NVIDIA GPU Operator manages GPU drivers and device plugins on OKE nodes.

Note

The GPU Operator supports Ubuntu and RHEL/CoreOS-based node images. Verify your OKE node pool OS image is compatible before installing.

helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace \
  --set driver.enabled=false

Note

Set driver.enabled=false when the node image already includes GPU drivers (OKE GPU node images ship with the NVIDIA driver pre-installed). For node images without pre-installed drivers, omit this flag or set it to true.

Verify the GPU Operator is running:

export NODE_NAME=$(kubectl get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].metadata.name}')
kubectl get pods -n gpu-operator
kubectl describe node ${NODE_NAME} | grep -i nvidia

For more information, refer to the NVIDIA GPU Operator documentation.

Create Kubernetes Secrets#

export IMAGE_PULL_SECRET="ngc-secret"
export NGC_API_KEY="${YOUR_NGC_API_KEY}"

kubectl create namespace $NAMESPACE

kubectl create secret docker-registry $IMAGE_PULL_SECRET \
  --namespace $NAMESPACE \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY"

kubectl create secret generic ngc-api \
  --namespace $NAMESPACE \
  --from-literal=NGC_API_KEY="$NGC_API_KEY"

For gated Hugging Face models, create an additional secret:

export HF_TOKEN="${YOUR_HF_TOKEN}"

kubectl create secret generic hf-token \
  --namespace $NAMESPACE \
  --from-literal=HF_TOKEN="$HF_TOKEN"

Create a Persistent Volume Claim#

NIM requires persistent storage to cache downloaded model files across pod restarts. Create a PVC before you deploy:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-cache-pvc
  namespace: ${NAMESPACE}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF

Tip

On OKE, the default storage class (oci-bv) provisions OCI Block Volume storage. Adjust the storage size to fit your model. For more information, refer to Provisioning PVCs on the Block Volume Service. For multi-node deployments that require ReadWriteMany access, refer to Provisioning PVCs on the File Storage Service.
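For the ReadWriteMany case, the PVC looks much like the Block Volume one but references a File Storage (FSS) storage class. The manifest below is a sketch: "fss-dyn-storage" is a placeholder StorageClass name that you must create first (backed by the fss.csi.oraclecloud.com provisioner), following Oracle's FSS provisioning guide.

```yaml
# Sketch of a ReadWriteMany PVC backed by OCI File Storage (FSS).
# "fss-dyn-storage" is a placeholder StorageClass; create it first per
# Oracle's File Storage Service provisioning documentation.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-cache-pvc-rwx
  namespace: nim-llm
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fss-dyn-storage
  resources:
    requests:
      storage: 100Gi
```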

Deploy NIM LLM with Helm#

  1. Pull the Helm chart from NGC:

    helm pull https://helm.ngc.nvidia.com/nim/charts/nim-llm-${NIM_LLM_CHART_VERSION}.tgz \
      --username='$oauthtoken' --password=$NGC_API_KEY
    
  2. Optional: View the default chart values to understand available configuration options:

    helm show values nim-llm-${NIM_LLM_CHART_VERSION}.tgz
    

    Tip

    For help choosing the right model configuration for your values.yaml, refer to Model Profiles and Selection.

  3. Deploy using a custom values file:

    helm upgrade --install $RELEASE_NAME nim-llm-${NIM_LLM_CHART_VERSION}.tgz \
      --namespace $NAMESPACE \
      -f path/to/your/custom-values.yaml
    

    Example custom-values.yaml for OKE:

    image:
      repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
      tag: "2.0.1"
      pullPolicy: IfNotPresent
    
    model:
      name: meta/llama-3.1-8b-instruct
      ngcAPISecret: ngc-api
      nimCache: /model-store
      openaiPort: 8000
      logLevel: INFO
    
    # Running as root (uid 0) is required on OKE to write to OCI Block Volume
    # mounts, which are owned by root by default. Adjust if your storage class
    # supports a non-root fsGroup.
    podSecurityContext:
      runAsUser: 0
      runAsGroup: 0
      fsGroup: 0
    
    persistence:
      enabled: true
      existingClaim: "nvidia-nim-cache-pvc"
    
    resources:
      limits:
        nvidia.com/gpu: 1
    
    imagePullSecrets:
      - name: ngc-secret
    
    nodeSelector:
      nvidia.com/gpu.present: "true"
    
    service:
      type: LoadBalancer
      openaiPort: 8000
    

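The example values file above provisions a public load balancer. If the endpoint should stay private to your VCN, OCI supports service annotations that request an internal, flexible-shape load balancer instead. The fragment below is a sketch that assumes the chart exposes service.annotations; confirm with helm show values before relying on it.

```yaml
# Values fragment (sketch): provision an internal, flexible-shape OCI load
# balancer instead of a public one. Assumes the chart passes
# service.annotations through to the Kubernetes Service.
service:
  type: LoadBalancer
  openaiPort: 8000
  annotations:
    oci.oraclecloud.com/load-balancer-type: "lb"
    service.beta.kubernetes.io/oci-load-balancer-internal: "true"
    service.beta.kubernetes.io/oci-load-balancer-shape: "flexible"
    service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "10"
    service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "100"
```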
Verify the Deployment#

  1. Get the service endpoint:

    kubectl -n $NAMESPACE get svc -l app.kubernetes.io/name=nim-llm
    
  2. Check the health endpoint (set NIM_EXTERNAL_IP from the service EXTERNAL-IP):

    export NIM_EXTERNAL_IP=$(kubectl -n $NAMESPACE get svc -l app.kubernetes.io/name=nim-llm \
      -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')
    curl -s "http://${NIM_EXTERNAL_IP}:8000/v1/health/ready"
    
  3. Send an inference request:

    curl -X POST "http://${NIM_EXTERNAL_IP}:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128
      }'
    

Teardown#

To remove all resources created by this guide:

# Remove the Helm release
helm uninstall $RELEASE_NAME --namespace $NAMESPACE

# Delete Kubernetes secrets and PVC
kubectl delete secret ngc-api $IMAGE_PULL_SECRET --namespace $NAMESPACE
kubectl delete pvc nvidia-nim-cache-pvc --namespace $NAMESPACE

# Delete the namespace
kubectl delete namespace $NAMESPACE

# Delete the OKE cluster (also removes managed node pools)
oci ce cluster delete --cluster-id ${CLUSTER_OCID} --region ${OCI_REGION} --force

Note

Deleting the cluster does not automatically remove associated OCI Block Volumes or Load Balancers. Delete these manually from the OCI Console to avoid ongoing charges.
