Oracle#
This guide covers deploying NIM LLM on Oracle Cloud Infrastructure (OCI) using Container Engine for Kubernetes (OKE), Oracle's managed Kubernetes service for running containerized workloads on OCI.
OKE Deployment#
Create an OKE cluster with GPU-capable worker nodes and prepare the OCI environment for NIM LLM workloads. This section covers prerequisites, cluster creation, and initial setup.
Prerequisites#
Install the following tools before proceeding:
oci – the OCI command-line interface, used to create and manage the cluster
kubectl – the Kubernetes command-line client
helm – the Helm package manager, used to install the GPU Operator and the NIM LLM chart
You also need an NGC API key with access to NIM LLM container images and Helm charts.
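A quick way to confirm the tools are on your PATH before continuing (a convenience check added here, not an official installer):

```shell
# Print a status line for each CLI this guide depends on.
for tool in oci kubectl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: MISSING - install it before proceeding"
  fi
done
```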
Note
Your OCI tenancy must have GPU quota available in the target region. Check Availability by Region and ensure the compartment has a configured VCN, subnets, internet gateway, route table, and security lists.
Tip
If you use a non-default OCI CLI profile, pass --profile ${PROFILE_NAME} or set --auth instance_principal on all oci commands. Alternatively, export OCI_CLI_PROFILE=${PROFILE_NAME}.
Create an OKE Cluster through OCI#
Set environment variables used throughout this guide:
export CLUSTER_OCID="${YOUR_CLUSTER_OCID}"
export NAMESPACE="nim-llm"
export NIM_LLM_CHART_VERSION="${CHART_VERSION}"
export OCI_REGION="${YOUR_OCI_REGION}"
export RELEASE_NAME="my-nim"
Create an OKE cluster with GPU-capable worker nodes. Refer to the following Oracle documentation for detailed instructions:
Creating a Cluster for step-by-step cluster creation using the OCI Console or CLI.
Running Applications on GPU-based Nodes for configuring GPU node pools with the appropriate shapes and images.
Note
Set the boot volume size to at least 200 GB during node pool creation to avoid disk pressure when pulling large NIM container images. For production workloads, use an Enhanced cluster (includes a financially-backed SLA).
After the cluster is created, configure kubectl access by following the Setting Up Cluster Access guide. Verify connectivity by running:
kubectl get nodes
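The Setting Up Cluster Access guide typically comes down to a single OCI CLI call. The sketch below uses the `CLUSTER_OCID` and `OCI_REGION` variables exported earlier; the `run` wrapper and `DRY_RUN` guard are additions of this example (not part of the OCI CLI) so you can preview the command before executing it:

```shell
# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0 to run it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Merge credentials for the cluster into ~/.kube/config.
run oci ce cluster create-kubeconfig \
  --cluster-id "${CLUSTER_OCID}" \
  --file "${HOME}/.kube/config" \
  --region "${OCI_REGION}" \
  --token-version 2.0.0 \
  --kube-endpoint PUBLIC_ENDPOINT
```

`--kube-endpoint PUBLIC_ENDPOINT` assumes the cluster's Kubernetes API endpoint is public; use `PRIVATE_ENDPOINT` for clusters with a private API endpoint.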
Expand the Boot Volume#
OKE worker nodes do not automatically use the full allocated boot volume, which can cause disk pressure when pulling large NIM container images. To expand the root filesystem, refer to Oracle’s documentation:
Extending the Root Partition of Worker Nodes for manual and automated approaches to expanding the root partition.
Using Custom Cloud-init Initialization Scripts for automating the expansion at node provisioning time.
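The cloud-init approach can be sketched as the following node-pool custom initialization script. This is a config fragment based on Oracle's documented custom cloud-init pattern, assuming an Oracle Linux node image that ships the `/usr/libexec/oci-growfs` helper; verify both paths against the linked documentation before use:

```shell
#!/bin/bash
# Fetch and run the default OKE bootstrap script first, so the node still
# joins the cluster (Oracle's documented custom cloud-init pattern).
curl --fail -H "Authorization: Bearer Oracle" -L0 \
  http://169.254.169.254/opc/v2/instance/metadata/oke_init_script \
  | base64 --decode > /var/run/oke-init.sh
bash /var/run/oke-init.sh

# Grow the root filesystem to fill the allocated boot volume (Oracle Linux
# images ship this helper; Ubuntu images need growpart + resize2fs instead).
/usr/libexec/oci-growfs -y
```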
Install the NVIDIA GPU Operator#
The NVIDIA GPU Operator manages GPU drivers and device plugins on OKE nodes.
Note
The GPU Operator supports Ubuntu and RHEL/CoreOS-based node images. Verify your OKE node pool OS image is compatible before installing.
helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
helm install gpu-operator nvidia/gpu-operator \
--namespace gpu-operator \
--create-namespace \
--set driver.enabled=false
Note
Set driver.enabled=false when GPU drivers are already managed by OCI (pre-installed on bare metal shapes). For VM shapes without pre-installed drivers, omit this flag or set it to true.
Verify the GPU Operator is running:
export NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl get pods -n gpu-operator
kubectl describe node ${NODE_NAME} | grep -i nvidia
For more information, refer to the NVIDIA GPU Operator documentation.
Create Kubernetes Secrets#
export IMAGE_PULL_SECRET="ngc-secret"
export NGC_API_KEY="${YOUR_NGC_API_KEY}"
kubectl create namespace $NAMESPACE
kubectl create secret docker-registry $IMAGE_PULL_SECRET \
--namespace $NAMESPACE \
--docker-server=nvcr.io \
--docker-username='$oauthtoken' \
--docker-password="$NGC_API_KEY"
kubectl create secret generic ngc-api \
--namespace $NAMESPACE \
--from-literal=NGC_API_KEY="$NGC_API_KEY"
For gated Hugging Face models, create an additional secret:
export HF_TOKEN="${YOUR_HF_TOKEN}"
kubectl create secret generic hf-token \
--namespace $NAMESPACE \
--from-literal=HF_TOKEN="$HF_TOKEN"
Create a Persistent Volume Claim#
NIM requires persistent storage to cache downloaded model files across pod restarts. Create a PVC before you deploy:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nvidia-nim-cache-pvc
namespace: ${NAMESPACE}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
EOF
Tip
The default storage class on OKE (oci-bv) is backed by the OCI Block Volume service. Adjust the storage size to fit your model. For more information, refer to Provisioning PVCs on the Block Volume Service. For multi-node deployments that require ReadWriteMany access, refer to Provisioning PVCs on the File Storage Service.
Deploy NIM LLM with Helm#
Pull the Helm chart from NGC:
helm pull https://helm.ngc.nvidia.com/nim/charts/nim-llm-${NIM_LLM_CHART_VERSION}.tgz \
  --username='$oauthtoken' \
  --password=$NGC_API_KEY
Optional: View the default chart values to understand available configuration options:
helm show values nim-llm-${NIM_LLM_CHART_VERSION}.tgz
Tip
For help choosing the right model configuration for your values.yaml, refer to Model Profiles and Selection.
Deploy using a custom values file:
helm upgrade --install $RELEASE_NAME nim-llm-${NIM_LLM_CHART_VERSION}.tgz \
  --namespace $NAMESPACE \
  -f path/to/your/custom-values.yaml
Example custom-values.yaml for OKE:
image:
  repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
  tag: "2.0.1"
  pullPolicy: IfNotPresent
model:
  name: meta/llama-3.1-8b-instruct
  ngcAPISecret: ngc-api
  nimCache: /model-store
  openaiPort: 8000
  logLevel: INFO
# Running as root (uid 0) is required on OKE to write to OCI Block Volume
# mounts, which are owned by root by default. Adjust if your storage class
# supports a non-root fsGroup.
podSecurityContext:
  runAsUser: 0
  runAsGroup: 0
  fsGroup: 0
persistence:
  enabled: true
  existingClaim: "nvidia-nim-cache-pvc"
resources:
  limits:
    nvidia.com/gpu: 1
imagePullSecrets:
  - name: ngc-secret
nodeSelector:
  nvidia.com/gpu.present: "true"
service:
  type: LoadBalancer
  openaiPort: 8000
Verify the Deployment#
Get the service endpoint:
kubectl -n $NAMESPACE get svc -l app.kubernetes.io/name=nim-llm
Check the health endpoint (set NIM_EXTERNAL_IP from the service EXTERNAL-IP):
export NIM_EXTERNAL_IP=$(kubectl -n $NAMESPACE get svc -l app.kubernetes.io/name=nim-llm \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')
curl -s "http://${NIM_EXTERNAL_IP}:8000/v1/health/ready"
Send an inference request:
curl -X POST "http://${NIM_EXTERNAL_IP}:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128
  }'
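Once the inference request succeeds, the reply text can be extracted from the JSON response. A small sketch, assuming python3 is available on the client; the sample response below is hypothetical, so in practice pipe the curl output in instead:

```shell
# Hypothetical sample response; in practice: curl -s ... | python3 -c '...'
response='{"choices":[{"message":{"role":"assistant","content":"Hello! How can I help you today?"}}]}'
echo "$response" | python3 -c '
import json, sys
body = json.load(sys.stdin)
# The assistant reply lives at choices[0].message.content in the
# OpenAI-compatible chat completion schema that NIM serves.
print(body["choices"][0]["message"]["content"])
'
# prints: Hello! How can I help you today?
```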
Teardown#
To remove all resources created by this guide:
# Remove the Helm release
helm uninstall $RELEASE_NAME --namespace $NAMESPACE
# Delete Kubernetes secrets and PVC
kubectl delete secret ngc-api $IMAGE_PULL_SECRET --namespace $NAMESPACE
kubectl delete pvc nvidia-nim-cache-pvc --namespace $NAMESPACE
# Delete the namespace
kubectl delete namespace $NAMESPACE
# Delete the OKE cluster (also removes managed node pools)
oci ce cluster delete --cluster-id ${CLUSTER_OCID} --region ${OCI_REGION} --force
Note
Deleting the cluster does not automatically remove associated OCI Block Volumes or Load Balancers. Delete these manually from the OCI Console to avoid ongoing charges.
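A hedged sketch for finding such leftovers with the OCI CLI. COMPARTMENT_OCID is an assumption of this example (set it to your compartment's OCID), and the `run`/`DRY_RUN` guard is an addition so the commands print instead of run by default:

```shell
# COMPARTMENT_OCID is assumed to be exported; DRY_RUN=1 (the default here)
# only prints the commands, set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Block volumes left detached after cluster deletion.
run oci bv volume list \
  --compartment-id "${COMPARTMENT_OCID}" --region "${OCI_REGION}" \
  --lifecycle-state AVAILABLE --output table

# Load balancers created for Kubernetes LoadBalancer services.
run oci lb load-balancer list \
  --compartment-id "${COMPARTMENT_OCID}" --region "${OCI_REGION}" \
  --output table
```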