Dynamic Resource Allocation (DRA) Support for NIM#
About Dynamic Resource Allocation (DRA)#
DRA is a built-in Kubernetes feature that simplifies GPU allocation for NIMService, NIMPipelines, and NIMBuild. It replaces traditional device plugins with a more flexible and unified approach. It enables users to define GPU classes, request GPUs based on those classes, and filter them according to specific workload and business requirements. This extensibility makes DRA a powerful tool for managing GPU resources efficiently across diverse NIM use cases.
NIM Operator fully supports the NVIDIA GPU DRA driver and enables you to dynamically allocate GPU resources using a ResourceClaim or ResourceClaimTemplate.
NIM Operator supports DRA resource claims for two types of GPU usage:
Full GPU
Multi-Instance GPU (MIG)
Additionally, the supported GPU sharing strategy is time-slicing.
Note
DRA support for NIM is currently a Technology Preview feature on NIM Operator, which means it is not fully supported and is not suitable for deployment in production.
Example Procedure#
Summary#
To use Dynamic Resource Allocation for a NIM, follow these steps:
1. Complete the prerequisites.
2. Decide on GPU usage: Full GPU or Multi-Instance GPU (MIG).
3. Create a NIM Cache custom resource.
4. Configure NIM Service to use the DRA resource with one of the following options:
Option A: Auto-generate a ResourceClaimTemplate (recommended).
Option B: Use a pre-created ResourceClaim or ResourceClaimTemplate.
Note
The NIM Cache, ResourceClaim or ResourceClaimTemplate, and NIM Service must all be in the same Kubernetes namespace. The following examples use nim-service as the namespace.
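If the nim-service namespace does not already exist, create it before applying any of the manifests in this procedure:

$ kubectl create namespace nim-service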
1. Complete the Prerequisites#
Ensure you are using Kubernetes cluster version 1.33 or later.
For Kubernetes version 1.33.x, enable support for Dynamic Resource Allocation (DRA). The DynamicResourceAllocation feature gate must be enabled on the following components in the Kubernetes cluster:
kube-apiserver
kube-controller-manager
kube-scheduler
kubelet
Enable the following additional API groups:
resource.k8s.io/v1beta1
resource.k8s.io/v1beta2
For more information, refer to Enabling or disabling API groups.
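How you enable the feature gate and API groups depends on how your cluster was provisioned. The following is a minimal sketch for a kubeadm-managed cluster; the kubeadm and kubelet configuration snippets are illustrative assumptions, so adapt them to your own provisioning tooling:

apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
  # Enable the DRA feature gate and the resource.k8s.io API groups.
  - name: feature-gates
    value: DynamicResourceAllocation=true
  - name: runtime-config
    value: resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true
controllerManager:
  extraArgs:
  - name: feature-gates
    value: DynamicResourceAllocation=true
scheduler:
  extraArgs:
  - name: feature-gates
    value: DynamicResourceAllocation=true
---
# The kubelet feature gate is set through its own configuration file.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true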
Enable the Container Device Interface (CDI) and set it as the default runtime.
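If the GPU Operator manages your GPU nodes, one way to do this is to patch its ClusterPolicy. This sketch assumes the default ClusterPolicy name cluster-policy used elsewhere in this procedure; the Troubleshooting section shows how to verify the resulting settings:

$ kubectl patch clusterpolicy cluster-policy \
    --type=json \
    -p '[{"op":"replace","path":"/spec/cdi/enabled","value":true},{"op":"replace","path":"/spec/cdi/default","value":true}]'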
Install the NVIDIA Container Toolkit and configure it on all GPU nodes in the Kubernetes cluster.
Set the accept-nvidia-visible-devices-as-volume-mounts toolkit configuration to work with DRA on all the nodes:

$ sudo nvidia-ctk config --in-place --set accept-nvidia-visible-devices-as-volume-mounts=true
Disable device plugins.
Note
Before disabling the device plugins, ensure there are no GPU workloads running in the cluster.
$ kubectl patch clusterpolicy cluster-policy \
    --type=json \
    -p '[{"op":"replace","path":"/spec/devicePlugin/enabled","value":false}]'

Verify that the device plugin is disabled:

$ kubectl get clusterpolicy cluster-policy -o json | jq -r '.spec.devicePlugin.enabled'
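Optionally, confirm that no device plugin pods remain before continuing. The namespace and label below assume a default GPU Operator installation; adjust them if your deployment differs:

$ kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset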
Deploy the NVIDIA DRA driver Helm chart with the GPU DRA driver enabled.

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update

$ helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
    --version="25.3.0" \
    --create-namespace \
    --namespace nvidia-dra-driver-gpu \
    --set gpuResourcesEnabledOverride=true \
    --set nvidiaDriverRoot=/run/nvidia/driver
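Verify that the driver pods are running in the namespace created by the chart (pod names vary by chart version):

$ kubectl get pods -n nvidia-dra-driver-gpu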
If using DRA with MIGs, configure MIG Manager. For more information, refer to GPU Operator with MIG.
Note
The NVIDIA DRA driver deploys a default device class, gpu.nvidia.com, for GPU resources. Use the following command to view the device classes:

$ kubectl get deviceclass
NAME             AGE
gpu.nvidia.com   78d

The gpu.nvidia.com DeviceClass represents physical NVIDIA GPUs managed by the NVIDIA GPU DRA driver. You can use this DeviceClass to create ResourceClaim objects to request specific GPUs.
2. Decide on GPU Usage: Full GPU or MIG#
Decide whether your workload uses a full GPU or Multi-Instance GPU (MIG). Generally, a full GPU is used for LLMs, whereas time-slicing or MIG can be used for non-LLM or smaller NIMs. Using MIG is a more advanced configuration.
If using MIG, make sure you have enabled MIG and MIG slicing. Refer to GPU Operator with MIG for more detailed instructions.
The only GPU sharing strategy currently supported is time-slicing. GPU time-slicing enables workloads that are scheduled on oversubscribed GPUs to interleave with one another. For more information, refer to Time-Slicing GPUs in Kubernetes.
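With the DRA driver, time-slicing is requested per claim rather than configured cluster-wide. The following is a hedged sketch of a ResourceClaimTemplate that opts a GPU request into time-slicing through the driver's opaque GpuConfig parameters; the template name is illustrative and the exact parameter schema depends on the NVIDIA DRA driver version you install, so verify it against the driver documentation:

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaimTemplate
metadata:
  name: timesliced-gpu-template   # illustrative name
  namespace: nim-service
spec:
  spec:
    devices:
      requests:
      - exactly:
          allocationMode: ExactCount
          count: 1
          deviceClassName: gpu.nvidia.com
        name: gpu
      config:
      # Driver-specific configuration that applies to the "gpu" request above.
      - requests: ["gpu"]
        opaque:
          driver: gpu.nvidia.com
          parameters:
            apiVersion: resource.nvidia.com/v1beta1
            kind: GpuConfig
            sharing:
              strategy: TimeSlicing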
3. Create a NIM Cache Custom Resource#
Note
Refer to Prerequisites for more information on using NIM Cache.
Create a NIM Cache manifest, such as nimcache.yaml, with contents like the following sample manifest:

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-2-1b-instruct
  namespace: nim-service
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3.2-1b-instruct:1.12.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: ""
      size: "50Gi"
      volumeAccessMode: ReadWriteOnce
Apply the manifest:
$ kubectl apply -n nim-service -f nimcache.yaml
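Wait for the cache to report a ready state before creating the NIM Service:

$ kubectl get nimcaches -n nim-service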
4. Configure NIM Service to Use the DRA Resource#
There are two options for how NIM Service can use the DRA resource:
Option A: Auto-generate ResourceClaimTemplate (recommended). NIM Service automatically creates a ResourceClaimTemplate using the fields specified in spec.draResources.claimCreationSpec.
Option B: Use pre-created ResourceClaim or ResourceClaimTemplate. First create a ResourceClaim or ResourceClaimTemplate, then configure NIM Service to use that ResourceClaim or ResourceClaimTemplate.
Option A: Auto-generate ResourceClaimTemplate#
Note
Auto-generation supports only ResourceClaimTemplate.
The following NIMService spec.draResources.claimCreationSpec fields can be configured to auto-generate a ResourceClaimTemplate:
| Field | Description | Default Value |
|---|---|---|
| devices.attributeSelectors | Defines the criteria that must be satisfied by the device attributes of a device. | None |
| devices.attributeSelectors.key | Specifies the name of the device attribute. This is either a qualified name or a simple name. If it is a simple name, it is assumed to be prefixed with the DRA driver name. For example, “gpu.nvidia.com/productName” is equivalent to “productName” if the driver name is “gpu.nvidia.com”. Otherwise, they are treated as two different attributes. | None |
| devices.attributeSelectors.op | Specifies the operator to use for comparing the device attribute value. | |
| devices.attributeSelectors.value | Specifies the value to compare against the device attribute. | None |
| devices.capacitySelectors | Defines the criteria that must be satisfied by the device capacity of a device. | None |
| devices.capacitySelectors.key | Specifies the name of the resource. This is either a qualified name or a simple name. If it is a simple name, it is assumed to be prefixed with the DRA driver name. For example, “gpu.nvidia.com/memory” is equivalent to “memory” if the driver name is “gpu.nvidia.com”. Otherwise, they are treated as two different attributes. | None |
| devices.capacitySelectors.op | Specifies the operator to use for comparing against the device capacity. | |
| devices.capacitySelectors.value | Specifies the resource quantity to compare against. | None |
| | Specifies a list of CEL expressions that must be satisfied by the DRA device. | None |
| devices.count | Specifies the number of devices to request. | |
| devices.deviceClassName | Specifies the DeviceClass to inherit configuration and selectors from. | |
| devices.driverName | Specifies the name of the DRA driver providing the capacity information. Must be a DNS subdomain. | |
| devices.name | Specifies the name of the device request to use in the generated claim spec. Must be a valid DNS_LABEL. | None |
Create a NIM Service
Create a NIM Service manifest, such as nimservice.yaml, with contents like the following sample manifest:

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-2-1b-instruct
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
    tag: "1.12.0"
    pullPolicy: IfNotPresent
    pullSecrets:
    - ngc-secret
  authSecret: ngc-api-secret
  storage:
    nimCache:
      name: meta-llama3-2-1b-instruct
      profile: ''
  replicas: 1
  draResources:
  - claimCreationSpec:
      devices:
      - name: gpu
        deviceClassName: gpu.nvidia.com
        driverName: gpu.nvidia.com
        attributeSelectors:
        - key: architecture
          op: Equal
          value:
            stringValue: Ampere
        capacitySelectors:
        - key: memory
          op: GreaterThanOrEqual
          value: 40Gi
  expose:
    service:
      type: ClusterIP
      port: 8000
Apply the manifest:
$ kubectl create -f nimservice.yaml -n nim-service
For more sample manifests, refer to config/samples/nim/serving/advanced/dra/auto-creation/.
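After the NIM Service is created, you can confirm that the operator generated a ResourceClaimTemplate on your behalf:

$ kubectl get resourceclaimtemplates -n nim-service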
Option B: Use Pre-Created ResourceClaim or ResourceClaimTemplate#
You can use a single ResourceClaim to share the same GPUs across the pods that reference it. Alternatively, you can use a ResourceClaimTemplate, which creates a dedicated ResourceClaim on the fly and assigns it to each pod before the pod starts.
Refer to Allocate Devices to Workloads with DRA in the Kubernetes documentation for more detailed information.
Create a ResourceClaim
Refer to the Kubernetes ResourceClaim spec for more detailed configuration options.
Create a ResourceClaim, with contents like the following resourceclaim.yaml sample manifest, for sharing a full GPU:

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  devices:
    requests:
    - exactly:
        allocationMode: ExactCount
        count: 1
        deviceClassName: gpu.nvidia.com
      name: gpu
Apply the manifest:
$ kubectl apply -n nim-service -f resourceclaim.yaml
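You can inspect the claim at any time; it typically remains unallocated until the first consuming pod is scheduled:

$ kubectl get resourceclaim gpu-claim -n nim-service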
Create a NIM Service
Create a file, such as nimservice.yaml, with contents like the following sample manifest. Set spec.draResources.resourceClaimName to gpu-claim, which is the name of the ResourceClaim from the preceding resourceclaim.yaml example.

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama-31-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
    tag: 1.8.3
    pullPolicy: IfNotPresent
    pullSecrets:
    - ngc-secret
  authSecret: ngc-api-secret
  storage:
    nimCache:
      name: meta-llama-31-8b-instruct
      profile: ''
  replicas: 1
  draResources:
  - resourceClaimName: gpu-claim
  expose:
    service:
      type: ClusterIP
      port: 8000
Note
If you only want to use a subset of devices from the claim, use the requests field. This specifies the subset of requests from the ResourceClaim that are made available to the service. For example:

draResources:
- resourceClaimName: gpu-claim
  requests:
  - gpu-0
Apply the manifest:
$ kubectl create -f nimservice.yaml -n nim-service
Create a ResourceClaimTemplate
Refer to the Kubernetes ResourceClaimTemplate spec for more detailed configuration options.
Create a file, such as resourceclaimtemplate.yaml, with contents like the following sample manifest, for sharing a full GPU:

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaimTemplate
metadata:
  name: gpu-resourceclaimtemplate
  namespace: nim-service
spec:
  spec:
    devices:
      requests:
      - exactly:
          allocationMode: ExactCount
          count: 1
          deviceClassName: gpu.nvidia.com
        name: gpu
Apply the manifest:
$ kubectl apply -n nim-service -f resourceclaimtemplate.yaml
Create a NIM Service
Create a file, such as nimservice.yaml, with contents like the following sample manifest. Set spec.draResources.resourceClaimTemplateName to gpu-resourceclaimtemplate, which is the name of the ResourceClaimTemplate from the preceding resourceclaimtemplate.yaml example.

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-2-1b-instruct
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
    tag: "1.12.0"
    pullPolicy: IfNotPresent
    pullSecrets:
    - ngc-secret
  authSecret: ngc-api-secret
  storage:
    nimCache:
      name: meta-llama3-2-1b-instruct
      profile: ''
  replicas: 1
  draResources:
  - resourceClaimTemplateName: gpu-resourceclaimtemplate
  expose:
    service:
      type: ClusterIP
      port: 8000
Note
If you only want to use a subset of devices from the claim, use the requests field. This specifies the subset of requests from the generated ResourceClaim that are made available to the service. For example:

draResources:
- resourceClaimTemplateName: gpu-resourceclaimtemplate
  requests:
  - gpu-0
Apply the manifest:
$ kubectl create -f nimservice.yaml -n nim-service
For more sample manifests, refer to config/samples/nim/serving/advanced/dra/manual/.
Note
You can determine the count based on the model card resource requirements. To change the GPU count, you must create and attach a new ResourceClaim or ResourceClaimTemplate, which triggers a rolling update in the NIM Service.
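For example, a hypothetical claim that requests two full GPUs follows the same structure as the earlier resourceclaim.yaml example; the name gpu-claim-2x is illustrative. Reference it from spec.draResources.resourceClaimName to roll the NIM Service onto the larger allocation.

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: gpu-claim-2x            # illustrative name
  namespace: nim-service
spec:
  devices:
    requests:
    - exactly:
        allocationMode: ExactCount
        count: 2                # request two full GPUs
        deviceClassName: gpu.nvidia.com
      name: gpu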
Displaying Statuses#
To display statuses related to DRA resources, use the following examples.
When the NIM Service references a ResourceClaim:
To view the referenced DRA resources:
$ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .spec.draResources
Example output
[ { "resourceClaimName": "gpu-claim" } ]
To view the statuses of the referenced DRA resources:
$ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .status.draResourceStatuses
Example output
[ { "name": "claim-846549f86-1-5c79578868-0", "resourceClaimStatus": { "name": "gpu-claim", "resourceClaimStatuses": [ { "name": "meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m", "state": "allocated,reserved" } ] } } ]
To view resource claims:
$ kubectl get resourceclaims
Example output
NAME                                                          STATE                AGE
meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m   allocated,reserved   91s
When the NIM Service references a ResourceClaimTemplate:
To view the referenced DRA resources:
$ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .spec.draResources
Example output
[ { "resourceClaimTemplateName": "gpu-claim-template" } ]
To view the statuses of the referenced DRA resources:
$ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .status.draResourceStatuses
Example output
[ { "name": "claim-846549f86-1-5c79578868-0", "resourceClaimTemplateStatus": { "name": "gpu-claim-template", "resourceClaimStatuses": [ { "name": "meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m", "state": "allocated,reserved" } ] } } ]
To view resource claims:
$ kubectl get resourceclaims
Example output
NAME                                                          STATE                AGE
meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m   allocated,reserved   91s
Troubleshooting#
This section explains some common troubleshooting steps to identify issues with DRA.
Ensure that DRA is enabled in the cluster.
$ kubectl get deviceclasses
The preceding command should not throw an error.
Ensure that CDI is enabled and set as the default runtime.
$ kubectl get clusterpolicy cluster-policy -o json | jq -r '.spec.cdi'
Expected output:
{ "default": true, "enabled": true }
Ensure the NVIDIA DRA driver is functional and has one ResourceSlice per node from the gpu.nvidia.com driver.

$ kubectl get resourceslice
Check the status of the ResourceClaim from the NIM Service status.draResourceStatuses; it should be allocated,reserved if successful.

$ kubectl get resourceclaim <claim>
Additionally, verify that the ResourceClaim has valid devices allocated.
$ kubectl get resourceclaim <claim> -o json | jq -r '.status.allocation.devices.results'
If the NIMService pods are stuck in Pending/ContainerCreating state, you can describe the pod to get additional information.
$ kubectl describe pod <NIMService-Pod-Name>
ResourceClaim allocation logs can be collected from the NVIDIA DRA driver.

$ kubectl logs -l app.kubernetes.io/name=dra-driver-gpu -n nvidia-dra-driver-gpu -c gpus