Dynamic Resource Allocation (DRA) Support for NIM#

About Dynamic Resource Allocation (DRA)#

DRA is a built-in Kubernetes feature that simplifies GPU allocation for NIMService, NIMPipeline, and NIMBuild workloads. It replaces traditional device plugins with a more flexible and unified approach. It enables users to define GPU classes, request GPUs based on those classes, and filter them according to specific workload and business requirements. This extensibility makes DRA a powerful tool for managing GPU resources efficiently across diverse NIM use cases.

NIM Operator fully supports the NVIDIA GPU DRA driver and enables you to dynamically allocate GPU resources using a ResourceClaim or ResourceClaimTemplate.

NIM Operator supports DRA resource claims for two types of GPU usage:

  • Full GPU

  • Multi-Instance GPU (MIG)

Additionally, the supported GPU sharing strategies are:

  • Time-slicing

Note

DRA support for NIM is currently a Technology Preview feature in the NIM Operator, which means it is not fully supported and is not suitable for production deployment.

Example Procedure#

Summary#

To use Dynamic Resource Allocation for a NIM, follow these steps:

  1. Complete the prerequisites.

  2. Decide on GPU Usage: Full GPU or MIG.

  3. Create a NIM Cache custom resource.

  4. Configure NIM Service to use the DRA resource with one of the following options:

    1. Auto-generate ResourceClaimTemplate

    2. Use pre-created ResourceClaim or ResourceClaimTemplate

  5. Display DRA resource statuses.

Note

The NIM Cache, ResourceClaim or ResourceClaimTemplate, and NIM Service must all be in the same Kubernetes namespace. The following examples use nim-service as the namespace.

1. Complete the Prerequisites#

  • Ensure you are using Kubernetes cluster version 1.33 or later.

  • For Kubernetes version 1.33.x, enable support for Dynamic Resource Allocation (DRA).

    • The DynamicResourceAllocation feature gate must be enabled on the following components in the Kubernetes cluster:

      • kube-apiserver

      • kube-controller-manager

      • kube-scheduler

      • kubelet

    • Enable the following additional API groups:

      • resource.k8s.io/v1beta1

      • resource.k8s.io/v1beta2

      For more information, refer to Enabling or disabling API groups.
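
      For example, on a control plane where you set component flags directly (such as with kubeadm), the required settings look like the following sketch; adapt it to how your cluster components are configured:

      # kube-apiserver, kube-controller-manager, kube-scheduler, and kubelet
      --feature-gates=DynamicResourceAllocation=true
      # kube-apiserver only
      --runtime-config=resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true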

    • Enable the Container Device Interface (CDI) and set it as the default runtime.
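
      With the NVIDIA GPU Operator, for example, this can be done through the cluster policy (a sketch, assuming the default cluster-policy name):

      $ kubectl patch clusterpolicy cluster-policy \
          --type=json -p '[{"op":"replace","path":"/spec/cdi/enabled","value":true},{"op":"replace","path":"/spec/cdi/default","value":true}]'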

    • Install the NVIDIA Container Toolkit and configure it on all GPU nodes in the Kubernetes cluster.

    • Set the accept-nvidia-visible-devices-as-volume-mounts toolkit configuration to work with DRA on all the nodes:

      $ sudo nvidia-ctk config --in-place --set accept-nvidia-visible-devices-as-volume-mounts=true
      
    • Disable device plugins.

      Note

      Before disabling the device plugins, ensure there are no GPU workloads running in the cluster.

      $ kubectl patch clusterpolicy cluster-policy \
          --type=json -p '[{"op":"replace","path":"/spec/devicePlugin/enabled","value":false}]'
      
      $ kubectl get clusterpolicy cluster-policy -o json | jq -r '.spec.devicePlugin.enabled'

      Verify that the command returns false.
      
    • Deploy the DRA driver helm chart with GPU DRA driver enabled.

      $ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
          && helm repo update
      
      $ helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
          --version="25.3.0" \
          --create-namespace \
          --namespace nvidia-dra-driver-gpu \
          --set gpuResourcesEnabledOverride=true \
          --set nvidiaDriverRoot=/run/nvidia/driver
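
      Verify that the DRA driver pods are running before continuing (exact pod names vary by version):

      $ kubectl get pods -n nvidia-dra-driver-gpu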
      
  • If using DRA with MIGs, configure MIG Manager. For more information, refer to GPU Operator with MIG.

Note

The NVIDIA DRA driver deploys a default device class gpu.nvidia.com for GPU resources.

Use the following command to view the device classes:

$ kubectl get deviceclass

NAME             AGE
gpu.nvidia.com   78d

The gpu.nvidia.com DeviceClass represents physical NVIDIA GPUs managed by the NVIDIA GPU DRA driver. You can use this DeviceClass to create ResourceClaim objects to request specific GPUs.
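
To see the devices and the attributes that you can select on when creating claims, inspect the ResourceSlices that the driver publishes; for example:

$ kubectl get resourceslices -o yaml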

2. Decide on GPU Usage: Full GPU or MIG#

Decide whether to use a full GPU or Multi-Instance GPU (MIG). Generally, a full GPU is used for LLMs, whereas time-slicing or MIG can be used for non-LLM or smaller NIMs. MIG is the more advanced configuration.

If using MIG, make sure you have enabled MIG and MIG slicing. Refer to GPU Operator with MIG for more detailed instructions.
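
When MIG is enabled, the NVIDIA GPU DRA driver can also expose MIG devices through a dedicated device class. The following sketch requests a single MIG slice; it assumes a mig.nvidia.com device class is available on your driver version, so confirm the available classes with kubectl get deviceclass first:

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaimTemplate
metadata:
  name: mig-resourceclaimtemplate
  namespace: nim-service
spec:
  spec:
    devices:
      requests:
      - exactly:
          allocationMode: ExactCount
          count: 1
          deviceClassName: mig.nvidia.com
        name: mig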

The only GPU sharing strategy currently supported is time-slicing. GPU time-slicing enables workloads that are scheduled on oversubscribed GPUs to interleave with one another. For more information, refer to Time-Slicing GPUs in Kubernetes.
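
With the NVIDIA GPU DRA driver, time-slicing is requested per claim through an opaque device configuration, which you can use with the pre-created claims described in Option B below. The following sketch is modeled on the driver's GpuConfig examples; the parameters apiVersion and kind depend on your driver release, so verify them against the driver's sample manifests:

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: timesliced-gpu-claim
  namespace: nim-service
spec:
  devices:
    requests:
    - exactly:
        allocationMode: ExactCount
        count: 1
        deviceClassName: gpu.nvidia.com
      name: gpu
    config:
    - requests: ["gpu"]
      opaque:
        driver: gpu.nvidia.com
        parameters:
          apiVersion: resource.nvidia.com/v1beta1
          kind: GpuConfig
          sharing:
            strategy: TimeSlicing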

3. Create a NIM Cache Custom Resource#

Note

Refer to Prerequisites for more information on using NIM Cache.

  1. Create a NIM Cache manifest, such as nimcache.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMCache
    metadata:
      name: meta-llama3-2-1b-instruct
      namespace: nim-service
    spec:
      source:
        ngc:
          modelPuller: nvcr.io/nim/meta/llama-3.2-1b-instruct:1.12.0
          pullSecret: ngc-secret
          authSecret: ngc-api-secret
          model:
            engine: tensorrt_llm
            tensorParallelism: "1"
      storage:
        pvc:
          create: true
          storageClass: ""
          size: "50Gi"
          volumeAccessMode: ReadWriteOnce
    
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f nimcache.yaml
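
  3. Wait for the cache to report a ready status before creating the NIM Service (column names can vary by operator version):

    $ kubectl get nimcaches -n nim-service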
    

4. Configure NIM Service to Use the DRA Resource#

There are two options for how NIM Service can use the DRA resource:

Option A: Auto-generate ResourceClaimTemplate#

Note

Auto-generation supports only ResourceClaimTemplate.

The following NIMService spec.draResources.claimCreationSpec fields can be configured to auto-generate a ResourceClaimTemplate:

  • .devices.attributeSelectors: Defines the criteria that must be satisfied by the device attributes of a device. Default: none.

  • .devices.attributeSelectors.key: Specifies the name of the device attribute, as either a qualified name or a simple name. A simple name is assumed to be prefixed with the DRA driver name; for example, "gpu.nvidia.com/productName" is equivalent to "productName" if the driver name is "gpu.nvidia.com". Otherwise, they are treated as two different attributes. Default: none.

  • .devices.attributeSelectors.op: Specifies the operator to use for comparing the device attribute value. Default: Equal. Supported operators:

    • Equal: The device attribute value must be equal to the value specified in the selector.

    • NotEqual: The device attribute value must not be equal to the value specified in the selector.

    • GreaterThan: The device attribute value must be greater than the value specified in the selector.

    • GreaterThanOrEqual: The device attribute value must be greater than or equal to the value specified in the selector.

    • LessThan: The device attribute value must be less than the value specified in the selector.

    • LessThanOrEqual: The device attribute value must be less than or equal to the value specified in the selector.

  • .devices.attributeSelectors.value: Specifies the value to compare against the device attribute. Default: none.

  • .devices.capacitySelectors: Defines the criteria that must be satisfied by the device capacity of a device. Default: none.

  • .devices.capacitySelectors.key: Specifies the name of the resource, as either a qualified name or a simple name. A simple name is assumed to be prefixed with the DRA driver name; for example, "gpu.nvidia.com/memory" is equivalent to "memory" if the driver name is "gpu.nvidia.com". Otherwise, they are treated as two different attributes. Default: none.

  • .devices.capacitySelectors.op: Specifies the operator to use for comparing against the device capacity. Default: Equal. Supported operators:

    • Equal: The resource quantity value must be equal to the value specified in the selector.

    • NotEqual: The resource quantity value must not be equal to the value specified in the selector.

    • GreaterThan: The resource quantity value must be greater than the value specified in the selector.

    • GreaterThanOrEqual: The resource quantity value must be greater than or equal to the value specified in the selector.

    • LessThan: The resource quantity value must be less than the value specified in the selector.

    • LessThanOrEqual: The resource quantity value must be less than or equal to the value specified in the selector.

  • .devices.capacitySelectors.value: Specifies the resource quantity to compare against. Default: none.

  • .devices.celExpressions: Specifies a list of CEL expressions that must be satisfied by the DRA device. Default: none.

  • .devices.count: Specifies the number of devices to request. Default: 1.

  • .devices.deviceClassName: Specifies the DeviceClass to inherit configuration and selectors from. Default: gpu.nvidia.com.

  • .devices.driverName: Specifies the name of the DRA driver providing the capacity information. Must be a DNS subdomain. Default: gpu.nvidia.com.

  • .devices.name (required): Specifies the name of the device request to use in the generated claim spec. Must be a valid DNS_LABEL. Default: none.

Create a NIM Service

  1. Create a NIM Service manifest, such as nimservice.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama3-2-1b-instruct
      namespace: nim-service
    spec:
      image:
        repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
        tag: "1.12.0"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: meta-llama3-2-1b-instruct
          profile: ''
      replicas: 1
      draResources:
      - claimCreationSpec:
          devices:
          - name: gpu
            deviceClassName: gpu.nvidia.com
            driverName: gpu.nvidia.com
            attributeSelectors:
            - key: architecture
              op: Equal
              value:
                stringValue: Ampere
            capacitySelectors:
            - key: memory
              op: GreaterThanOrEqual
              value: 40Gi
      expose:
        service:
          type: ClusterIP
          port: 8000
    
  2. Apply the manifest:

    $ kubectl create -f nimservice.yaml -n nim-service
    

For more sample manifests, refer to config/samples/nim/serving/advanced/dra/auto-creation/.
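
To confirm the result, list the ResourceClaimTemplates in the namespace; the generated template's name is assigned by the operator:

$ kubectl get resourceclaimtemplates -n nim-service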

Option B: Use Pre-Created ResourceClaim or ResourceClaimTemplate#

You can use a ResourceClaim to share the same GPUs across all pods that reference it. Alternatively, you can use a ResourceClaimTemplate, which creates a dedicated ResourceClaim on the fly and assigns it to each pod before the pod starts.

Refer to Allocate Devices to Workloads with DRA in the Kubernetes documentation for more detailed information.

Create a ResourceClaim

Refer to the Kubernetes ResourceClaim spec for more detailed configuration options.

  1. Create a ResourceClaim, with contents like the following resourceclaim.yaml sample manifest, for sharing a full GPU:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaim
    metadata:
      name: gpu-claim
    spec:
      devices:
        requests:
        - exactly:
            allocationMode: ExactCount
            count: 1
            deviceClassName: gpu.nvidia.com
          name: gpu
    
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f resourceclaim.yaml
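
    A newly created claim remains unallocated until a pod that references it is scheduled, so an initial pending state is expected.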
    

Create a NIM Service

  1. Create a file, such as nimservice.yaml, with contents like the following sample manifest. Set spec.draResources.resourceClaimName to gpu-claim, which is the name of the ResourceClaim from the preceding resourceclaim.yaml example.

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama-31-8b-instruct
      namespace: nim-service
    spec:
      image:
        repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
        tag: "1.8.3"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: meta-llama-31-8b-instruct
          profile: ''
      replicas: 1
      draResources:
      - resourceClaimName: gpu-claim
      expose:
        service:
          type: ClusterIP
          port: 8000
    

    Note

    If you want to use only a subset of devices from the claim, use the requests field, which specifies the subset of requests from the ResourceClaim that are made available to the service.

    For example:

    draResources:
    - resourceClaimName: gpu-claim
      requests:
      - gpu
    
  2. Apply the manifest:

    $ kubectl create -f nimservice.yaml -n nim-service
    

Create a ResourceClaimTemplate

Refer to the Kubernetes ResourceClaimTemplate spec for more detailed configuration options.

  1. Create a file, such as resourceclaimtemplate.yaml, with contents like the following sample manifest, for allocating a full GPU to each pod:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaimTemplate
    metadata:
      name: gpu-resourceclaimtemplate
      namespace: nim-service
    spec:
      spec:
        devices:
          requests:
          - exactly:
              allocationMode: ExactCount
              count: 1
              deviceClassName: gpu.nvidia.com
            name: gpu
    
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f resourceclaimtemplate.yaml
    

Create a NIM Service

  1. Create a file, such as nimservice.yaml, with contents like the following sample manifest. Set spec.draResources.resourceClaimTemplateName to gpu-resourceclaimtemplate, which is the name of the ResourceClaimTemplate from the preceding resourceclaimtemplate.yaml example.

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama3-2-1b-instruct
      namespace: nim-service
    spec:
      image:
        repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
        tag: "1.12.0"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: meta-llama3-2-1b-instruct
          profile: ''
      replicas: 1
      draResources:
      - resourceClaimTemplateName: gpu-resourceclaimtemplate
      expose:
        service:
          type: ClusterIP
          port: 8000
    

    Note

    If you want to use only a subset of devices from the claim, use the requests field, which specifies the subset of requests from the generated ResourceClaim that are made available to the service.

    For example:

    draResources:
    - resourceClaimTemplateName: gpu-resourceclaimtemplate
      requests:
      - gpu
    
  2. Apply the manifest:

    $ kubectl create -f nimservice.yaml -n nim-service
    

For more sample manifests, refer to config/samples/nim/serving/advanced/dra/manual/.

Note

  • You can determine the device count based on the model card's resource requirements.

  • To change the GPU count, you must create and attach a new ResourceClaim or ResourceClaimTemplate, which triggers a rolling update in the NIM Service.
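
  For example, to move a service to two GPUs, create a ResourceClaim with the new count (a minimal sketch; the claim name is illustrative) and update spec.draResources to reference it:

  apiVersion: resource.k8s.io/v1beta2
  kind: ResourceClaim
  metadata:
    name: gpu-claim-2x
    namespace: nim-service
  spec:
    devices:
      requests:
      - exactly:
          allocationMode: ExactCount
          count: 2
          deviceClassName: gpu.nvidia.com
        name: gpu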

Displaying Statuses#

To display statuses related to DRA resources, use the following examples, depending on whether the NIM Service references a ResourceClaim or a ResourceClaimTemplate.

For a NIM Service that references a ResourceClaim:

  1. To view the referenced DRA resources:

    $ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .spec.draResources
    
    Example output
    [
      {
        "resourceClaimName": "gpu-claim"
      }
    ]
    
  2. To view the statuses of the referenced DRA resources:

    $ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .status.draResourceStatuses
    
    Example output
    [
      {
        "name": "claim-846549f86-1-5c79578868-0",
        "resourceClaimStatus": {
          "name": "gpu-claim",
          "resourceClaimStatuses": [
            {
              "name": "meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m",
              "state": "allocated,reserved"
            }
          ]
        }
      }
    ]
    
  3. To view resource claims:

    $ kubectl get resourceclaims
    
    Example output
    NAME                                                           STATE               AGE
    meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m    allocated,reserved  91s
    
For a NIM Service that references a ResourceClaimTemplate:

  1. To view the referenced DRA resources:

    $ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .spec.draResources
    
    Example output
    [
      {
        "resourceClaimTemplateName": "gpu-claim-template"
      }
    ]
    
  2. To view the statuses of the referenced DRA resources:

    $ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .status.draResourceStatuses
    
    Example output
    [
      {
        "name": "claim-846549f86-1-5c79578868-0",
        "resourceClaimTemplateStatus": {
          "name": "gpu-claim-template",
          "resourceClaimStatuses": [
            {
              "name": "meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m",
              "state": "allocated,reserved"
            }
          ]
        }
      }
    ]
    
  3. To view resource claims:

    $ kubectl get resourceclaims
    
    Example output
    NAME                                                           STATE               AGE
    meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m    allocated,reserved  91s
    

Troubleshooting#

This section explains some common troubleshooting steps to identify issues with DRA.

  • Ensure that DRA is enabled in the cluster.

    $ kubectl get deviceclasses
    

    The preceding command should not throw an error.

  • Ensure that CDI is enabled and set as the default runtime.

    $ kubectl get clusterpolicy cluster-policy -o json | jq -r '.spec.cdi'
    

    Expected output:

    {
      "default": true,
      "enabled": true
    }
    
  • Ensure the NVIDIA DRA driver is functional and publishes one ResourceSlice per node from the gpu.nvidia.com driver.

    $ kubectl get resourceslice
    
  • Check the state of the ResourceClaim reported in the NIM Service status.draResourceStatuses field; it should be allocated,reserved if allocation succeeded. You can also query the claim directly:

    $ kubectl get resourceclaim <claim>
    
  • Additionally, verify that the ResourceClaim has valid devices allocated.

    $ kubectl get resourceclaim <claim> -o json | jq -r '.status.allocation.devices.results'
    
  • If the NIMService pods are stuck in Pending/ContainerCreating state, you can describe the pod to get additional information.

    $ kubectl describe pod <NIMService-Pod-Name> 
    
  • ResourceClaim allocation logs can be collected from the NVIDIA DRA driver.

    $ kubectl logs -l app.kubernetes.io/name=dra-driver-gpu -n nvidia-dra-driver-gpu -c gpus