Dynamic Resource Allocation (DRA) Support for NIM#

About Dynamic Resource Allocation (DRA)#

DRA is a built-in Kubernetes feature that simplifies GPU allocation for NIMService, NIMPipeline, and NIMBuild workloads. It replaces traditional device plugins with a more flexible and unified approach. It enables users to define GPU classes, request GPUs based on those classes, and filter them according to specific workload and business requirements. This extensibility makes DRA a powerful tool for managing GPU resources efficiently across diverse NIM use cases.

NIM Operator fully supports the NVIDIA GPU DRA driver and enables you to dynamically allocate GPU resources using a ResourceClaim or ResourceClaimTemplate.

NIM Operator supports DRA resource claims for two types of GPU usage:

  • Full GPU

  • Multi-Instance GPU (MIG)

Additionally, the supported GPU sharing strategies are:

  • Time-slicing

Note

DRA support for NIM is currently a Technology Preview feature in the NIM Operator, which means it is not fully supported and is not suitable for production deployment.

Example Procedure#

Summary#

To use Dynamic Resource Allocation for a NIM, follow these steps:

  1. Complete the prerequisites.

  2. Decide on GPU Usage: Full GPU or MIG.

  3. Create a NIM Cache custom resource.

  4. Configure NIM Service to use the DRA resource with one of the following options:

    1. Auto-generate ResourceClaimTemplate

    2. Use pre-created ResourceClaim or ResourceClaimTemplate

  5. Display DRA resource statuses.

Note

The NIM Cache, ResourceClaim or ResourceClaimTemplate, and NIM Service must all be in the same Kubernetes namespace. The following examples use nim-service as the namespace.

1. Complete the Prerequisites#

  • Ensure you are using Kubernetes cluster version 1.33 or later.

  • For Kubernetes version 1.33.x, enable support for Dynamic Resource Allocation (DRA).

    • The DynamicResourceAllocation feature gate must be enabled on the following components in the Kubernetes cluster:

      • kube-apiserver

      • kube-controller-manager

      • kube-scheduler

      • kubelet

    • Enable the following additional API groups:

      • resource.k8s.io/v1beta1

      • resource.k8s.io/v1beta2

      For more information, refer to Enabling or disabling API groups.
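
      For example, on a control plane where you set component flags directly (such as with kubeadm), the required settings look like the following sketch; adapt it to how your cluster components are configured:

      # kube-apiserver, kube-controller-manager, kube-scheduler, and kubelet
      --feature-gates=DynamicResourceAllocation=true
      # kube-apiserver only
      --runtime-config=resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true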

    • Enable the Container Device Interface (CDI) and set it as the default runtime.
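
      With the NVIDIA GPU Operator, for example, this can be done through the cluster policy (a sketch, assuming the default cluster-policy name):

      $ kubectl patch clusterpolicy cluster-policy \
          --type=json -p '[{"op":"replace","path":"/spec/cdi/enabled","value":true},{"op":"replace","path":"/spec/cdi/default","value":true}]'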

    • Install the NVIDIA Container Toolkit and configure it on all GPU nodes in the Kubernetes cluster.

    • Set the accept-nvidia-visible-devices-as-volume-mounts toolkit configuration to work with DRA on all the nodes:

      $ sudo nvidia-ctk config --in-place --set accept-nvidia-visible-devices-as-volume-mounts=true
      
    • Disable device plugins.

      Note

      Before disabling the device plugins, ensure there are no GPU workloads running in the cluster.

      $ kubectl patch clusterpolicy cluster-policy \
          --type=json -p '[{"op":"replace","path":"/spec/devicePlugin/enabled","value":false}]'
      
      $ kubectl get clusterpolicy cluster-policy -o json | jq -r '.spec.devicePlugin.enabled'

      Verify that the command returns false.
      
    • Deploy the DRA driver helm chart with GPU DRA driver enabled.

      $ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
          && helm repo update
      
      $ helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
          --version="25.3.0" \
          --create-namespace \
          --namespace nvidia-dra-driver-gpu \
          --set gpuResourcesEnabledOverride=true \
          --set nvidiaDriverRoot=/run/nvidia/driver
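
      Verify that the DRA driver pods are running before continuing (exact pod names vary by version):

      $ kubectl get pods -n nvidia-dra-driver-gpu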
      
  • If using DRA with MIGs, configure MIG Manager. For more information, refer to GPU Operator with MIG.

Note

The NVIDIA DRA driver deploys a default device class gpu.nvidia.com for GPU resources.

Use the following command to view the device classes:

$ kubectl get deviceclass

NAME             AGE
gpu.nvidia.com   78d

The gpu.nvidia.com DeviceClass represents physical NVIDIA GPUs managed by the NVIDIA GPU DRA driver. You can use this DeviceClass to create ResourceClaim objects to request specific GPUs.
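
To see the devices and the attributes that you can select on when creating claims, inspect the ResourceSlices that the driver publishes; for example:

$ kubectl get resourceslices -o yaml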

2. Decide on GPU Usage: Full GPU or MIG#

Decide whether to use a full GPU or Multi-Instance GPU (MIG). Generally, a full GPU is used for LLMs, whereas time-slicing or MIG can be used for non-LLM or smaller NIMs. MIG is the more advanced configuration.

If using MIG, make sure you have enabled MIG and MIG slicing. Refer to GPU Operator with MIG for more detailed instructions.
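
When MIG is enabled, the NVIDIA GPU DRA driver can also expose MIG devices through a dedicated device class. The following sketch requests a single MIG slice; it assumes a mig.nvidia.com device class is available on your driver version, so confirm the available classes with kubectl get deviceclass first:

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaimTemplate
metadata:
  name: mig-resourceclaimtemplate
  namespace: nim-service
spec:
  spec:
    devices:
      requests:
      - exactly:
          allocationMode: ExactCount
          count: 1
          deviceClassName: mig.nvidia.com
        name: mig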

The only GPU sharing strategy currently supported is time-slicing. GPU time-slicing enables workloads that are scheduled on oversubscribed GPUs to interleave with one another. For more information, refer to Time-Slicing GPUs in Kubernetes.
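
With the NVIDIA GPU DRA driver, time-slicing is requested per claim through an opaque device configuration, which you can use with the pre-created claims described in Option B below. The following sketch is modeled on the driver's GpuConfig examples; the parameters apiVersion and kind depend on your driver release, so verify them against the driver's sample manifests:

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: timesliced-gpu-claim
  namespace: nim-service
spec:
  devices:
    requests:
    - exactly:
        allocationMode: ExactCount
        count: 1
        deviceClassName: gpu.nvidia.com
      name: gpu
    config:
    - requests: ["gpu"]
      opaque:
        driver: gpu.nvidia.com
        parameters:
          apiVersion: resource.nvidia.com/v1beta1
          kind: GpuConfig
          sharing:
            strategy: TimeSlicing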

3. Create a NIM Cache Custom Resource#

Note

Refer to Prerequisites for more information on using NIM Cache.

  1. Create a NIM Cache manifest, such as nimcache.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMCache
    metadata:
      name: meta-llama3-2-1b-instruct
      namespace: nim-service
    spec:
      source:
        ngc:
          modelPuller: nvcr.io/nim/meta/llama-3.2-1b-instruct:1.12.0
          pullSecret: ngc-secret
          authSecret: ngc-api-secret
          model:
            engine: tensorrt_llm
            tensorParallelism: "1"
      storage:
        pvc:
          create: true
          storageClass: ""
          size: "50Gi"
          volumeAccessMode: ReadWriteOnce
    
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f nimcache.yaml
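
  3. Wait for the cache to report a ready status before creating the NIM Service (column names can vary by operator version):

    $ kubectl get nimcaches -n nim-service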
    

4. Configure NIM Service to Use the DRA Resource#

There are two options for how NIM Service can use the DRA resource:

Option A: Auto-generate ResourceClaimTemplate#

Note

Auto-generation supports only ResourceClaimTemplate.

The following NIMService spec.draResources.claimCreationSpec fields can be configured to auto-generate a ResourceClaimTemplate:

  • .devices.attributeSelectors: Defines the criteria that must be satisfied by the device attributes of a device. Default: none.

  • .devices.attributeSelectors.key: Specifies the name of the device attribute, as either a qualified name or a simple name. A simple name is assumed to be prefixed with the DRA driver name; for example, "gpu.nvidia.com/productName" is equivalent to "productName" if the driver name is "gpu.nvidia.com". Otherwise, they are treated as two different attributes. Default: none.

  • .devices.attributeSelectors.op: Specifies the operator to use for comparing the device attribute value. Default: Equal. Supported operators:

    • Equal: The device attribute value must be equal to the value specified in the selector.

    • NotEqual: The device attribute value must not be equal to the value specified in the selector.

    • GreaterThan: The device attribute value must be greater than the value specified in the selector.

    • GreaterThanOrEqual: The device attribute value must be greater than or equal to the value specified in the selector.

    • LessThan: The device attribute value must be less than the value specified in the selector.

    • LessThanOrEqual: The device attribute value must be less than or equal to the value specified in the selector.

  • .devices.attributeSelectors.value: Specifies the value to compare against the device attribute. Default: none.

  • .devices.capacitySelectors: Defines the criteria that must be satisfied by the device capacity of a device. Default: none.

  • .devices.capacitySelectors.key: Specifies the name of the resource, as either a qualified name or a simple name. A simple name is assumed to be prefixed with the DRA driver name; for example, "gpu.nvidia.com/memory" is equivalent to "memory" if the driver name is "gpu.nvidia.com". Otherwise, they are treated as two different attributes. Default: none.

  • .devices.capacitySelectors.op: Specifies the operator to use for comparing against the device capacity. Default: Equal. Supported operators:

    • Equal: The resource quantity value must be equal to the value specified in the selector.

    • NotEqual: The resource quantity value must not be equal to the value specified in the selector.

    • GreaterThan: The resource quantity value must be greater than the value specified in the selector.

    • GreaterThanOrEqual: The resource quantity value must be greater than or equal to the value specified in the selector.

    • LessThan: The resource quantity value must be less than the value specified in the selector.

    • LessThanOrEqual: The resource quantity value must be less than or equal to the value specified in the selector.

  • .devices.capacitySelectors.value: Specifies the resource quantity to compare against. Default: none.

  • .devices.celExpressions: Specifies a list of CEL expressions that must be satisfied by the DRA device. Default: none.

  • .devices.count: Specifies the number of devices to request. Default: 1.

  • .devices.deviceClassName: Specifies the DeviceClass to inherit configuration and selectors from. Default: gpu.nvidia.com.

  • .devices.driverName: Specifies the name of the DRA driver providing the capacity information. Must be a DNS subdomain. Default: gpu.nvidia.com.

  • .devices.name (required): Specifies the name of the device request to use in the generated claim spec. Must be a valid DNS_LABEL. Default: none.

Create a NIM Service

  1. Create a NIM Service manifest, such as nimservice.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama3-2-1b-instruct
      namespace: nim-service
    spec:
      image:
        repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
        tag: "1.12.0"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: meta-llama3-2-1b-instruct
          profile: ''
      replicas: 1
      draResources:
      - claimCreationSpec:
          devices:
          - name: gpu
            deviceClassName: gpu.nvidia.com
            driverName: gpu.nvidia.com
            attributeSelectors:
            - key: architecture
              op: Equal
              value:
                stringValue: Ampere
            capacitySelectors:
            - key: memory
              op: GreaterThanOrEqual
              value: 40Gi
      expose:
        service:
          type: ClusterIP
          port: 8000
    
  2. Apply the manifest:

    $ kubectl create -f nimservice.yaml -n nim-service
    

For more sample manifests, refer to config/samples/nim/serving/advanced/dra/auto-creation/.
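
To confirm the result, list the ResourceClaimTemplates in the namespace; the generated template's name is assigned by the operator:

$ kubectl get resourceclaimtemplates -n nim-service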

Option B: Use Pre-Created ResourceClaim or ResourceClaimTemplate#

You can use a ResourceClaim to share the same GPUs across all pods that reference it. Alternatively, you can use a ResourceClaimTemplate, which creates a dedicated ResourceClaim on the fly and assigns it to each pod before the pod starts.

Refer to Allocate Devices to Workloads with DRA in the Kubernetes documentation for more detailed information.

Create a ResourceClaim

Refer to the Kubernetes ResourceClaim spec for more detailed configuration options.

  1. Create a ResourceClaim, with contents like the following resourceclaim.yaml sample manifest, for sharing a full GPU:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaim
    metadata:
      name: gpu-claim
    spec:
      devices:
        requests:
        - exactly:
            allocationMode: ExactCount
            count: 1
            deviceClassName: gpu.nvidia.com
          name: gpu
    
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f resourceclaim.yaml
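
    A newly created claim remains unallocated until a pod that references it is scheduled, so an initial pending state is expected.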
    

Create a NIM Service

  1. Create a file, such as nimservice.yaml, with contents like the following sample manifest. Set spec.draResources.resourceClaimName to gpu-claim, which is the name of the ResourceClaim from the preceding resourceclaim.yaml example.

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama-31-8b-instruct
      namespace: nim-service
    spec:
      image:
        repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
        tag: "1.8.3"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: meta-llama-31-8b-instruct
          profile: ''
      replicas: 1
      draResources:
      - resourceClaimName: gpu-claim
      expose:
        service:
          type: ClusterIP
          port: 8000
    

    Note

    If you want to use only a subset of devices from the claim, use the requests field, which specifies the subset of requests from the ResourceClaim that are made available to the service.

    For example:

    draResources:
    - resourceClaimName: gpu-claim
      requests:
      - gpu
    
  2. Apply the manifest:

    $ kubectl create -f nimservice.yaml -n nim-service
    

Create a ResourceClaimTemplate

Refer to the Kubernetes ResourceClaimTemplate spec for more detailed configuration options.

  1. Create a file, such as resourceclaimtemplate.yaml, with contents like the following sample manifest, for allocating a full GPU to each pod:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaimTemplate
    metadata:
      name: gpu-resourceclaimtemplate
      namespace: nim-service
    spec:
      spec:
        devices:
          requests:
          - exactly:
              allocationMode: ExactCount
              count: 1
              deviceClassName: gpu.nvidia.com
            name: gpu
    
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f resourceclaimtemplate.yaml
    

Create a NIM Service

  1. Create a file, such as nimservice.yaml, with contents like the following sample manifest. Set spec.draResources.resourceClaimTemplateName to gpu-resourceclaimtemplate, which is the name of the ResourceClaimTemplate from the preceding resourceclaimtemplate.yaml example.

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama3-2-1b-instruct
      namespace: nim-service
    spec:
      image:
        repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
        tag: "1.12.0"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: meta-llama3-2-1b-instruct
          profile: ''
      replicas: 1
      draResources:
      - resourceClaimTemplateName: gpu-resourceclaimtemplate
      expose:
        service:
          type: ClusterIP
          port: 8000
    

    Note

    If you want to use only a subset of devices from the claim, use the requests field, which specifies the subset of requests from the generated ResourceClaim that are made available to the service.

    For example:

    draResources:
    - resourceClaimTemplateName: gpu-resourceclaimtemplate
      requests:
      - gpu
    
  2. Apply the manifest:

    $ kubectl create -f nimservice.yaml -n nim-service
    

For more sample manifests, refer to config/samples/nim/serving/advanced/dra/manual/.

Note

  • You can determine the device count based on the model card's resource requirements.

  • To change the GPU count, you must create and attach a new ResourceClaim or ResourceClaimTemplate, which triggers a rolling update in the NIM Service.
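
  For example, to move a service to two GPUs, create a ResourceClaim with the new count (a minimal sketch; the claim name is illustrative) and update spec.draResources to reference it:

  apiVersion: resource.k8s.io/v1beta2
  kind: ResourceClaim
  metadata:
    name: gpu-claim-2x
    namespace: nim-service
  spec:
    devices:
      requests:
      - exactly:
          allocationMode: ExactCount
          count: 2
          deviceClassName: gpu.nvidia.com
        name: gpu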

Displaying Statuses#

To display statuses related to DRA resources, use the following examples, depending on whether the NIM Service references a ResourceClaim or a ResourceClaimTemplate.

For a NIM Service that references a ResourceClaim:

  1. To view the referenced DRA resources:

    $ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .spec.draResources
    
    Example output
    [
      {
        "resourceClaimName": "gpu-claim"
      }
    ]
    
  2. To view the statuses of the referenced DRA resources:

    $ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .status.draResourceStatuses
    
    Example output
    [
      {
        "name": "claim-846549f86-1-5c79578868-0",
        "resourceClaimStatus": {
          "name": "gpu-claim",
          "resourceClaimStatuses": [
            {
              "name": "meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m",
              "state": "allocated,reserved"
            }
          ]
        }
      }
    ]
    
  3. To view resource claims:

    $ kubectl get resourceclaims
    
    Example output
    NAME                                                           STATE               AGE
    meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m    allocated,reserved  91s
    
For a NIM Service that references a ResourceClaimTemplate:

  1. To view the referenced DRA resources:

    $ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .spec.draResources
    
    Example output
    [
      {
        "resourceClaimTemplateName": "gpu-claim-template"
      }
    ]
    
  2. To view the statuses of the referenced DRA resources:

    $ kubectl get nimservices meta-llama-31-8b-instruct -o json | jq .status.draResourceStatuses
    
    Example output
    [
      {
        "name": "claim-846549f86-1-5c79578868-0",
        "resourceClaimTemplateStatus": {
          "name": "gpu-claim-template",
          "resourceClaimStatuses": [
            {
              "name": "meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m",
              "state": "allocated,reserved"
            }
          ]
        }
      }
    ]
    
  3. To view resource claims:

    $ kubectl get resourceclaims
    
    Example output
    NAME                                                           STATE               AGE
    meta-llama-31-8b-instruct-67bc4-claim-846549f86-1-5c796zw8m    allocated,reserved  91s
    

Troubleshooting#

This section explains some common troubleshooting steps to identify issues with DRA.

  • Ensure that DRA is enabled in the cluster.

    $ kubectl get deviceclasses
    

    The preceding command should not throw an error.

  • Ensure that CDI is enabled and set as the default runtime.

    $ kubectl get clusterpolicy cluster-policy -o json | jq -r '.spec.cdi'
    

    Expected output:

    {
      "default": true,
      "enabled": true
    }
    
  • Ensure the NVIDIA DRA driver is functional and publishes one ResourceSlice per node from the gpu.nvidia.com driver.

    $ kubectl get resourceslice
    
  • Check the state of the ResourceClaim reported in the NIM Service status.draResourceStatuses field; it should be allocated,reserved if allocation succeeded. You can also query the claim directly:

    $ kubectl get resourceclaim <claim>
    
  • Additionally, verify that the ResourceClaim has valid devices allocated.

    $ kubectl get resourceclaim <claim> -o json | jq -r '.status.allocation.devices.results'
    
  • If the NIMService pods are stuck in Pending/ContainerCreating state, you can describe the pod to get additional information.

    $ kubectl describe pod <NIMService-Pod-Name> 
    
  • ResourceClaim allocation logs can be collected from the NVIDIA DRA driver.

    $ kubectl logs -l app.kubernetes.io/name=dra-driver-gpu -n nvidia-dra-driver-gpu -c gpus