DPU Extension Service Management
DPU Extension Services let tenants deploy and manage custom workloads on the DPUs attached to their instances.
NVIDIA Infra Controller (NICo) allows you to do the following with DPU Extension Services:
- Create, update, list and delete reusable service definitions using REST API or admin CLI.
- Deploy one or more service versions to the DPUs attached to an instance.
- Upgrade or remove deployed services by updating the Instance configuration.
- Monitor deployment status of the extension services on the Instance through instance status report.
Currently, the only supported service type is Kubernetes Pod (KubernetesPod). For this service type, the service data field must contain a Kubernetes Pod manifest in YAML format.
Typical Workflow
- Create a DPU Extension Service with a Pod manifest (and optional registry credentials).
- Deploy the service by referencing it in instance config’s extension service config when creating or updating an Instance.
- Monitor deployment status on the Instance until all DPUs report
Running. - Upgrade by creating a new service version, then updating each affected Instance to reference the new version.
- When deleting an instance, DPU extension services will be automatically removed from attached DPUs.
Important: Creating or updating a service definition does not automatically redeploy running Instances. To roll out a new version, update the Instance’s
dpuExtensionServiceDeployments.
Core Concepts
Important Constraints:
- Service names must be case-insensitive unique within a tenant.
- Service type and tenant ownership are immutable after creation.
- An Instance may deploy multiple services but at most one version of each service at a time.
- Instance extension service changes are only accepted while the Instance is in
Readystate.
Managing DPU Extension Services
Create an DPU Extension Service
Create a service definition before deploying it to an Instance.
To create a service definition of type KubernetesPod, prepare a Pod manifest according to Kubernetes Pod requirements
REST API: POST /v2/org/{org}/nico/dpu-extension-service
Admin CLI:
To include private registry credentials, add a credentials object to the request body. See Registry credentials. All three credential flags (--registry-url, --username, --password) are required tenant wants to supply a credential.
List DPU Extension Services
REST API: GET /v2/org/{org}/nico/dpu-extension-service
Optional query parameters: siteId, status, query, includeRelation, pageNumber, pageSize, and orderBy (NAME_ASC, NAME_DESC, STATUS_ASC, STATUS_DESC, CREATED_ASC, CREATED_DESC, UPDATED_ASC, UPDATED_DESC).
Admin CLI:
Optional filters: --type, --name, --tenant-organization-id.
Get an extension service
Retrieve a single service, including its latest version and list of active versions.
REST API: GET /v2/org/{org}/nico/dpu-extension-service/{dpuExtensionServiceId}
Admin CLI:
Get a DPU Extension Service Version
Retrieve the Pod manifest and metadata for a specific version.
REST API: GET /v2/org/{org}/nico/dpu-extension-service/{dpuExtensionServiceId}/version/{version}
Admin CLI:
Update a DPU Extension Service
Changes to data, credentials, or observability create a new immutable version which the latest version number of the extension service is incremented. Changes to name or description only do not create a new version, latest version number stays unchanged. Identical specs are rejected.
REST API: PATCH /v2/org/{org}/nico/dpu-extension-service/{dpuExtensionServiceId}
Optional fields: name, description, credentials, and observability. Omit data to update metadata only.
Admin CLI:
Use --if-version-ctr-match (CLI only) to prevent concurrent update conflicts.
Delete a DPU Extension Service or Version
Deletion succeeds only when no Instance is using the version being deleted. If a version is still deployed, remove it from affected Instances first. When all versions are deleted, the service itself is removed automatically.
REST API:
- Delete all versions:
DELETE /v2/org/{org}/nico/dpu-extension-service/{dpuExtensionServiceId} - Delete a specific version:
DELETE /v2/org/{org}/nico/dpu-extension-service/{dpuExtensionServiceId}/version/{version}
Admin CLI:
Managing Instance DPU Extension Service Deployments
Deployment to Instances is managed through the REST API only.
Deploy DPU Extension Services to an Instance
Reference a service when creating or updating an Instance. If version is omitted, NICo deploys the latest version.
REST API:
- Create Instance:
POST /v2/org/{org}/nico/instance - Batch create:
POST /v2/org/{org}/nico/instance/batch - Update Instance:
PATCH /v2/org/{org}/nico/instance/{instanceId}
Include dpuExtensionServiceDeployments in the request body:
During Instance provisioning, NICo waits for all configured extension services to reach a running state before the Instance becomes ready. If no extension services are configured, this step is skipped.
Upgrade a Deployed DPU Extension Service for an Instance
- Update the service definition to create a new version (see Update an extension service).
- Update each affected Instance to reference the new version in
dpuExtensionServiceDeployments.
The Instance remains in Ready state while DPUs asynchronously apply the new configuration. Monitor progress through the Instance deployment status.
Remove DPU Extension Services from an Instance
Update the Instance and remove the entry from dpuExtensionServiceDeployments. NICo terminates the service on each DPU and reports Terminating, then Terminated, in the Instance status.
During Instance deallocation, NICo waits for all extension services to reach Terminated on every DPU before completing cleanup.
Check Instance DPU Extension Service Deployment Status
REST API: GET /v2/org/{org}/carbide/instance/{instance-id}
The response includes dpuExtensionServiceDeployments with per-service deployment status. See Deployment status values for status definitions.
Find Instances using an extension service
Admin CLI:
Optionally filter by version: --version <version>.
Reference
Kubernetes Pod Requirements
For DPU Extension Service defined with KubernetesPod type, the data field of the DPU Extension Service must be a valid Kubernetes Pod manifest in YAML format.
The pod manifest must not exceed 64 KB and the pod manifest must have following fields:
apiVersionkind: Podmetadata.namespec.containers— a non-empty array
Example:
Registry Credentials
For DPU Extension Service defined with KubernetesPod type, in order for the DPU agent to pull images referenced in the pod manifest, tenant should provider credentials when creating or updating a service:
Credentials are stored in Vault and will not be displayed when tenant queries service definition through REST API or admin CLI. Each field must be non-empty and at most 255 characters. Deleting a service version removes its associated Vault credentials.
The registry URL is used as an image match prefix for kubelet’s image credential provider. NICo matches credentials against the image reference by prefix.
Examples:
- Images
nvcr.io/nvforge/abcdandnvcr.io/nvforge/efgh→ useregistryUrlofnvcr.io/nvforge(no trailing/). - Image
nvcr.io/nv-ngn/sdn/controller:latest→ useregistryUrlofnvcr.io/nv-ngn/sdn.
Multiple services with different credentials: Each service that needs its own credentials must use a distinct registryUrl prefix. If two services share the same prefix (for example, both use nvcr.io/nvforge) with different username/password pairs, kubelet may apply either credential non-deterministically.
DPUs pull images through a site-controller SOCKS5 proxy (socks5://socks.forge:1888). No additional proxy configuration is required in the service definition.
Versioning
Previous versions remain available for deployment until deleted. To roll out a new version, update each Instance’s dpuExtensionServiceDeployments.
Deployment Status Values
NICo aggregates status across all DPUs on an Instance. If any DPU reports Error, the overall status is Error. If any DPU reports Pending, the overall status is Pending. All DPUs must report Running for the overall status to be Running.
Instance Lifecycle Behavior
Provisioning: After network configuration completes, the Instance waits in WaitingForExtensionServicesConfig until all configured extension services are running on every DPU. Hosts with zero DPUs skip this step.
Configuration updates: Extension services can be added, upgraded, or removed while the Instance is in Ready state. Removed services are tracked as terminating until all DPUs confirm Terminated.
Deallocation: During teardown, NICo waits in WaitingForNetworkReconfig until all extension services report Terminated before completing cleanup.