DOCA Platform Framework (DPF) Documentation v25.10.0

DPFOperatorConfig

The DPFOperatorConfig controls how DPF operates in your Kubernetes cluster. This guide explains the major configuration options. When the config is applied, the DPF Operator deploys all necessary components and configures them accordingly.

The basic example below enables the Kamaji cluster manager, disables the static cluster manager, and sets the name of the BFB PVC, which the provisioning controller needs in order to download the bf-bundle.

Note

In the current implementation the DPFOperatorConfig resource is a singleton. This means that only one instance of this resource can exist in the cluster. If you try to create a second instance, the controllers will not work as expected.

You can find the full API documentation in the API Reference.

```yaml
apiVersion: operator.doca-platform.nvidia.com/v1alpha1
kind: DPFOperatorConfig
metadata:
  name: dpf-operator-config
spec:
  staticClusterManager:
    disable: true
  kamajiClusterManager:
    disable: false
  provisioningController:
    bfbPVCName: bfb-pvc
```

To verify that the configuration was applied correctly, check the status of the DPFOperatorConfig resource.

```console
$ kubectl -n dpf-operator-system get dpfoperatorconfig
NAME                READY   PHASE     AGE
dpfoperatorconfig   True    Success   1h
```

Alternatively, use dpfctl:

```console
$ kubectl -n dpf-operator-system exec deployment/dpf-operator-controller-manager -- /dpfctl describe all
NAME                                  NAMESPACE            STATUS  REASON   SINCE  MESSAGE
DPFOperatorConfig/dpfoperatorconfig   dpf-operator-system
├─Ready                                                    True    Success  1h
├─ImagePullSecretsReconciled                               True    Success  1h
├─SystemComponentsReady                                    True    Success  1h
└─SystemComponentsReconciled                               True    Success  1h
```

Networking

The MTU of the control plane network and of the high-speed interfaces can be configured independently. Both default to 1500 and accept values from 1280 to 9216.

```yaml
spec:
  networking:
    controlPlaneMTU: 1500  # Management network MTU (range: 1280-9216, default: 1500)
    highSpeedMTU: 1500     # High-speed interface MTU (range: 1280-9216, default: 1500)
```
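As an illustration, a deployment whose high-speed fabric supports jumbo frames might raise only highSpeedMTU and leave the control plane at its default. The value 9000 is an assumed example, not a recommendation from this guide. Any value in the documented 1280-9216 range is accepted by the field, but the underlying network must support the chosen MTU end to end.

```yaml
spec:
  networking:
    controlPlaneMTU: 1500  # left at the default
    highSpeedMTU: 9000     # assumed example value; must be within 1280-9216
```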


Image Pull Secrets

Specify secrets for pulling container images. This is only necessary if your container registry requires authentication. If you are using the public GHCR registry, which is the default, you don't need to configure this.

```yaml
spec:
  imagePullSecrets:
    - "my-registry-secret"
    - "another-secret"
```
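The names listed under imagePullSecrets refer to existing Kubernetes Secrets. A minimal sketch of such a Secret follows, using the standard kubernetes.io/dockerconfigjson type. The namespace is an assumption based on the dpf-operator-system namespace used elsewhere in this guide, and the data value is a placeholder. This guide does not state whether DPF requires additional labels on the Secret, so check the API Reference before relying on this sketch.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-registry-secret        # must match an entry in spec.imagePullSecrets
  namespace: dpf-operator-system  # assumption: same namespace as the DPF Operator
type: kubernetes.io/dockerconfigjson
data:
  # Placeholder: base64-encoded contents of a Docker config.json file
  .dockerconfigjson: <base64-encoded-docker-config>
```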


Resources

All system components deployed by the DPF Operator support standard Kubernetes resource requests and limits. Resources can be configured per component at the container level. Components may have multiple containers with different resource requirements that can be configured independently.

Below is an example of configuring resources for the SFC Controller component:

```yaml
spec:
  sfcController:
    controller:
      resources:
        requests:
          cpu: 6
          memory: 2Gi
        limits:
          cpu: 8
          memory: 4Gi
```

This pattern applies to all components listed in the Optional Component Configurations section below.

For production deployments, it's recommended to set appropriate resource limits based on your cluster's workload.
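As a sketch of how the same pattern might look for another component, the fragment below sets resources on the daemon container of the DPU Detector. The daemon name is taken from the image override example later in this guide. That it accepts the same resources block as sfcController.controller is an assumption, and the values are purely illustrative, so confirm the container names for each component in the API Reference.

```yaml
spec:
  dpuDetector:
    daemon:        # sub-component name as used in the image override example
      resources:   # assumption: same structure as sfcController.controller.resources
        requests:
          cpu: 100m      # illustrative value
          memory: 128Mi  # illustrative value
        limits:
          cpu: 500m      # illustrative value
          memory: 256Mi  # illustrative value
```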

Optional Component Configurations

The following components can be configured to enable/disable features or specify a different container image.

By default, all components are enabled with preconfigured images, and changes are usually only needed for development, testing, or specific deployments.

```yaml
spec:
  cniInstaller: {}
  dpuDetector: {}
  dpuServiceController: {}
  flannel: {}
  kamajiClusterManager: {}
  multus: {}
  nvipam: {}
  ovsCNI: {}
  provisioningController: {}
  serviceSetController: {}
  sfcController: {}
  sriovDevicePlugin: {}
  staticClusterManager: {}
```

To disable a component or override its container image, use the following configuration:

```yaml
spec:
  sriovDevicePlugin:
    disable: true
  dpuDetector:
    daemon:
      image: "my-registry/my-dpu-detector:latest"
```

Warning

Deprecated: Setting the image at component level (e.g., spec.dpuDetector.image) is deprecated. Use the sub-component specific image field instead (e.g., spec.dpuDetector.daemon.image).
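The difference between the two forms can be sketched as follows, using the dpuDetector fields named in the warning. The two YAML documents are alternatives, not a single config, and the image reference is the illustrative value from the example above rather than a real image.

```yaml
# Deprecated: image set at the component level
spec:
  dpuDetector:
    image: "my-registry/my-dpu-detector:latest"
---
# Preferred: image set on the sub-component
spec:
  dpuDetector:
    daemon:
      image: "my-registry/my-dpu-detector:latest"
```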

For a detailed description of each component and its available configuration options, see the API Reference.

DPU Service Controller Configuration Options

  • spec.dpuServiceController.disableDPUReadyTaints: When set to true, disables the automatic tainting of DPU nodes when they're not ready.

```yaml
spec:
  dpuServiceController:
    disableDPUReadyTaints: true
```


Flannel Configuration Options

  • spec.flannel.podCIDR: CIDR range for pod networking when using Flannel CNI.

```yaml
spec:
  flannel:
    podCIDR: "10.244.0.0/16"
```


Component Deployment Configuration

Several components support additional deployment configuration options:

  • helmChart: Override the Helm chart repository/version for the component

```yaml
spec:
  multus:
    helmChart: "custom-repo/multus:v1.0.0"
```


SFC Controller Configuration Options

  • spec.sfcController.SecureFlowDeletionTimeout: Controls the secure flow deletion feature.

    The default value is 0, which means that the feature is disabled.

    When set to a valid duration, the value acts as an API server unavailability threshold: if the API server is unavailable for longer than the specified duration, the SFC controller deletes all OpenFlow flows to prevent unintended packet leaks.

    The value must use units accepted by Go's time.ParseDuration (https://golang.org/pkg/time/#ParseDuration), for example 30s, 5m, or 1h.

```yaml
spec:
  sfcController:
    SecureFlowDeletionTimeout: 5m
```


Provisioning Controller Configuration Options

  • spec.provisioningController.bfbPVCName: (Required) Name of the PVC containing the BFB (BF Bundle) for provisioning DPUs.

  • spec.provisioningController.maxDPUParallelInstallations: Controls the maximum number of DPUs that can be provisioned concurrently. The default value is 50. The value must be at least 1.

  • spec.provisioningController.maxUnavailableDPUNodes: Maximum number of DPU nodes that can be unavailable during updates. The provisioning controller interacts with the maintenance-operator to implement the drain node effect. The number of nodes to which a node effect can be applied simultaneously is determined by maxUnavailableDPUNodes in the DPFOperatorConfig and maxParallelOperations in the NodeMaintenance-operator configuration. The NodeMaintenance-operator setting takes priority over the value defined in the DPFOperatorConfig. The default value of maxUnavailableDPUNodes in the DPFOperatorConfig is 50. For the default MaintenanceOperatorConfig values, see the instructions in the Helm prerequisites.

The maxDPUParallelInstallations and maxUnavailableDPUNodes options can be set together, and they interact with maxParallelOperations and maxUnavailable in the NVIDIA NodeMaintenance-operator configuration. The following examples show the expected behavior.

| maxDPUParallelInstallations in DPFOperatorConfig | maxUnavailableDPUNodes in DPFOperatorConfig | maxParallelOperations in NVIDIA NodeMaintenanceConfig | maxUnavailable in NVIDIA NodeMaintenanceConfig | Max number of DPUs in provisioning | Max number of nodes under node effect in NodeMaintenanceOperator |
| --- | --- | --- | --- | --- | --- |
| 5 | 1 | 10 | 5 | up to 5 DPUs provisioning in parallel | up to 1 node under node effect |
| 1 | 5 | 10 | 10 | up to 1 DPU provisioning | up to 1 node under node effect |
| 5 | 5 | 1 | 5 | up to 5 DPUs provisioning in parallel | up to 1 node under node effect |
| 5 | 5 | 10 | 2 | up to 5 DPUs provisioning in parallel | up to 2 nodes under node effect |

  • spec.provisioningController.bfCFGTemplateConfigMap: Name of ConfigMap containing bf-cfg template for DPU configuration.

  • spec.provisioningController.customCASecretName: Name of Secret containing custom CA certificates for secure communication.

  • spec.provisioningController.dmsTimeout: Timeout in seconds for DMS (DPU Management Service) operations.

  • spec.provisioningController.multiDPUOperationsSyncWaitTime: Wait time for synchronizing operations across multiple DPUs. Value must be in units accepted by Go time.ParseDuration https://golang.org/pkg/time/#ParseDuration.

  • spec.provisioningController.registry: Configuration for the container registry used during provisioning.

    • address: Registry address

    • port: Registry port

  • spec.provisioningController.installInterface: Method for installing DPU firmware. Choose one:

    • installViaHostAgent: Install via host agent

    • installViaGNOI: Install via gNOI protocol

    • installViaRedfish: Install via Redfish API with additional options:

      • bfbRegistry.disable: Disable the BFB registry

      • bfbRegistry.port: Port for BFB registry

      • bfbRegistryAddress: Address of BFB registry

      • skipDpuNodeDiscovery: Skip automatic DPU node discovery

```yaml
spec:
  provisioningController:
    bfbPVCName: bfb-pvc
    maxDPUParallelInstallations: 25  # Limit concurrent provisioning to 25 DPUs
    maxUnavailableDPUNodes: 5
    dmsTimeout: 600
    multiDPUOperationsSyncWaitTime: 30s
    customCASecretName: my-ca-secret
    registry:
      address: "registry.example.com"
      port: 5000
    installInterface:
      installViaRedfish:
        bfbRegistry:
          port: 8080
        skipDpuNodeDiscovery: false
```

Advanced Overrides

The overrides section allows customization of system-level paths and settings. These are typically only needed for non-standard deployments or testing scenarios.

```yaml
spec:
  overrides:
    # Pause reconciliation of the DPFOperatorConfig
    paused: false

    # Kubernetes API server configuration
    kubernetesAPIServerVIP: "192.168.1.100"
    kubernetesAPIServerPort: 6443

    # DPU filesystem paths for CNI
    dpuCNIPath: "/etc/cni/net.d"
    dpuCNIBinPath: "/opt/cni/bin"

    # DPU OpenVSwitch paths
    dpuOpenvSwitchBinPath: "/usr/bin"
    dpuOpenvSwitchRunPath: "/var/run/openvswitch"
    dpuOpenvSwitchSystemSharedPath: "/usr/share/openvswitch"
    dpuOpenvSwitchSystemSharedLib64Path: "/usr/lib64"

    # Flannel-specific overrides
    flannelSkipCNIConfigInstallation: false
```

Override Options

  • paused: When set to true, pauses reconciliation of the DPFOperatorConfig resource.

  • kubernetesAPIServerVIP: Override the Kubernetes API server virtual IP address.

  • kubernetesAPIServerPort: Override the Kubernetes API server port (default: 6443).

  • dpuCNIPath: Path to CNI configuration directory on DPU nodes.

  • dpuCNIBinPath: Path to CNI binaries on DPU nodes.

  • dpuOpenvSwitchBinPath: Path to OpenvSwitch binaries on DPU nodes.

  • dpuOpenvSwitchRunPath: Path to OpenvSwitch runtime directory on DPU nodes.

  • dpuOpenvSwitchSystemSharedPath: Path to OpenvSwitch shared directory on DPU nodes.

  • dpuOpenvSwitchSystemSharedLib64Path: Path to OpenvSwitch 64-bit libraries on DPU nodes.

  • flannelSkipCNIConfigInstallation: Skip automatic CNI configuration installation for Flannel.
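As a small illustration of the paused option, the fragment below pauses reconciliation, for instance while inspecting deployed components (an assumed use case, not one stated in this guide). Setting the field back to false should resume reconciliation.

```yaml
spec:
  overrides:
    paused: true  # pauses reconciliation of the DPFOperatorConfig resource
```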

© Copyright 2025, NVIDIA. Last updated on Dec 23, 2025