DPFOperatorConfig
The DPFOperatorConfig controls how DPF operates in your Kubernetes cluster. This guide explains the major configuration options. When the config is applied, the DPF Operator deploys all necessary components and configures them accordingly.
In the current implementation the DPFOperatorConfig resource is a singleton: only one instance of this resource can exist in the cluster. If you create a second instance, the controllers will not work as expected.
You can find the full API documentation in the API Reference.
The following basic example enables the Kamaji cluster manager, disables the static cluster manager, and sets the name of the BFB PVC, which the provisioning controller requires to download the bf-bundle.
apiVersion: operator.doca-platform.nvidia.com/v1alpha1
kind: DPFOperatorConfig
metadata:
  name: dpf-operator-config
spec:
  staticClusterManager:
    disable: true
  kamajiClusterManager:
    disable: false
  provisioningController:
    bfbPVCName: bfb-pvc
We can verify if the configuration is applied correctly by checking the status of the DPFOperatorConfig resource.
$ kubectl -n dpf-operator-system get dpfoperatorconfig
NAME                READY   PHASE     AGE
dpfoperatorconfig   True    Success   1h
Alternatively, use dpfctl:
$ kubectl -n dpf-operator-system exec deployment/dpf-operator-controller-manager -- /dpfctl describe all
NAME                                 NAMESPACE            STATUS  REASON   SINCE  MESSAGE
DPFOperatorConfig/dpfoperatorconfig  dpf-operator-system
├─Ready                                                   True    Success  1h
├─ImagePullSecretsReconciled                              True    Success  1h
├─SystemComponentsReady                                   True    Success  1h
└─SystemComponentsReconciled                              True    Success  1h
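The same check can be scripted. The following is a minimal Python sketch, assuming the resource reports its state under status.conditions in the usual Kubernetes condition format (type and status fields), as the dpfctl output above suggests. The helper name not_ready_conditions and the sample object are hypothetical. The input is a single DPFOperatorConfig object as parsed JSON, for example one obtained with kubectl get ... -o json.

```python
import json


def not_ready_conditions(resource: dict) -> list[str]:
    """Return the types of all status conditions whose status is not "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return [c.get("type", "<unknown>") for c in conditions if c.get("status") != "True"]


# Hypothetical sample; the condition names mirror the dpfctl output above,
# the field layout is an assumption based on standard Kubernetes conditions.
sample = json.loads("""
{
  "kind": "DPFOperatorConfig",
  "status": {
    "conditions": [
      {"type": "Ready", "status": "False"},
      {"type": "ImagePullSecretsReconciled", "status": "True"},
      {"type": "SystemComponentsReady", "status": "False"},
      {"type": "SystemComponentsReconciled", "status": "True"}
    ]
  }
}
""")

print(not_ready_conditions(sample))
```

An empty result means every reported condition is True.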
Networking
The MTU of the control plane network and of the high-speed interfaces can be configured. Both values default to 1500 and can be set to any value from 1280 to 9216.
spec:
  networking:
    controlPlaneMTU: 1500 # Management network MTU (range: 1280-9216, default: 1500)
    highSpeedMTU: 1500 # High-speed interface MTU (range: 1280-9216, default: 1500)
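To catch out-of-range values before applying a config, the documented default and range can be checked locally. This is a minimal Python sketch, not the operator's own validation; the function name effective_mtus is hypothetical, and the input is assumed to be the spec.networking mapping already loaded as a dict.

```python
MTU_MIN, MTU_MAX, MTU_DEFAULT = 1280, 9216, 1500  # values from the comments above


def effective_mtus(networking: dict) -> dict:
    """Apply the documented default and range check to both MTU fields."""
    result = {}
    for field in ("controlPlaneMTU", "highSpeedMTU"):
        value = networking.get(field, MTU_DEFAULT)
        if not MTU_MIN <= value <= MTU_MAX:
            raise ValueError(f"{field}={value} is outside {MTU_MIN}-{MTU_MAX}")
        result[field] = value
    return result


# Only the high-speed MTU is overridden; the control plane MTU falls back to 1500.
print(effective_mtus({"highSpeedMTU": 9000}))
```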
Image Pull Secrets
Specify secrets for pulling container images. This is only necessary if your container registry requires authentication. If you are using the public GHCR registry, which is the default, you don't need to configure this.
spec:
  imagePullSecrets:
    - "my-registry-secret"
    - "another-secret"
Resources
All system components deployed by the DPF Operator support standard Kubernetes resource requests and limits. Resources can be configured per component at the container level. Components may have multiple containers with different resource requirements that can be configured independently.
Below is an example of configuring resources for the SFC Controller component:
spec:
  sfcController:
    controller:
      resources:
        requests:
          cpu: 6
          memory: 2Gi
        limits:
          cpu: 8
          memory: 4Gi
This pattern applies to all components listed in the Optional Component Configurations section below.
For production deployments, it's recommended to set appropriate resource limits based on your cluster's workload.
Optional Component Configurations
The following components can be configured to enable/disable features or specify a different container image.
By default, all components are enabled with preconfigured images, and changes are usually only needed for development, testing, or specific deployments.
spec:
  cniInstaller: { }
  dpuDetector: { }
  dpuServiceController: { }
  flannel: { }
  kamajiClusterManager: { }
  multus: { }
  nvipam: { }
  ovsCNI: { }
  provisioningController: { }
  serviceSetController: { }
  sfcController: { }
  sriovDevicePlugin: { }
  staticClusterManager: { }
To disable a component or override its container image, use the following configuration:
spec:
  sriovDevicePlugin:
    disable: true
  dpuDetector:
    daemon:
      image: "my-registry/my-dpu-detector:latest"
Deprecated: Setting the image at component level (e.g., spec.dpuDetector.image) is deprecated. Use the sub-component specific image field instead (e.g., spec.dpuDetector.daemon.image).
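Existing configs can be scanned for the deprecated form. The following is a minimal Python sketch under the assumption that the spec has been loaded as a dict and that the deprecated field is a key named image directly under a component; the function name deprecated_image_fields is hypothetical.

```python
# Component keys as listed under spec in the example above.
COMPONENTS = (
    "cniInstaller", "dpuDetector", "dpuServiceController", "flannel",
    "kamajiClusterManager", "multus", "nvipam", "ovsCNI",
    "provisioningController", "serviceSetController", "sfcController",
    "sriovDevicePlugin", "staticClusterManager",
)


def deprecated_image_fields(spec: dict) -> list[str]:
    """Return the paths of component-level image fields found in spec."""
    found = []
    for name in COMPONENTS:
        component = spec.get(name) or {}
        if "image" in component:
            found.append(f"spec.{name}.image")
    return found


# The sub-component form (dpuDetector.daemon.image) is not reported.
print(deprecated_image_fields({"dpuDetector": {"image": "my-registry/my-dpu-detector:latest"}}))
```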
For a detailed description of each component and its available configuration options, see the API Reference.
DPU Service Controller Configuration Options
spec.dpuServiceController.disableDPUReadyTaints: When set to true, disables the automatic tainting of DPU nodes when they're not ready.
spec:
  dpuServiceController:
    disableDPUReadyTaints: true
Flannel Configuration Options
spec.flannel.podCIDR: CIDR range for pod networking when using Flannel CNI.
spec:
  flannel:
    podCIDR: "10.244.0.0/16"
Component Deployment Configuration
Several components support additional deployment configuration options:
helmChart: Override the Helm chart repository/version for the component
spec:
  multus:
    helmChart: "custom-repo/multus:v1.0.0"
SFC Controller Configuration Options
spec.sfcController.SecureFlowDeletionTimeout: Controls the secure flow deletion feature. The default value is 0, which disables the feature.
When set to a valid duration, the value acts as the API server unavailability threshold: if the API server is unavailable for longer than the specified duration, the SFC controller deletes all OpenFlow flows to prevent unintended packet leaks.
The value must be in units accepted by Go time.ParseDuration https://golang.org/pkg/time/#ParseDuration.
spec:
  sfcController:
    SecureFlowDeletionTimeout: 5m
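For readers unfamiliar with the Go duration syntax, the sketch below approximates it in Python. It is an illustration, not the operator's validation and not a complete port of time.ParseDuration (overflow handling, for instance, is ignored); the function name duration_seconds is hypothetical. It accepts a sequence of decimal numbers each followed by one of the units ns, us (or µs), ms, s, m, h, plus the bare value 0.

```python
import re

# Unit suffixes accepted by Go's time.ParseDuration, mapped to seconds.
_UNITS = {"ns": 1e-9, "us": 1e-6, "\u00b5s": 1e-6, "\u03bcs": 1e-6,
          "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# Longest suffixes first so that "ms" is not read as "m".
_UNIT_ALT = "|".join(sorted(map(re.escape, _UNITS), key=len, reverse=True))
_PART = rf"([0-9]+(?:\.[0-9]*)?|\.[0-9]+)({_UNIT_ALT})"
_FULL = re.compile(rf"[+-]?(?:{_PART})+")


def duration_seconds(text: str) -> float:
    """Convert a Go-style duration string such as "5m" or "1h30m" to seconds."""
    if text in ("0", "+0", "-0"):
        return 0.0  # the only value allowed without a unit
    if not _FULL.fullmatch(text):
        raise ValueError(f"invalid duration: {text!r}")
    sign = -1.0 if text.startswith("-") else 1.0
    return sign * sum(float(num) * _UNITS[unit] for num, unit in re.findall(_PART, text))


print(duration_seconds("5m"))
```

A value such as "5 minutes" or a bare "5" is rejected, which matches the Go requirement that every non-zero number carries a unit.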
Provisioning Controller Configuration Options
spec.provisioningController.bfbPVCName: (Required) Name of the PVC containing the BFB (BF Bundle) for provisioning DPUs.
spec.provisioningController.maxDPUParallelInstallations: Controls the maximum number of DPUs that can be provisioned concurrently. The default value is 50. The value must be at least 1.
spec.provisioningController.maxUnavailableDPUNodes: Maximum number of DPU nodes that can be unavailable during updates. The provisioning controller interacts with the maintenance-operator to implement the drain node effect. The number of nodes to which a node effect can be applied simultaneously is determined by maxUnavailableDPUNodes in the DPFOperatorConfig and maxParallelOperations in the NodeMaintenance-operator configuration. The NodeMaintenanceOperator configuration has higher priority than what is defined in the DPFOperatorConfig. The default value of maxUnavailableDPUNodes in the DPFOperatorConfig is 50. For the default MaintenanceOperatorConfig values, see the instructions in the Helm prerequisites.
The maxDPUParallelInstallations and maxUnavailableDPUNodes options can be configured together, and they interact with maxParallelOperations and maxUnavailable in the NVIDIA NodeMaintenance-operator configuration. The examples below show the expected behaviour.
| maxDPUParallelInstallations in DPFOperatorConfig | maxUnavailableDPUNodes in DPFOperatorConfig | maxParallelOperations in NVIDIA NodeMaintenanceConfig | maxUnavailable in NVIDIA NodeMaintenanceConfig | Max number of DPUs in provisioning | Max number of nodes under node effect in NodeMaintenanceOperator |
| --- | --- | --- | --- | --- | --- |
| 5 | 1 | 10 | 5 | up to 5 DPUs provisioning in parallel | up to 1 node under node effect |
| 1 | 5 | 10 | 10 | up to 1 DPU provisioning | up to 1 node under node effect |
| 5 | 5 | 1 | 5 | up to 5 DPUs provisioning in parallel | up to 1 node under node effect |
| 5 | 5 | 10 | 2 | up to 5 DPUs provisioning in parallel | up to 2 nodes under node effect |
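One way to read the table: in every row, the number of DPUs in provisioning equals maxDPUParallelInstallations, and the number of nodes under node effect equals the smallest of the four settings. The Python sketch below encodes that reading. The rule is inferred only from the four example rows, not from the controller implementation, the function name effective_limits is hypothetical, and all values are assumed to be plain integers.

```python
def effective_limits(max_dpu_parallel_installations: int,
                     max_unavailable_dpu_nodes: int,
                     max_parallel_operations: int,
                     max_unavailable: int) -> tuple[int, int]:
    """Return (max DPUs in provisioning, max nodes under node effect).

    Assumption inferred from the example rows: the node-effect limit is the
    minimum of all four settings.
    """
    provisioning = max_dpu_parallel_installations
    node_effect = min(max_dpu_parallel_installations, max_unavailable_dpu_nodes,
                      max_parallel_operations, max_unavailable)
    return provisioning, node_effect


# The four example rows, in table order.
for row in [(5, 1, 10, 5), (1, 5, 10, 10), (5, 5, 1, 5), (5, 5, 10, 2)]:
    print(row, "->", effective_limits(*row))
```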
spec.provisioningController.bfCFGTemplateConfigMap: Name of ConfigMap containing bf-cfg template for DPU configuration.
spec.provisioningController.customCASecretName: Name of Secret containing custom CA certificates for secure communication.
spec.provisioningController.dmsTimeout: Timeout in seconds for DMS (DPU Management Service) operations.
spec.provisioningController.multiDPUOperationsSyncWaitTime: Wait time for synchronizing operations across multiple DPUs. Value must be in units accepted by Go time.ParseDuration https://golang.org/pkg/time/#ParseDuration.
spec.provisioningController.registry: Configuration for the container registry used during provisioning.
  address: Registry address
  port: Registry port
spec.provisioningController.installInterface: Method for installing DPU firmware. Choose one:
  installViaHostAgent: Install via host agent
  installViaGNOI: Install via gNOI protocol
  installViaRedfish: Install via Redfish API with additional options:
    bfbRegistry.disable: Disable the BFB registry
    bfbRegistry.port: Port for BFB registry
    bfbRegistryAddress: Address of BFB registry
    skipDpuNodeDiscovery: Skip automatic DPU node discovery
spec:
  provisioningController:
    bfbPVCName: bfb-pvc
    maxDPUParallelInstallations: 25 # Limit concurrent provisioning to 25 DPUs
    maxUnavailableDPUNodes: 5
    dmsTimeout: 600
    multiDPUOperationsSyncWaitTime: 30s
    customCASecretName: my-ca-secret
    registry:
      address: "registry.example.com"
      port: 5000
    installInterface:
      installViaRedfish:
        bfbRegistry:
          port: 8080
        skipDpuNodeDiscovery: false
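The constraints stated for these options (bfbPVCName is required, maxDPUParallelInstallations must be at least 1, and only one install method is chosen) can be checked before applying a config. This is a minimal Python sketch, assuming the spec.provisioningController mapping has been loaded as a dict; the function name validate_provisioning_controller is hypothetical and the checks are not the operator's own admission logic.

```python
INSTALL_METHODS = ("installViaHostAgent", "installViaGNOI", "installViaRedfish")


def validate_provisioning_controller(cfg: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    if not cfg.get("bfbPVCName"):
        errors.append("bfbPVCName is required")
    # 50 is the documented default when the field is omitted.
    if cfg.get("maxDPUParallelInstallations", 50) < 1:
        errors.append("maxDPUParallelInstallations must be at least 1")
    chosen = [m for m in INSTALL_METHODS if m in cfg.get("installInterface", {})]
    if len(chosen) > 1:
        errors.append("installInterface sets more than one method: " + ", ".join(chosen))
    return errors


# A reduced version of the example above, expressed as a dict.
example = {
    "bfbPVCName": "bfb-pvc",
    "maxDPUParallelInstallations": 25,
    "installInterface": {"installViaRedfish": {"bfbRegistry": {"port": 8080}}},
}
print(validate_provisioning_controller(example))
```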
Advanced Overrides
The overrides section allows customization of system-level paths and settings. These are typically only needed for non-standard deployments or testing scenarios.
spec:
  overrides:
    # Pause reconciliation of the DPFOperatorConfig
    paused: false
    # Kubernetes API server configuration
    kubernetesAPIServerVIP: "192.168.1.100"
    kubernetesAPIServerPort: 6443
    # DPU filesystem paths for CNI
    dpuCNIPath: "/etc/cni/net.d"
    dpuCNIBinPath: "/opt/cni/bin"
    # DPU OpenVSwitch paths
    dpuOpenvSwitchBinPath: "/usr/bin"
    dpuOpenvSwitchRunPath: "/var/run/openvswitch"
    dpuOpenvSwitchSystemSharedPath: "/usr/share/openvswitch"
    dpuOpenvSwitchSystemSharedLib64Path: "/usr/lib64"
    # Flannel-specific overrides
    flannelSkipCNIConfigInstallation: false
Override Options
paused: When set to true, pauses reconciliation of the DPFOperatorConfig resource.
kubernetesAPIServerVIP: Override the Kubernetes API server virtual IP address.
kubernetesAPIServerPort: Override the Kubernetes API server port (default: 6443).
dpuCNIPath: Path to CNI configuration directory on DPU nodes.
dpuCNIBinPath: Path to CNI binaries on DPU nodes.
dpuOpenvSwitchBinPath: Path to OpenvSwitch binaries on DPU nodes.
dpuOpenvSwitchRunPath: Path to OpenvSwitch runtime directory on DPU nodes.
dpuOpenvSwitchSystemSharedPath: Path to OpenvSwitch shared directory on DPU nodes.
dpuOpenvSwitchSystemSharedLib64Path: Path to OpenvSwitch 64-bit libraries on DPU nodes.
flannelSkipCNIConfigInstallation: Skip automatic CNI configuration installation for Flannel.