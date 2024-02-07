Network Operator Application Notes 23.10.0 - Sphinx Test
This troubleshooting guide is intended for people who are responsible for maintaining and administering DPU provisioning.

Deleting a BareMetalHost (BMH) in the “provisioning error” state gets stuck

Symptoms

When a DPU is provisioned with a faulty provisioning script, the BMH state is set to “provisioning error”. Deleting the BMH may then get stuck in the de-provisioning phase.

Resolution

Apply the following patch so that the de-provisioning phase is skipped, then delete the DPU:

kubectl patch bmh <baremetalhost-cr-name> -n universe -p '{"spec":{"automatedCleaningMode":"disabled"}}' --type="merge"
kubectl delete dpu <dpu-cr-name> -n universe

Note

Skipping the de-provisioning phase leaves the OS, containers, and daemons in place on the DPU.
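The patch-and-delete steps in the resolution can be wrapped in a small helper that also reads the patched field back before deleting anything. The function below is a hypothetical convenience, not part of the product: the name `force_delete_dpu` and the `KUBECTL` override are illustrative, and the namespace defaults to `universe` as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the product: skip de-provisioning for a
# BMH in "provisioning error", confirm the patch took effect, then delete the DPU.
# The namespace defaults to "universe"; pass a third argument to override it.
# KUBECTL may name a wrapper command or function (default: kubectl). It is
# expanded unquoted on purpose so it can also hold a command plus arguments.

force_delete_dpu() {
  local bmh_name="$1" dpu_name="$2" ns="${3:-universe}"
  local kubectl="${KUBECTL:-kubectl}"
  local mode

  # Disable automated cleaning so the BMH skips the de-provisioning phase.
  $kubectl patch bmh "$bmh_name" -n "$ns" \
    -p '{"spec":{"automatedCleaningMode":"disabled"}}' --type=merge || return 1

  # Read the field back; refuse to delete the DPU if the patch did not stick.
  mode="$($kubectl get bmh "$bmh_name" -n "$ns" \
    -o jsonpath='{.spec.automatedCleaningMode}')" || return 1
  if [[ "$mode" != "disabled" ]]; then
    echo "automatedCleaningMode is '$mode', not 'disabled'; not deleting" >&2
    return 1
  fi

  # Delete the DPU CR; with cleaning disabled this should no longer hang.
  $kubectl delete dpu "$dpu_name" -n "$ns"
}
```

Usage: `force_delete_dpu <baremetalhost-cr-name> <dpu-cr-name>`. The read-back check is a design choice: if the patch silently failed, deleting the DPU would simply get stuck again.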

clusterctl describe cluster shows a SettingProviderIDOnNodeFailed message

Symptoms

During provisioning, running clusterctl describe cluster on the DCM node may show a SettingProviderIDOnNodeFailed error message:

root@dcm05:/opt/regression# clusterctl describe cluster dcm-cluster -n universe
NAME                                                             READY  SEVERITY  REASON                         SINCE  MESSAGE
Cluster/dcm-cluster                                              True                                            55m
ClusterInfrastructure - Metal3Cluster/dcm-cluster                True                                            55m
ControlPlane - UniverseControlPlane/dcm-cluster-control-plane    True                                            55m
Workers
    Other
    Machine/hpc-cloud05-bf1                                      False  Error     SettingProviderIDOnNodeFailed  10m    1 of 2 completed

Cause

This is normal Cluster API (CAPI) behavior. The message is removed automatically when cloud-init is re-registered.
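When several Machines are provisioning at once, it can help to separate the reasons this guide treats as normal (SettingProviderIDOnNodeFailed, and AssociatedBMHFailure, which is also covered in this guide) from anything else. The awk filter below is a sketch: the function name is hypothetical, and it assumes plain (uncolored) output in the column layout shown in the example, where READY is the second field and REASON the fourth once any leading tree-drawing characters are stripped. A row with an empty SEVERITY column would shift the fields, so treat the result as a first pass.

```shell
#!/usr/bin/env bash
# Sketch: list Machines that `clusterctl describe cluster` reports as not ready
# and tag the reasons this guide describes as normal during provisioning.
# Reads the describe output on stdin; assumes the NAME/READY/SEVERITY/REASON
# column layout shown in the example output.

not_ready_machines() {
  awk '
    {
      # Drop leading spaces and any tree-drawing characters; awk re-splits $0.
      sub(/^[^A-Za-z]+/, "")
    }
    $1 ~ /^Machine\// && $2 == "False" {
      if ($4 == "SettingProviderIDOnNodeFailed" || $4 == "AssociatedBMHFailure")
        note = "expected during provisioning"
      else
        note = "investigate"
      printf "%s\t%s\t%s\n", $1, $4, note
    }'
}
```

Usage: `clusterctl describe cluster dcm-cluster -n universe | not_ready_machines`.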

clusterctl describe cluster shows an AssociatedBMHFailure message

Symptoms

During provisioning, running clusterctl describe cluster on the DCM node may show an AssociatedBMHFailure error message.

Cause

This is normal Cluster API (CAPI) behavior. The message is removed automatically once the BareMetalHost status becomes ready.
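Rather than re-running clusterctl describe by hand, one option is to poll the BMH until it reaches the expected state. This guide does not say which status field “ready” refers to, so the sketch below assumes the Metal3 field `.status.provisioning.state` and leaves the target value to the caller; the function name and the `TRIES`, `INTERVAL`, and `KUBECTL` overrides are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll a BareMetalHost until .status.provisioning.state equals a target.
# The field and target value are assumptions supplied by the caller.
# TRIES (default 60) and INTERVAL in seconds (default 10) bound the wait.
# KUBECTL may name a wrapper command or function (default: kubectl).

wait_for_bmh_state() {
  local bmh_name="$1" want="$2" ns="${3:-universe}"
  local tries="${TRIES:-60}" interval="${INTERVAL:-10}"
  local kubectl="${KUBECTL:-kubectl}"
  local state="" i

  for ((i = 1; i <= tries; i++)); do
    # An error from kubectl is treated as "state unknown" for this attempt.
    state="$($kubectl get bmh "$bmh_name" -n "$ns" \
      -o jsonpath='{.status.provisioning.state}' 2>/dev/null)" || state=""
    if [[ "$state" == "$want" ]]; then
      echo "bmh/$bmh_name reached state '$state'"
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if ((i < tries)); then
      sleep "$interval"
    fi
  done
  echo "bmh/$bmh_name still in state '${state:-unknown}' after $tries checks" >&2
  return 1
}
```

Usage: `wait_for_bmh_state <baremetalhost-cr-name> <expected-state>`; check `kubectl get bmh -n universe` first to see which state names your Metal3 version reports.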

CRs left behind after the provisioning components are uninstalled

Symptoms

After the provisioning components are uninstalled with the helm uninstall command, the DPU, BareMetalHost, and Metal3Machine CRs remain in the cluster.

Cause

These CRs are created by the cloud administrator or by controllers. In this release, helm uninstall does not delete CRs that were not created by helm install.
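Before cleaning up manually, the leftover CRs can be listed per kind. The helper below is hypothetical; it reuses the `dpu` and `bmh` resource names from the commands in this guide, assumes the Metal3 resource plural `metal3machines`, and defaults to the `universe` namespace. It only lists resources and deletes nothing.

```shell
#!/usr/bin/env bash
# Sketch: list DPU, BareMetalHost, and Metal3Machine CRs that may remain after
# `helm uninstall`. Read-only: nothing is deleted.
# KUBECTL may name a wrapper command or function (default: kubectl).

list_leftover_crs() {
  local ns="${1:-universe}"
  local kubectl="${KUBECTL:-kubectl}"
  local kind

  for kind in dpu bmh metal3machines; do
    echo "== $kind =="
    # -o name prints one "<resource>/<name>" line per object.
    $kubectl get "$kind" -n "$ns" -o name || echo "(could not list $kind)" >&2
  done
}
```

Usage: `list_leftover_crs` (or `list_leftover_crs <namespace>`).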
