dpsctl verify bmc-health Usage Guide
Run and inspect deep BMC pre-requisite health checks across a topology.
The bmc-health command also has the alias bmc:
dpsctl verify bmc start --topology my-topology
Overview
dpsctl verify bmc-health performs a deep Redfish probe against the BMCs in a topology. It checks firmware, power, EDPp, telemetry, WPPS, and IB/OOB drift, then screens results against server-configured latency, EDPp, and WPPS thresholds.
For a quick surface-level reachability check, use dpsctl check connection instead.
The BMC health probe can take many minutes on large clusters, especially when deep telemetry and write-cycle checks are enabled. It is asynchronous:
start submits the check and returns a task_id.
status <task-id> shows live progress.
report <task-id> fetches the final cluster-level result.
Only one BMC health check may run at a time on the server. This gate is server-wide, not per topology or per node set. If a second start is attempted while another run is active, it is rejected with the in-flight task_id so you can re-attach with status or report.
Usage
dpsctl verify bmc-health start --topology <topology-name> [options]
dpsctl verify bmc-health status <task-id>
dpsctl verify bmc-health report <task-id> [options]
Start
Submit a BMC health check and print the task handle.
dpsctl verify bmc-health start --topology <topology-name>
Flags
Includes global dpsctl options.
--topology value (required) topology name to probe
--nodes value [ --nodes value ] (optional) comma-separated names of nodes to probe; defaults to all nodes in the topology
--samples-per-telemetry value Number of telemetry reads per node (default: 0; 0 uses server default)
--telemetry-interval value Sleep between consecutive telemetry reads on each node (default: 500ms; server-side cap: 60s)
--write-resolution-timeout value Per-PATCH readback window for the write-cycle probe (default: 1s; server-side cap: 60s)
--concurrency value Per-cluster fan-out concurrency (default: 0; 0 uses server default)
--skip-writes Skip the power-limit write/read-back/MaxP/restore probe (read-only mode)
--force-writes Force the power-limit write/read-back/MaxP/restore probe even if the server default is to skip it
--expected-edpp-pct value Expected steady-state EDPp current value, in percent (default: 0; values <=0 are unset; positive values are validated server-side)
--wait Poll status until the task completes, then print the final report
--poll-interval value Status polling interval for --wait (default: 5s; ignored without --wait)
--summary-only With --wait, strip per-node detail from the printed report and keep only Status, ClusterSummary, and Issues (ignored without --wait)
--help, -h show help
Start Behavior
Without --wait, start returns immediately with a task handle containing the task_id, task name, initial status, and start time.
With --wait, start blocks after submission. It polls status, writes one-line progress updates to stderr, and prints the final report to stdout when the task finishes. Progress updates use the format:
bmc-health <task-id>: <status> (<nodes-completed>/<nodes-total> nodes)
If the final report's status.ok is false, the command exits with status code 1.
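Because progress goes to stderr and the report goes to stdout, a wrapper script can watch the two streams separately. The following minimal Python sketch parses the progress line format shown above. It assumes the format is stable and that task IDs and status strings contain no spaces; parse_progress_line and Progress are hypothetical helper names, not part of dpsctl.

```python
import re
from typing import NamedTuple, Optional

# Matches the documented stderr progress format:
#   bmc-health <task-id>: <status> (<nodes-completed>/<nodes-total> nodes)
PROGRESS_RE = re.compile(
    r"^bmc-health (?P<task_id>\S+): (?P<status>\w+) "
    r"\((?P<done>\d+)/(?P<total>\d+) nodes\)$"
)


class Progress(NamedTuple):
    task_id: str
    status: str
    nodes_completed: int
    nodes_total: int


def parse_progress_line(line: str) -> Optional[Progress]:
    """Return the parsed progress update, or None for any other stderr line."""
    m = PROGRESS_RE.match(line.strip())
    if m is None:
        return None
    return Progress(m["task_id"], m["status"], int(m["done"]), int(m["total"]))
```

Returning None for non-matching lines lets the caller pass other stderr output, such as the poll-interval clamp notice, through unchanged.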
On start, --poll-interval and --summary-only are only used by the --wait shortcut. Without --wait, they are silently ignored because the command returns only the task handle.
--skip-writes and --force-writes are mutually exclusive. If neither flag is provided, the request leaves write behavior to the server default. If --skip-writes is provided, the power-limit write/read-back/MaxP/restore probe is disabled. If --force-writes is provided, the write-cycle probe is explicitly enabled.
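The write-cycle choice has three states: server default, skip, or force. The following minimal Python sketch maps that choice onto start flags and reproduces the mutual-exclusion rule before the CLI is invoked. write_mode_flags is a hypothetical helper; only the two flag names come from this guide.

```python
from typing import List


def write_mode_flags(skip_writes: bool = False, force_writes: bool = False) -> List[str]:
    """Translate a write-cycle choice into start flags.

    Neither option set returns an empty list, leaving the decision to the
    server default.
    """
    if skip_writes and force_writes:
        # Same rule the CLI enforces: the two flags cannot be combined.
        raise ValueError("--skip-writes and --force-writes are mutually exclusive")
    if skip_writes:
        return ["--skip-writes"]  # read-only mode
    if force_writes:
        return ["--force-writes"]  # explicitly enable the write-cycle probe
    return []
```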
For numeric tuning flags where 0 is documented as using the server default, 0 leaves that value unset in the request. --expected-edpp-pct is sent only when positive; positive values are validated server-side and must be between 1 and 200. --telemetry-interval and --write-resolution-timeout are capped server-side at 60s.
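The following minimal Python sketch applies these rules on the client side. It emits a tuning flag only when the value would actually be sent, and it pre-checks the documented EDPp range and the 60s duration cap so a typo fails fast. tuning_flags is a hypothetical helper. Durations are passed as millisecond strings such as 2000ms, which assumes the duration flags accept that form, as their 500ms default suggests. This guide does not say whether the server clamps or rejects values above 60s, so the sketch simply rejects them locally.

```python
from typing import List, Optional

SERVER_DURATION_CAP_MS = 60_000  # documented server-side cap for both duration flags


def tuning_flags(
    samples_per_telemetry: int = 0,
    concurrency: int = 0,
    expected_edpp_pct: float = 0,
    telemetry_interval_ms: Optional[int] = None,
    write_resolution_timeout_ms: Optional[int] = None,
) -> List[str]:
    argv: List[str] = []
    # 0 means "use the server default", so only positive counts are emitted.
    if samples_per_telemetry > 0:
        argv += ["--samples-per-telemetry", str(samples_per_telemetry)]
    if concurrency > 0:
        argv += ["--concurrency", str(concurrency)]
    # EDPp is sent only when positive; mirror the documented 1..200 range.
    if expected_edpp_pct > 0:
        if not 1 <= expected_edpp_pct <= 200:
            raise ValueError("--expected-edpp-pct must be between 1 and 200")
        argv += ["--expected-edpp-pct", f"{expected_edpp_pct:g}"]
    # None leaves the duration flag off, so the CLI default applies.
    for flag, ms in (
        ("--telemetry-interval", telemetry_interval_ms),
        ("--write-resolution-timeout", write_resolution_timeout_ms),
    ):
        if ms is not None:
            if ms > SERVER_DURATION_CAP_MS:
                raise ValueError(f"{flag} exceeds the 60s server-side cap")
            argv += [flag, f"{ms}ms"]
    return argv
```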
Status
Show live status and progress for a previously started check.
dpsctl verify bmc-health status <task-id>
Flags
Includes global dpsctl options.
--help, -h show help
Status Output
status prints the task ID, task name, task-manager status string, timestamps, diagnostic message, progress counters, completion flag, and terminal error message when present.
The status string can be Init, Started, Done, or Failed. The progress counters report how many nodes have completed out of the total in-scope nodes.
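The following minimal Python sketch condenses one status snapshot into a single line for dashboards or logs. It assumes status is run with the global --output json option and returns the field names shown in the Examples (task_id, status, progress.nodes_total, progress.nodes_completed, done). summarize_status is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Of the documented status strings, Done and Failed are terminal.
TERMINAL_STATUSES = {"Done", "Failed"}


def summarize_status(raw_json: str) -> str:
    """Condense one status snapshot (JSON text) into a single line."""
    snap: Dict[str, Any] = json.loads(raw_json)
    progress = snap.get("progress") or {}
    total = progress.get("nodes_total", 0)
    completed = progress.get("nodes_completed", 0)
    pct = 100.0 * completed / total if total else 0.0
    finished = bool(snap.get("done")) or snap.get("status") in TERMINAL_STATUSES
    state = "finished" if finished else "running"
    return f"{snap['task_id']}: {snap['status']} {completed}/{total} nodes ({pct:.1f}%), {state}"
```

For the status example in this guide, 47 of 128 nodes yields "task-abc123: Started 47/128 nodes (36.7%), running".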
Server-side status snapshots are coalesced for about 500ms, so very fast polling does not provide fresher task state.
Report
Fetch the final report for a BMC health check.
dpsctl verify bmc-health report <task-id>
Flags
Includes global dpsctl options.
--wait Block until the task finishes, then print the report
--poll-interval value Status polling interval for --wait (default: 5s; ignored without --wait)
--summary-only Strip per-node NodeReport detail server-side and return only Status, ClusterSummary, and Issues
--help, -h show help
Report Behavior
Without --wait, report performs a single lookup. If the task is still running, it prints a structured response with done=false and no report payload. Run status and retry later, or use report --wait.
With --wait, report polls status until the task reaches a terminal state, then fetches and prints the final report. Progress updates are written to stderr so stdout remains clean for --output json or --output yaml.
When --summary-only is set, the server omits per-node NodeReport entries from the returned report and keeps only cluster-level Status, ClusterSummary, and Issues. The underlying task result is not modified; until the task expires, a later report call without --summary-only can still retrieve the full report.
If the report response contains an error_message, the CLI returns an error instead of printing an empty report.
If the final report's status.ok is false, the command exits with status code 1.
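Tools that consume the JSON response directly, rather than relying on the CLI exit code, need to apply the same three rules. The following minimal Python sketch does that on an already-parsed response. done, error_message, and status.ok come from this guide. The top-level key that holds the report payload is not documented here, so "report" is a hypothetical placeholder. A missing status.ok is treated as a failure, which is a conservative choice made by this sketch and not documented CLI behavior.

```python
from typing import Any, Dict


def report_exit_code(resp: Dict[str, Any]) -> int:
    """Return 0 for a passing report and 1 for a failing one.

    Raises for the error and still-running cases, which carry no report.
    """
    if resp.get("error_message"):
        # The CLI returns an error here instead of printing an empty report.
        raise RuntimeError(resp["error_message"])
    if not resp.get("done", False):
        raise LookupError("task still running; retry later or use report --wait")
    report = resp.get("report") or {}  # hypothetical payload key
    ok = (report.get("status") or {}).get("ok") is True
    return 0 if ok else 1
```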
Polling and Re-Attach
The default --poll-interval for --wait is 5s. Non-positive values fall back to 5s. Values below the client-side minimum of 1s are clamped to 1s and the CLI prints a stderr notice explaining the clamp.
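The following minimal Python sketch reproduces these rules for a script that runs its own polling loop. The 5s default and the 1s floor come from this guide. The wording of the stderr notice is illustrative and is not the CLI's actual message. Because the floor is already above the roughly 500ms server-side coalescing window, the clamp costs no freshness. effective_poll_interval is a hypothetical helper.

```python
import sys
from datetime import timedelta

DEFAULT_POLL = timedelta(seconds=5)  # documented default for --wait
MIN_POLL = timedelta(seconds=1)      # documented client-side minimum


def effective_poll_interval(requested: timedelta) -> timedelta:
    """Apply the documented --poll-interval rules to a requested value."""
    if requested <= timedelta(0):
        # Zero or negative: fall back to the default.
        return DEFAULT_POLL
    if requested < MIN_POLL:
        # Below the floor: clamp and report it on stderr (illustrative text).
        print(f"poll interval {requested} is below the minimum; using {MIN_POLL}", file=sys.stderr)
        return MIN_POLL
    return requested
```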
Canceling a local --wait command, such as with Ctrl-C, stops only local polling. It does not cancel the server-side BMC health check. Re-attach with:
dpsctl verify bmc-health status <task-id>
dpsctl verify bmc-health report <task-id> --wait
Completed task state remains queryable for the server's --zombie-lifetime window, which defaults to 10m. After that window expires, status and report can no longer re-attach to the completed task.
There is no dpsctl verify bmc-health cancel subcommand.
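report --wait already handles re-attaching from a shell. Some callers need their own loop, for example to log between polls or to enforce their own timeout. The following minimal Python sketch covers that case. It assumes --output is a global option placed before the subcommand, and that status and report return the JSON field names shown in the Examples. It also assumes a failing report still prints its JSON to stdout when the command exits 1. The runner and sleep functions are injectable so the loop can be exercised without a live server. run_dpsctl and reattach are hypothetical helper names.

```python
import json
import subprocess
import time
from typing import Any, Callable, Dict, List

Runner = Callable[[List[str]], str]


def run_dpsctl(args: List[str]) -> str:
    """Run dpsctl with JSON output and return stdout.

    Assumption: --output is accepted as a global option before the subcommand.
    A non-zero exit is not raised, because a failing report is assumed to
    still print its JSON.
    """
    proc = subprocess.run(
        ["dpsctl", "--output", "json", *args],
        capture_output=True, text=True, check=False,
    )
    return proc.stdout


def reattach(task_id: str, poll_seconds: float = 5.0,
             run: Runner = run_dpsctl,
             sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll status for an existing task, then fetch its report once."""
    while True:
        snap = json.loads(run(["verify", "bmc-health", "status", task_id]))
        if snap.get("done") or snap.get("status") in ("Done", "Failed"):
            break
        # Snapshots are coalesced server-side, so stay at or above 1s.
        sleep(max(poll_seconds, 1.0))
    return json.loads(run(["verify", "bmc-health", "report", task_id]))
```

This loop does not cancel anything server-side. If the caller stops it, the check keeps running and can be re-attached again until the --zombie-lifetime window expires.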
Examples
Start a check and poll later
$ dpsctl verify bmc-health start --topology rack-a
{
"task_id": "task-abc123",
"name": "bmc_health_check",
"status": "Started",
"started_at": "2026-05-21T14:30:00Z"
}
Check progress
$ dpsctl verify bmc-health status task-abc123
{
"task_id": "task-abc123",
"name": "bmc_health_check",
"status": "Started",
"started_at": "2026-05-21T14:30:00Z",
"progress": {
"nodes_total": 128,
"nodes_completed": 47
},
"done": false
}
Wait for a summary report
dpsctl verify bmc-health report task-abc123 --wait --summary-only
Run a scoped, read-only check
dpsctl verify bmc-health start --topology rack-a --nodes node001,node002 --skip-writes --wait
Force write-cycle checks with slower telemetry sampling
dpsctl verify bmc-health start --topology rack-a --force-writes --telemetry-interval 2s --write-resolution-timeout 5s --wait