bmc-health

`dpsctl verify bmc-health` Usage Guide

Run and inspect deep BMC pre-requisite health checks across a topology.

The bmc-health command also has the alias bmc:

dpsctl verify bmc start --topology my-topology

Overview

dpsctl verify bmc-health performs a deep Redfish probe against the BMCs in a topology. It checks firmware, power, EDPp, telemetry, WPPS, and IB/OOB drift, then screens results against server-configured latency, EDPp, and WPPS thresholds.

For a quick surface-level reachability check, use dpsctl check connection instead.

The BMC health probe can take many minutes on large clusters, especially when deep telemetry and write-cycle checks are enabled. It is asynchronous:

start submits the check and returns a task_id.
status <task-id> shows live progress.
report <task-id> fetches the final cluster-level result.

Only one BMC health check may run at a time on the server. This gate is server-wide, not per topology or per node set. If a second start is attempted while another run is active, it is rejected with the in-flight task_id so you can re-attach with status or report.
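The server-wide gate behaves like a single-flight lock whose key ignores topology and node set. The following is a minimal model, not dpsctl's implementation. The class names, the `finish` hook, and the task ID format are hypothetical.

```python
import threading
import uuid
from typing import Optional


class AlreadyRunning(Exception):
    """Raised when a start is rejected; carries the in-flight task_id."""

    def __init__(self, task_id: str) -> None:
        super().__init__(f"bmc health check already running: {task_id}")
        self.task_id = task_id


class BmcHealthGate:
    """Hypothetical model: at most one BMC health check server-wide."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Optional[str] = None

    def start(self, topology: str) -> str:
        # topology is accepted but deliberately not part of the gate key:
        # the gate is server-wide, not per topology or node set.
        with self._lock:
            if self._active is not None:
                # Reject and hand back the in-flight ID so the caller can re-attach.
                raise AlreadyRunning(self._active)
            self._active = f"task-{uuid.uuid4().hex[:6]}"
            return self._active

    def finish(self, task_id: str) -> None:
        # Release the gate once the active task reaches a terminal state.
        with self._lock:
            if self._active == task_id:
                self._active = None
```

In this model, a second `start` for a different topology is still rejected. The caller receives the first run's task ID, which it can pass to status or report.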

Usage

dpsctl verify bmc-health start --topology <topology-name> [options]
dpsctl verify bmc-health status <task-id>
dpsctl verify bmc-health report <task-id> [options]

Start

Submit a BMC health check and print the task handle.

dpsctl verify bmc-health start --topology <topology-name>

Flags

Includes global dpsctl options.

   --topology value                 (required) topology name to probe
   --nodes value [ --nodes value ]  (optional) comma-separated names of nodes to probe; defaults to all nodes in the topology
   --samples-per-telemetry value    Number of telemetry reads per node (default: 0; 0 uses server default)
   --telemetry-interval value       Sleep between consecutive telemetry reads on each node (default: 500ms; server-side cap: 60s)
   --write-resolution-timeout value Per-PATCH readback window for the write-cycle probe (default: 1s; server-side cap: 60s)
   --concurrency value              Per-cluster fan-out concurrency (default: 0; 0 uses server default)
   --skip-writes                    Skip the power-limit write/read-back/MaxP/restore probe (read-only mode)
   --force-writes                   Force the power-limit write/read-back/MaxP/restore probe even if the server default is to skip it
   --expected-edpp-pct value        Expected steady-state EDPp current value, in percent (default: 0; values <=0 are unset; positive values are validated server-side)
   --wait                           Poll status until the task completes, then print the final report
   --poll-interval value            Status polling interval for --wait (default: 5s; ignored without --wait)
   --summary-only                   With --wait, strip per-node detail from the printed report and keep only Status, ClusterSummary, and Issues (ignored without --wait)
   --help, -h                       show help

Start Behavior

Without --wait, start returns immediately with a task handle containing the task_id, task name, initial status, and start time.

With --wait, start blocks after submission. It polls status, writes one-line progress updates to stderr, and prints the final report to stdout when the task finishes. Progress updates use the format:

bmc-health <task-id>: <status> (<nodes-completed>/<nodes-total> nodes)
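The --wait loop can be sketched as follows: poll status, write one progress line per poll to stderr, and stop at a terminal state. This is a minimal model under stated assumptions. It assumes status responses shaped like the JSON in the Examples section (`status`, `done`, `progress.nodes_completed`, `progress.nodes_total`). `get_status` and `get_report` are hypothetical stand-ins for the real RPCs.

```python
import sys
import time
from typing import Any, Callable, Dict, TextIO

# Status strings treated as terminal (Done or Failed).
TERMINAL_STATES = {"Done", "Failed"}


def progress_line(task_id: str, st: Dict[str, Any]) -> str:
    """Format one progress update in the documented stderr format."""
    p = st.get("progress") or {}
    return (f"bmc-health {task_id}: {st['status']} "
            f"({p.get('nodes_completed', 0)}/{p.get('nodes_total', 0)} nodes)")


def wait_for_report(task_id: str,
                    get_status: Callable[[str], Dict[str, Any]],
                    get_report: Callable[[str], Dict[str, Any]],
                    poll_interval: float = 5.0,
                    err: TextIO = sys.stderr,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until terminal, writing progress to `err`; return the final report."""
    while True:
        st = get_status(task_id)
        print(progress_line(task_id, st), file=err)
        if st.get("done") or st["status"] in TERMINAL_STATES:
            break
        sleep(poll_interval)
    # The caller prints the report to stdout, keeping it free of progress noise.
    return get_report(task_id)
```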

If status.ok in the final report is false, the command exits with status code 1.

On start, --poll-interval and --summary-only are only used by the --wait shortcut. Without --wait, they are silently ignored because the command returns only the task handle.

--skip-writes and --force-writes are mutually exclusive. If neither flag is provided, the request leaves write behavior to the server default. If --skip-writes is provided, the power-limit write/read-back/MaxP/restore probe is disabled. If --force-writes is provided, the write-cycle probe is explicitly enabled.

For numeric tuning flags where 0 is documented as using the server default, 0 leaves that value unset in the request. --expected-edpp-pct is sent only when positive; positive values are validated server-side and must be between 1 and 200. --telemetry-interval and --write-resolution-timeout are capped server-side at 60s.
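The flag rules above split into a client-side step and a server-side step. The client rejects conflicting write flags and omits unset values. The server range-checks EDPp and caps the durations. Below is a rough sketch that shows this split. The request field names, including the tri-state `write_probe`, are hypothetical. Durations are in seconds. Treating the 60s cap as clamping rather than rejection is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict

CAP_SECONDS = 60.0  # documented server-side cap for both duration flags


@dataclass
class StartFlags:
    topology: str
    samples_per_telemetry: int = 0
    concurrency: int = 0
    expected_edpp_pct: float = 0.0
    telemetry_interval: float = 0.5        # seconds (CLI default 500ms)
    write_resolution_timeout: float = 1.0  # seconds (CLI default 1s)
    skip_writes: bool = False
    force_writes: bool = False


def build_request(f: StartFlags) -> Dict[str, Any]:
    """Client side: translate flags into a request (field names hypothetical)."""
    if f.skip_writes and f.force_writes:
        raise ValueError("--skip-writes and --force-writes are mutually exclusive")
    req: Dict[str, Any] = {
        "topology": f.topology,
        "telemetry_interval": f.telemetry_interval,
        "write_resolution_timeout": f.write_resolution_timeout,
    }
    # 0 means "use the server default", so the field is omitted entirely.
    if f.samples_per_telemetry != 0:
        req["samples_per_telemetry"] = f.samples_per_telemetry
    if f.concurrency != 0:
        req["concurrency"] = f.concurrency
    # Only positive EDPp targets are sent; <= 0 means unset.
    if f.expected_edpp_pct > 0:
        req["expected_edpp_pct"] = f.expected_edpp_pct
    # Tri-state write probe: absent = server default, False = skip, True = force.
    if f.skip_writes:
        req["write_probe"] = False
    elif f.force_writes:
        req["write_probe"] = True
    return req


def server_normalize(req: Dict[str, Any]) -> Dict[str, Any]:
    """Server side: validate the EDPp range and cap duration fields."""
    edpp = req.get("expected_edpp_pct")
    if edpp is not None and not 1 <= edpp <= 200:
        raise ValueError(f"expected_edpp_pct {edpp} outside 1..200")
    out = dict(req)
    for key in ("telemetry_interval", "write_resolution_timeout"):
        # Modeled as clamping; the real server may reject instead.
        out[key] = min(out[key], CAP_SECONDS)
    return out
```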

Status

Show live status and progress for a previously started check.

dpsctl verify bmc-health status <task-id>

Flags

Includes global dpsctl options.

   --help, -h  show help

Status Output

status prints the task ID, task name, task-manager status string, timestamps, diagnostic message, progress counters, completion flag, and terminal error message when present.

The status string can be Init, Started, Done, or Failed. The progress counters report how many nodes have completed out of the total in-scope nodes.

Server-side status snapshots are coalesced for about 500ms, so very fast polling does not provide fresher task state.
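Coalescing works like a short time-to-live cache in front of the status computation. Here is a minimal model assuming a fixed 500ms window. The class is hypothetical. The injectable clock exists only so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CoalescedStatus:
    """Serve a cached snapshot while it is younger than `window` seconds."""

    def __init__(self, compute: Callable[[], Dict], window: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._window = window
        self._clock = clock
        self._cached: Optional[Tuple[float, Dict]] = None

    def get(self) -> Dict:
        now = self._clock()
        if self._cached is not None and now - self._cached[0] < self._window:
            return self._cached[1]  # reuse: no fresher state is available yet
        snap = self._compute()
        self._cached = (now, snap)
        return snap
```

Under this model, polling every 100ms returns the same snapshot about five times in a row.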

Report

Fetch the final report for a BMC health check.

dpsctl verify bmc-health report <task-id>

Flags

Includes global dpsctl options.

   --wait                 Block until the task finishes, then print the report
   --poll-interval value  Status polling interval for --wait (default: 5s; ignored without --wait)
   --summary-only         Strip per-node NodeReport detail server-side and return only Status, ClusterSummary, and Issues
   --help, -h             show help

Report Behavior

Without --wait, report performs a single lookup. If the task is still running, it prints a structured response with done=false and no report payload. Run status and retry later, or use report --wait.

With --wait, report polls status until the task reaches a terminal state, then fetches and prints the final report. Progress updates are written to stderr so stdout remains clean for --output json or --output yaml.

When --summary-only is set, the server omits per-node NodeReport entries from the returned report and keeps only cluster-level Status, ClusterSummary, and Issues. The underlying task result is not modified; until the task expires, a later report call without --summary-only can still retrieve the full report.
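The key point is that --summary-only builds a filtered copy and leaves the stored result untouched. A minimal sketch follows. The top-level key names Status, ClusterSummary, and Issues come from the text above. Storing per-node entries under a `NodeReports` key is a hypothetical assumption.

```python
import copy
from typing import Any, Dict

# Cluster-level fields kept by --summary-only.
SUMMARY_KEYS = ("Status", "ClusterSummary", "Issues")


def summary_view(stored: Dict[str, Any]) -> Dict[str, Any]:
    """Return a summary-only deep copy; the stored task result is not modified."""
    return {k: copy.deepcopy(stored[k]) for k in SUMMARY_KEYS if k in stored}
```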

If the report response contains an error_message, the CLI returns an error instead of printing an empty report.

If status.ok in the final report is false, the command exits with status code 1.
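Taken together, the report rules reduce to three outcomes: an error, "not done yet", or an exit code. A hedged sketch of that decision follows. `done` and `error_message` are named in the text. Nesting the payload under a `report` key with `status.ok` inside it is an assumption about the response shape.

```python
from typing import Any, Dict, Optional


class ReportError(Exception):
    """Raised instead of printing an empty report."""


def interpret_report(resp: Dict[str, Any]) -> Optional[int]:
    """Return None while the task is still running, else the exit code."""
    if resp.get("error_message"):
        raise ReportError(resp["error_message"])
    if not resp.get("done", False):
        return None  # still running: retry later or use report --wait
    report = resp.get("report") or {}
    # Exit 1 only when status.ok is explicitly false.
    return 1 if report.get("status", {}).get("ok") is False else 0
```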

Polling and Re-Attach

The default --poll-interval for --wait is 5s. Non-positive values fall back to 5s. Values below the client-side minimum of 1s are clamped to 1s and the CLI prints a stderr notice explaining the clamp.
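The normalization rules for --poll-interval fit in a few lines. The sketch below mirrors them. The function name and the exact wording of the stderr notice are hypothetical, not the CLI's actual message.

```python
import sys
from datetime import timedelta
from typing import TextIO

DEFAULT_POLL = timedelta(seconds=5)
MIN_POLL = timedelta(seconds=1)


def normalize_poll_interval(requested: timedelta, err: TextIO = sys.stderr) -> timedelta:
    # Non-positive values fall back to the default without a notice.
    if requested <= timedelta(0):
        return DEFAULT_POLL
    # Values under the client-side minimum are clamped, with a notice on stderr.
    if requested < MIN_POLL:
        print(f"notice: --poll-interval {requested.total_seconds()}s is below "
              f"the 1s minimum; using 1s", file=err)
        return MIN_POLL
    return requested
```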

Canceling a local --wait command, such as with Ctrl-C, stops only local polling. It does not cancel the server-side BMC health check. Re-attach with:

dpsctl verify bmc-health status <task-id>
dpsctl verify bmc-health report <task-id> --wait

Completed task state remains queryable for the server --zombie-lifetime window, which defaults to 10m. After that window expires, status and report can no longer re-attach to the completed task.
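The retention rule is a simple time comparison. A minimal sketch follows, assuming the default 10m window. The function name is hypothetical. Treating a task as expired at exactly the window boundary is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Documented default for the server's --zombie-lifetime flag.
ZOMBIE_LIFETIME = timedelta(minutes=10)


def can_reattach(finished_at: Optional[datetime], now: datetime,
                 lifetime: timedelta = ZOMBIE_LIFETIME) -> bool:
    """True while status/report can still find the task."""
    if finished_at is None:
        return True  # still running: always queryable
    # Exact-boundary handling (expired at == lifetime) is an assumption.
    return now - finished_at < lifetime
```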

There is no dpsctl verify bmc-health cancel subcommand.

Examples

Start a check and poll later

$ dpsctl verify bmc-health start --topology rack-a
{
  "task_id": "task-abc123",
  "name": "bmc_health_check",
  "status": "Started",
  "started_at": "2026-05-21T14:30:00Z"
}

Check progress

$ dpsctl verify bmc-health status task-abc123
{
  "task_id": "task-abc123",
  "name": "bmc_health_check",
  "status": "Started",
  "started_at": "2026-05-21T14:30:00Z",
  "progress": {
    "nodes_total": 128,
    "nodes_completed": 47
  },
  "done": false
}
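In the example, status prints JSON, so a script can derive percentage progress from it. The sketch below assumes the output shape shown (`progress.nodes_total`, `progress.nodes_completed`). The helper is hypothetical and not part of dpsctl.

```python
import json


def progress_pct(status_json: str) -> float:
    """Percent of in-scope nodes completed, or 0.0 if the total is unknown."""
    st = json.loads(status_json)
    p = st.get("progress") or {}
    total = p.get("nodes_total", 0)
    return 100.0 * p.get("nodes_completed", 0) / total if total else 0.0
```

For the status output above, progress_pct returns 36.71875.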

Wait for a summary report

dpsctl verify bmc-health report task-abc123 --wait --summary-only

Run a scoped, read-only check

dpsctl verify bmc-health start --topology rack-a --nodes node001,node002 --skip-writes --wait

Force write-cycle checks with slower telemetry sampling

dpsctl verify bmc-health start --topology rack-a --force-writes --telemetry-interval 2s --write-resolution-timeout 5s --wait
