Power Shelf State Diagram

This document describes the Finite State Machine (FSM) for Power Shelves in NICo. It covers the lifecycle from creation through initialization and the Ready state, the OnDemand maintenance operations (PowerOn / PowerOff), and deletion, as well as the operator-facing interfaces (CLI / gRPC) used to drive the OnDemand operations.

High-Level Overview

The main flow consists of the following primary states and transitions.

States

| State | Description |
| --- | --- |
| Initializing | Power shelf record exists in NICo; awaiting first controller tick. |
| FetchingData | Controller is fetching power shelf data from the BMC / PMC. |
| Configuring | Power shelf is being configured (credentials, monitoring, etc.). |
| Ready | Power shelf is ready. From here it can be deleted or driven into Maintenance by an OnDemand request. |
| Maintenance | Power shelf is executing an operator-requested maintenance operation. Sub-states (carried in the state variant as operation): PowerOn, PowerOff. |
| Error | Power shelf is in error (e.g. maintenance operation failed). Can transition to Deleting if marked for deletion; otherwise waits for manual intervention. |
| Deleting | Power shelf is being removed; ends in final delete (terminal). |

Transitions (by trigger)

| From | To | Trigger / Condition |
| --- | --- | --- |
| (create) | Initializing | Power shelf created |
| Initializing | FetchingData | Controller processes power shelf |
| FetchingData | Configuring | Data fetch complete |
| Configuring | Ready | Configuration complete |
| Ready | Deleting | deleted set (marked for deletion) |
| Ready | Maintenance { PowerOn } | power_shelf_maintenance_requested.operation == PowerOn |
| Ready | Maintenance { PowerOff } | power_shelf_maintenance_requested.operation == PowerOff |
| Maintenance { PowerOn \| PowerOff } | Ready | BMC operation complete; controller clears power_shelf_maintenance_requested |
| Maintenance { PowerOn \| PowerOff } | Error | BMC operation failed |
| Error | Deleting | deleted set (marked for deletion) |
| Deleting | (end) | Final delete committed |
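
The transition table above can be read as a pure function from the current state and a few inputs to the next state. The following is a minimal sketch of that reading, not the NICo controller code: the `Inputs` struct, its field names, the `Deleted` stand-in for the "(end)" row, and the choice to check deletion before a pending maintenance request in Ready are all assumptions made for illustration.

```rust
/// Maintenance operations named in the state table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MaintenanceOperation {
    PowerOn,
    PowerOff,
}

/// States from the table; `Deleted` is a hypothetical stand-in for "(end)".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerShelfState {
    Initializing,
    FetchingData,
    Configuring,
    Ready,
    Maintenance { operation: MaintenanceOperation },
    Error,
    Deleting,
    Deleted,
}

/// Hypothetical per-tick inputs; names are illustrative only.
#[derive(Debug, Clone, Copy, Default)]
pub struct Inputs {
    /// The shelf is marked for deletion.
    pub deleted: bool,
    /// Pending OnDemand request, if any.
    pub maintenance_requested: Option<MaintenanceOperation>,
    /// Outcome of the step the current state is waiting on:
    /// `None` = still running, `Some(true)` = done, `Some(false)` = failed.
    pub step_result: Option<bool>,
}

/// Computes the next state for one controller tick.
/// Side effects (such as clearing the maintenance request on the
/// Maintenance -> Ready transition) are intentionally not modeled here.
pub fn next_state(state: PowerShelfState, inputs: Inputs) -> PowerShelfState {
    use PowerShelfState::*;
    match state {
        Initializing => FetchingData,
        FetchingData if inputs.step_result == Some(true) => Configuring,
        Configuring if inputs.step_result == Some(true) => Ready,
        // Assumption: deletion is checked before a pending maintenance request.
        Ready if inputs.deleted => Deleting,
        Ready => match inputs.maintenance_requested {
            Some(operation) => Maintenance { operation },
            None => Ready,
        },
        Maintenance { .. } => match inputs.step_result {
            Some(true) => Ready,
            Some(false) => Error,
            None => state,
        },
        Error if inputs.deleted => Deleting,
        Deleting if inputs.step_result == Some(true) => Deleted,
        // No matching trigger: stay in the current state.
        other => other,
    }
}
```

Keeping the function free of side effects makes each row of the table checkable in isolation; a shelf in Error with no deletion mark, for example, simply stays in Error until an operator intervenes.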

OnDemand Maintenance Operations

The Maintenance state is entered on demand by an operator (or any other service) to drive a power-control operation on the shelf. Two operations are currently supported:

  • PowerOn — powers the shelf on via the BMC / PMC.
  • PowerOff — powers the shelf off via the BMC / PMC.

Operator interface (admin CLI)

The power-shelf maintenance subcommand (alias fix) drives one or more shelves into Maintenance. All listed shelves receive the same operation in a single atomic request: if any shelf is unknown or marked for deletion, the entire request is rejected and no shelves are mutated.

    # Single shelf
    forge-admin-cli power-shelf maintenance power-off \
        --power-shelf-id <ID>

    # Multiple shelves, repeated flag
    forge-admin-cli power-shelf maintenance power-on \
        --power-shelf-id <ID1> \
        --power-shelf-id <ID2> \
        --reference https://tracker/MAINT-123

    # Multiple shelves, single flag with several values
    forge-admin-cli power-shelf maintenance power-off \
        --power-shelf-id <ID1> <ID2> <ID3>

    # Using the visible aliases
    forge-admin-cli power-shelf fix power-on --id <ID1> <ID2>

| Flag | Required | Description |
| --- | --- | --- |
| --power-shelf-id (alias --id) | yes | One or more Power Shelf IDs. Repeat the flag or pass multiple values space-separated. |
| --reference (alias --ref) | no | URL of a ticket / issue tracking this maintenance request. Recorded as part of the initiator string. |

gRPC interface

The CLI is a thin wrapper over the Forge.SetPowerShelfMaintenance gRPC method:

    // Method on the Forge service.
    rpc SetPowerShelfMaintenance(PowerShelfMaintenanceRequest) returns (google.protobuf.Empty);

    enum PowerShelfMaintenanceOperation {
      // Rejected by the API handler with InvalidArgument.
      POWER_SHELF_MAINTENANCE_OPERATION_UNSPECIFIED = 0;
      POWER_SHELF_MAINTENANCE_OPERATION_POWER_ON = 1;
      POWER_SHELF_MAINTENANCE_OPERATION_POWER_OFF = 2;
    }

    message PowerShelfMaintenanceRequest {
      // All listed shelves receive the same operation.
      repeated common.PowerShelfId power_shelf_ids = 1;
      PowerShelfMaintenanceOperation operation = 2;
      // Ticket / issue URL; recorded as part of the initiator string.
      optional string reference = 3;
    }

Validation rules enforced by the API handler:

  • power_shelf_ids must contain at least one ID and at most runtime_config.max_find_by_ids IDs.
  • operation must be POWER_ON or POWER_OFF; UNSPECIFIED is rejected with InvalidArgument.
  • Every listed power shelf must exist and must not be marked for deletion; otherwise the entire transaction is rolled back.

The handler runs all updates in a single database transaction, so a caller observes the new maintenance request on every listed shelf simultaneously (or not at all).
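
The all-or-nothing behavior described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the API handler: it swaps the database transaction for a check-everything-then-mutate pass over a `HashMap`, and the `Shelf` struct, the `ValidationError` variants other than `InvalidArgument`, and the `max_ids` parameter (standing in for runtime_config.max_find_by_ids) are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Unspecified,
    PowerOn,
    PowerOff,
}

/// Only `InvalidArgument` is named in the text; the other variants are
/// hypothetical labels for the "unknown" and "marked for deletion" cases.
#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    InvalidArgument(String),
    NotFound(String),
    MarkedForDeletion(String),
}

/// Hypothetical in-memory stand-in for a power shelf row.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Shelf {
    pub deleted: bool,
    pub maintenance_requested: Option<Operation>,
}

/// Applies `operation` to every shelf in `ids`, or to none of them.
pub fn set_maintenance(
    shelves: &mut HashMap<String, Shelf>,
    ids: &[&str],
    operation: Operation,
    max_ids: usize,
) -> Result<(), ValidationError> {
    // Rule 1: at least one ID and at most `max_ids` IDs.
    if ids.is_empty() || ids.len() > max_ids {
        return Err(ValidationError::InvalidArgument(format!(
            "expected 1 to {} power shelf IDs, got {}",
            max_ids,
            ids.len()
        )));
    }
    // Rule 2: UNSPECIFIED is rejected.
    if operation == Operation::Unspecified {
        return Err(ValidationError::InvalidArgument(
            "operation must be POWER_ON or POWER_OFF".to_string(),
        ));
    }
    // Rule 3: validate every shelf before mutating any of them, so a
    // failure leaves the map untouched (mimicking a rolled-back transaction).
    for id in ids {
        match shelves.get(*id) {
            None => return Err(ValidationError::NotFound((*id).to_string())),
            Some(shelf) if shelf.deleted => {
                return Err(ValidationError::MarkedForDeletion((*id).to_string()))
            }
            Some(_) => {}
        }
    }
    for id in ids {
        if let Some(shelf) = shelves.get_mut(*id) {
            shelf.maintenance_requested = Some(operation);
        }
    }
    Ok(())
}
```

With shelves "a" (healthy) and "b" (marked for deletion), a request naming both fails on "b" and leaves "a" without a maintenance request, which is the behavior a caller should expect from the real transactional handler.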

Initiator string

The DB column power_shelf_maintenance_requested records who requested the operation and why. The API composes this initiator string from two sources:

| Auth user | --reference | Resulting initiator |
| --- | --- | --- |
| present | present | "<user> (<reference>)" |
| present | absent | "<user>" |
| absent | present | "<reference>" |
| absent | absent | "admin-cli" |
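
The four rows of the table map directly onto a match over two optional values. The function below is a minimal sketch of that mapping; the name `compose_initiator` is hypothetical, and it assumes that "absent" corresponds to `None` (how the real API treats empty strings is not stated here).

```rust
/// Builds the initiator string from the authenticated user and the
/// optional --reference value, following the table row by row.
pub fn compose_initiator(user: Option<&str>, reference: Option<&str>) -> String {
    match (user, reference) {
        (Some(user), Some(reference)) => format!("{} ({})", user, reference),
        (Some(user), None) => user.to_string(),
        (None, Some(reference)) => reference.to_string(),
        // Neither source is available: fall back to the fixed label.
        (None, None) => "admin-cli".to_string(),
    }
}
```

Matching on the tuple lets the compiler verify that all four combinations are handled, so a missing row would be a build error rather than a silent fallback.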

Data model

The power_shelves.power_shelf_maintenance_requested column is nullable JSONB; the API sets it on SetPowerShelfMaintenance, and the state controller clears it when the operation finishes.

Notes / Status

  • The state machine and request plumbing (CLI → gRPC → DB → state controller → state transition) are implemented end-to-end.
  • The actual BMC / PMC call inside the Maintenance handler is still a TODO: today the handler clears the maintenance request and returns to Ready without driving the hardware. Plugging in the real PowerOn / PowerOff BMC action is a focused follow-up that does not change the public CLI / gRPC surface.