Copyright © 2026, NVIDIA Corporation.

Power Shelf State Diagram

This document describes the Finite State Machine (FSM) for Power Shelves in NICo. It covers the lifecycle from creation through initialization and readiness, the OnDemand maintenance operations (PowerOn / PowerOff), and deletion. It also covers the operator-facing interfaces (CLI / gRPC) used to drive the OnDemand operations.

High-Level Overview

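The main flow runs Initializing → FetchingData → Configuring → Ready. From Ready, a power shelf either enters Maintenance to execute an OnDemand operation or moves to Deleting. A failed maintenance operation leads to Error, from which the shelf can be deleted or driven into Maintenance again.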

States

| State | Description |
| --- | --- |
| Initializing | Power shelf record exists in NICo; awaiting first controller tick. |
| FetchingData | Controller is fetching power shelf data from the BMC / PMC. |
| Configuring | Power shelf is being configured (credentials, monitoring, etc.). |
| Ready | Power shelf is ready. From here it can be deleted or driven into Maintenance by an OnDemand request. |
| Maintenance | Power shelf is executing an operator-requested maintenance operation. Sub-states (carried in the state variant as operation): PowerOn, PowerOff. |
| Error | Power shelf is in error (e.g. maintenance operation failed). Can transition to Deleting if marked for deletion, or to Maintenance if an OnDemand maintenance request is posted; otherwise waits for manual intervention. |
| Deleting | Power shelf is being removed; ends in final delete (terminal). |

Transitions (by trigger)

| From | To | Trigger / Condition |
| --- | --- | --- |
| (create) | Initializing | Power shelf created |
| Initializing | FetchingData | Controller processes power shelf |
| FetchingData | Configuring | Data fetch complete |
| Configuring | Ready | Configuration complete |
| Ready | Deleting | deleted set (marked for deletion) |
| Ready | Maintenance { PowerOn } | power_shelf_maintenance_requested.operation == PowerOn |
| Ready | Maintenance { PowerOff } | power_shelf_maintenance_requested.operation == PowerOff |
| Maintenance { PowerOn / PowerOff } | Ready | BMC operation complete; controller clears power_shelf_maintenance_requested |
| Maintenance { PowerOn / PowerOff } | Error | BMC operation failed |
| Error | Deleting | deleted set (marked for deletion) |
| Error | Maintenance { PowerOn } | power_shelf_maintenance_requested.operation == PowerOn |
| Error | Maintenance { PowerOff } | power_shelf_maintenance_requested.operation == PowerOff |
| Deleting | (end) | Final delete committed |
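
The transition table can be read as a single step function. The following is a minimal Python sketch of that reading, not the controller's actual code: the names `Shelf`, `State`, `next_state`, and `tick` are hypothetical, the `outcome` argument stands in for "work complete" (`True`), "work failed" (`False`), or "still in progress" (`None`), and the precedence of deletion over a pending maintenance request is an assumption the table does not spell out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Operation(Enum):
    POWER_ON = "PowerOn"
    POWER_OFF = "PowerOff"


@dataclass(frozen=True)
class State:
    name: str
    operation: Optional[Operation] = None  # only set for Maintenance


@dataclass
class Shelf:
    state: State
    deleted: bool = False
    maintenance_requested: Optional[Operation] = None


def next_state(shelf: Shelf, outcome: Optional[bool] = None) -> Optional[State]:
    """Return the next state for one controller tick, or None to stay put."""
    name = shelf.state.name
    if name == "Initializing":
        return State("FetchingData")
    if name == "FetchingData":
        # Failure handling here is not described in the table; we just wait.
        return State("Configuring") if outcome else None
    if name == "Configuring":
        return State("Ready") if outcome else None
    if name in ("Ready", "Error"):
        if shelf.deleted:  # assumption: deletion wins over a pending request
            return State("Deleting")
        if shelf.maintenance_requested is not None:
            return State("Maintenance", shelf.maintenance_requested)
        return None  # Ready idles; Error waits for manual intervention
    if name == "Maintenance":
        if outcome is None:
            return None  # operation still running
        return State("Ready") if outcome else State("Error")
    return None  # Deleting: the final delete happens outside this sketch


def tick(shelf: Shelf, outcome: Optional[bool] = None) -> None:
    """Apply one transition, clearing the request on Maintenance -> Ready."""
    new = next_state(shelf, outcome)
    if new is None:
        return
    if shelf.state.name == "Maintenance" and new.name == "Ready":
        shelf.maintenance_requested = None
    shelf.state = new
```

For example, setting `maintenance_requested = Operation.POWER_OFF` on a Ready shelf and calling `tick(shelf)` moves it to `Maintenance { PowerOff }`; a following `tick(shelf, True)` returns it to Ready with the request cleared. Whether the request is also cleared on failure is not stated in the table, so the sketch leaves it untouched.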

OnDemand Maintenance Operations

The Maintenance state is entered on demand by an operator (or any other service) to drive a power-control operation on the shelf. Two operations are currently supported:

  • PowerOn — powers the shelf on via the BMC / PMC.
  • PowerOff — powers the shelf off via the BMC / PMC.

Operator interface (admin CLI)

The power-shelf maintenance subcommand (alias fix) drives one or more shelves into Maintenance. All listed shelves receive the same operation in a single atomic request: if any shelf is unknown or marked for deletion, the entire request is rejected and no shelves are mutated.

    # Single shelf
    nico-admin-cli power-shelf maintenance power-off \
      --power-shelf-id <ID>

    # Multiple shelves, repeated flag
    nico-admin-cli power-shelf maintenance power-on \
      --power-shelf-id <ID1> \
      --power-shelf-id <ID2> \
      --reference https://tracker/MAINT-123

    # Multiple shelves, single flag with several values
    nico-admin-cli power-shelf maintenance power-off \
      --power-shelf-id <ID1> <ID2> <ID3>

    # Using the visible aliases
    nico-admin-cli power-shelf fix power-on --id <ID1> <ID2>

| Flag | Required | Description |
| --- | --- | --- |
| --power-shelf-id (alias --id) | yes | One or more Power Shelf IDs. Repeat the flag or pass multiple values space-separated. |
| --reference (alias --ref) | no | URL of a ticket / issue tracking this maintenance request. Recorded as part of the initiator string. |
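
To make the flag semantics concrete, here is a small Python `argparse` sketch in which the repeated-flag form and the multi-value form produce the same ID list. It only illustrates the behaviour described in the table; it is not how `nico-admin-cli` itself is implemented, and the parser name and `parse` helper are hypothetical.

```python
import argparse

# Illustrative parser only; mirrors the two documented flags and their aliases.
parser = argparse.ArgumentParser(prog="power-shelf-maintenance-sketch")
parser.add_argument(
    "--power-shelf-id", "--id",
    dest="power_shelf_ids",
    action="extend",  # repeated flags accumulate into one list (Python 3.8+)
    nargs="+",        # each occurrence accepts one or more values
    required=True,
)
parser.add_argument("--reference", "--ref", dest="reference", default=None)


def parse(argv):
    """Parse a list of command-line tokens into a namespace."""
    return parser.parse_args(argv)
```

With this parser, `parse(["--power-shelf-id", "A", "--power-shelf-id", "B"])` and `parse(["--id", "A", "B"])` both yield `power_shelf_ids == ["A", "B"]`.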

gRPC interface

The CLI is a thin wrapper over the NICo.SetPowerShelfMaintenance gRPC method:

    rpc SetPowerShelfMaintenance(PowerShelfMaintenanceRequest) returns (google.protobuf.Empty);

    enum PowerShelfMaintenanceOperation {
      POWER_SHELF_MAINTENANCE_OPERATION_UNSPECIFIED = 0;
      POWER_SHELF_MAINTENANCE_OPERATION_POWER_ON = 1;
      POWER_SHELF_MAINTENANCE_OPERATION_POWER_OFF = 2;
    }

    message PowerShelfMaintenanceRequest {
      repeated common.PowerShelfId power_shelf_ids = 1;
      PowerShelfMaintenanceOperation operation = 2;
      optional string reference = 3;
    }

Validation rules enforced by the API handler:

  • power_shelf_ids must contain at least one ID and at most runtime_config.max_find_by_ids IDs.
  • operation must be POWER_ON or POWER_OFF; UNSPECIFIED is rejected with InvalidArgument.
  • Every listed power shelf must exist and must not be marked for deletion; otherwise the entire transaction is rolled back.

The handler runs all updates in a single database transaction, so a caller observes the new maintenance request on every listed shelf simultaneously (or not at all).
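
The validation rules and the all-or-nothing behaviour can be sketched against an in-memory dictionary. This is an illustration under stated assumptions, not the handler's real code: `ShelfRow`, `RequestRejected`, and `set_power_shelf_maintenance` are hypothetical names, the stored `{"operation", "initiator"}` shape is assumed, and only the `InvalidArgument` result for an unspecified operation comes from the rules above. Validating every shelf before writing any of them stands in for the database transaction rollback.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class InvalidArgument(Exception):
    """Raised for an unspecified operation, as stated in the rules."""


class RequestRejected(Exception):
    """Hypothetical error for the other rejections (code not documented)."""


VALID_OPERATIONS = {"POWER_ON", "POWER_OFF"}


@dataclass
class ShelfRow:
    deleted: bool = False
    maintenance_requested: Optional[dict] = None  # assumed JSON shape


def set_power_shelf_maintenance(
    db: Dict[str, ShelfRow],
    power_shelf_ids: List[str],
    operation: str,
    initiator: str,
    max_find_by_ids: int,
) -> None:
    if not 1 <= len(power_shelf_ids) <= max_find_by_ids:
        raise RequestRejected("need between 1 and max_find_by_ids IDs")
    if operation not in VALID_OPERATIONS:
        raise InvalidArgument("operation must be POWER_ON or POWER_OFF")

    # Check every shelf first so that a bad ID leaves all rows untouched.
    for shelf_id in power_shelf_ids:
        row = db.get(shelf_id)
        if row is None or row.deleted:
            raise RequestRejected(f"unknown or deleted shelf: {shelf_id}")

    # All checks passed: record the request on every listed shelf.
    for shelf_id in power_shelf_ids:
        db[shelf_id].maintenance_requested = {
            "operation": operation,
            "initiator": initiator,
        }
```

A request listing one healthy shelf and one shelf marked for deletion raises before any row is written, which matches the "or not at all" guarantee described above.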

Initiator string

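The DB column power_shelf_maintenance_requested records who requested the operation and why, in the form of an initiator string. The API composes that string from two sources, the authenticated user and the optional --reference value: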

| Auth user | --reference | Resulting initiator |
| --- | --- | --- |
| present | present | "<user> (<reference>)" |
| present | absent | "<user>" |
| absent | present | "<reference>" |
| absent | absent | "admin-cli" |
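
The four cases in the table reduce to a short function. The sketch below is illustrative only: `compose_initiator` is a hypothetical name, and treating an empty string the same as an absent value is an assumption.

```python
from typing import Optional


def compose_initiator(user: Optional[str], reference: Optional[str]) -> str:
    """Build the initiator string from the auth user and the reference."""
    # Assumption: empty strings count as "absent".
    if user and reference:
        return f"{user} ({reference})"
    if user:
        return user
    if reference:
        return reference
    return "admin-cli"
```

For instance, `compose_initiator("alice", "https://tracker/MAINT-123")` returns `"alice (https://tracker/MAINT-123)"`, and `compose_initiator(None, None)` returns `"admin-cli"`.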

Data model

The power_shelves.power_shelf_maintenance_requested column is a nullable JSONB column. The API sets it on SetPowerShelfMaintenance, and the state controller clears it when the operation finishes.

Notes / Status

  • The state machine and request plumbing (CLI → gRPC → DB → state controller → state transition) are implemented end-to-end.
  • The actual BMC / PMC call inside the Maintenance handler is still a TODO: today the handler clears the maintenance request and returns to Ready without driving the hardware. Plugging in the real PowerOn / PowerOff BMC action is a focused follow-up that does not change the public CLI / gRPC surface.