# Switch OS Upgrade

The Switch OS Upgrade workflow upgrades a single Cumulus Linux switch to a new firmware version. It computes the version delta against the device's intended firmware in Nautobot, pauses for a human to approve the upgrade, backs up the current configuration, updates device-context state, drives the image install, reboot, and ZTP cycle, and validates that the device came back on the right version. Multiple short-circuit paths exit safely when no work is required.

For NVLink switches, use [NVLink Switch Firmware Upgrade](/switch-infrastructure/config-manager/user-guides/lifecycle/nv-link-firmware-upgrade) instead.

## Prerequisites

Before running, complete the pre-work that drives the upgrade. The workflow does nothing until the firmware target moves.

1. **Update the firmware target in the location-level Config Context that uses the `location-firmware-targets` schema.**
2. **Confirm the re-render has propagated** so the new target is visible to the diff. Otherwise the workflow short-circuits on the "intended matches running" path.

Also confirm:

* **Device exists in Nautobot** with a current intended configuration in the [Config Store](/switch-infrastructure/config-manager/services/config-store/overview).
* **Device is on a supported platform** — Cumulus Linux only. Other platforms (NVOS for NVLink, vendor OSes) are rejected at runtime.
* **Firmware image is uploaded** to the ZTP service for the target Cumulus Linux version and the device's platform. See [Upload Images to the ZTP Server](/switch-infrastructure/config-manager/services/network-ztp/upload-images) for the upload procedure.
* **Device is reachable** and credentials are current. The upgrade flow drives image install over the management network and waits for the device to ZTP back into Provisioned state.
* **Approval to take the device offline.** The upgrade triggers a reboot — plan a maintenance window appropriate for the device's role in the fabric.

## Running the workflow

1. Navigate to the Config Manager URL for your environment.
2. Click the **+** in the top right and select **SwitchOSUpgradeWorkflow**.
3. Fill in the form using the field reference below and submit.

| Field      | Description                                                      | Required |
| :--------- | :--------------------------------------------------------------- | :------- |
| **Site**   | The site of the target device. Drives the device list below.     | Yes      |
| **Tenant** | Optional Nautobot tenant filter to narrow the device list.       | No       |
| **Status** | Optional device-status filter to narrow the device list.         | No       |
| **Device** | The target device. The list is filtered by the selections above. | Yes      |

After submission, a status page appears showing the four stages. The workflow blocks at the approval stage until a human acts.

## Execution stages

The workflow runs four stages in order. Only `approve_upgrade` requires approval — and even that gate can short-circuit when no upgrade is needed.

1. **`approve_upgrade` — Compute the delta and request approval.**

   Reads the device's current firmware off the running configuration and compares it against the intended firmware in Nautobot. Three outcomes:

   * **Unsupported platform** — the workflow returns early, the approval stage is cleared, and downstream stages are marked UNREACHABLE.
   * **Intended matches running** — no upgrade needed. The stage flips `requires_approval` to false at runtime and short-circuits; downstream stages are marked UNREACHABLE.
   * **Genuine version delta** — the stage transitions to `PENDING_APPROVAL` and waits indefinitely on `workflow.wait_condition` until a reviewer approves or rejects. Rejection marks `perform_backup`, `update_device_configuration`, and `perform_upgrade` as UNREACHABLE; no change is made.

2. **`perform_backup` — Capture a pre-upgrade backup.**

   Starts the [Configuration Backup](/switch-infrastructure/config-manager/user-guides/configuration-deploy/configuration-backup) workflow as a child workflow with `trigger=WORKFLOW`. After it completes, the workflow runs `check_recorded_config_drift`: if the just-recorded backup differs from what was last known to Config Manager, the upgrade halts with the downstream stages marked UNREACHABLE. Drift means the device has been configured out of band; resolve the drift before re-running.

3. **`update_device_configuration` — Refresh device context for the upgrade.**

   Calls `validate_rendered_image_change` (6-minute start-to-close timeout, 2-minute heartbeat) to ensure the rendered configuration is consistent with the upgrade target, and updates Config Manager's internal state to track the new firmware.

4. **`perform_upgrade` — Install the image and wait for the device to come back.**

   Pushes the image to the device via the ZTP service and reboots. Two long-poll activities then wait for the device to converge:

   * `poll_image` — 35-minute timeout (30 + 5 buffer), 3-minute heartbeat. If the post-reboot image does not match the intended firmware, raises `ApplicationError`.
   * `poll_ztp_status` — 15-minute timeout (10 + 5 buffer). If ZTP does not complete in that window, raises `ApplicationError`.

     Retry policy: 3 attempts, with `FirmwareUpgradeException` non-retryable.

The workflow returns `True` on full success, `False` on early-exit branches (no-upgrade-needed, rejected approval, drift).
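
The branching described in the stages above can be summarized in a small decision function. This is a minimal sketch rather than the workflow's actual code: `plan_upgrade`, `Outcome`, the `cumulus-linux` platform slug, the version strings, and every state label other than `UNREACHABLE` and `PENDING_APPROVAL` are hypothetical. The guide also does not state the return value for the unsupported-platform branch, so `False` is assumed there.

```python
from dataclasses import dataclass
from typing import Dict, Optional

SUPPORTED_PLATFORMS = {"cumulus-linux"}  # hypothetical platform slug
DOWNSTREAM = ("perform_backup", "update_device_configuration", "perform_upgrade")


@dataclass
class Outcome:
    result: Optional[bool]   # workflow return value; None while still waiting
    stages: Dict[str, str]   # stage name -> illustrative state label


def plan_upgrade(platform: str, running: str, intended: str,
                 approved: Optional[bool] = None, drift: bool = False) -> Outcome:
    # Start pessimistic: downstream stages stay UNREACHABLE until we reach them.
    stages = {"approve_upgrade": "DONE", **{s: "UNREACHABLE" for s in DOWNSTREAM}}

    if platform not in SUPPORTED_PLATFORMS:
        stages["approve_upgrade"] = "CLEARED"
        return Outcome(False, stages)  # assumed: source does not state this value
    if running == intended:
        stages["approve_upgrade"] = "SHORT_CIRCUITED"  # no upgrade needed
        return Outcome(False, stages)
    if approved is None:
        stages["approve_upgrade"] = "PENDING_APPROVAL"  # waits for a reviewer
        return Outcome(None, stages)
    if not approved:
        stages["approve_upgrade"] = "REJECTED"
        return Outcome(False, stages)

    stages["perform_backup"] = "DONE"
    if drift:
        # Drift halts the run; the two later stages remain UNREACHABLE.
        return Outcome(False, stages)

    stages["update_device_configuration"] = "DONE"
    stages["perform_upgrade"] = "DONE"
    return Outcome(True, stages)
```

Calling `plan_upgrade("cumulus-linux", "5.9.0", "5.10.0", approved=True, drift=True)` yields a `False` result with `perform_backup` complete and the two later stages still `UNREACHABLE`, which matches the drift path described in stage 2.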

## Verifying outcomes

After the workflow reports success, confirm:

* **All four stages green** on the Config Manager run page (or `approve_upgrade` green and downstream UNREACHABLE for the short-circuit paths).
* **`nv show platform`** on the device reports the intended firmware version.
* **Nautobot device status** is back to Provisioned (or Active), and the device passes a ZTP-completion verification check (for example, reading `/etc/os-release`, or your environment's equivalent).
* **A pre-upgrade backup** is in the Config Store, tagged with the commit SHA the workflow used.
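
If you script the version check, the contents of `/etc/os-release` can be parsed as simple `KEY=value` lines. The sketch below assumes the file's `VERSION_ID` field carries the version string you set as the intended firmware; confirm that against your own devices. The sample text is illustrative, not captured from a real switch, and `parse_os_release` and `on_intended_version` are hypothetical helper names.

```python
import shlex
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style KEY=value lines into a dictionary."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # removes shell-style quoting around the value
        fields[key] = parts[0] if parts else ""
    return fields


def on_intended_version(text: str, intended: str) -> bool:
    """Return True when VERSION_ID equals the intended version string."""
    return parse_os_release(text).get("VERSION_ID") == intended


# Illustrative sample only.
SAMPLE = 'NAME="Cumulus Linux"\nVERSION_ID=5.9.0\n'
```

How you fetch the file from the device (SSH, an automation tool, or a console session) is left to your environment.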

## Rollback

Cumulus Linux does not have a native one-shot OS rollback. The recovery options are:

* **Set the intended firmware back to the previous version in Nautobot**, re-render via the [Render Service](/switch-infrastructure/config-manager/services/render/overview), and re-run Switch OS Upgrade. The workflow will install the older image and reboot the device back onto it.
* **Reprovision the device** via [Device Reprovision](/switch-infrastructure/config-manager/user-guides/lifecycle/device-reprovision). The reprovision flow factory-resets and re-runs ZTP with whatever intended firmware is currently set in Nautobot — useful when the post-upgrade device state is unrecoverable.
* **Use the pre-upgrade backup** from the Config Store to restore configuration after a successful rollback to the prior firmware.

## Common issues

**Stage is stuck on "Waiting for approval".**

`approve_upgrade` blocks indefinitely. Open the workflow page and approve or reject.
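
The indefinite wait can be pictured as a gate with no timeout that only an explicit decision opens. The sketch below is a plain `asyncio` analogue written for illustration; it is not the workflow's code and does not use the workflow engine's own wait primitive. `ApprovalGate` and `demo` are hypothetical names.

```python
import asyncio
from typing import Optional


class ApprovalGate:
    """Blocks a waiter until a reviewer records a decision."""

    def __init__(self) -> None:
        self._decided = asyncio.Event()
        self.approved: Optional[bool] = None

    def decide(self, approved: bool) -> None:
        # Record the reviewer's choice and release the waiter.
        self.approved = approved
        self._decided.set()

    async def wait(self) -> bool:
        await self._decided.wait()  # no timeout: waits until decide() is called
        return bool(self.approved)


async def demo() -> bool:
    gate = ApprovalGate()
    waiter = asyncio.create_task(gate.wait())
    await asyncio.sleep(0)       # give the waiter a chance to start and block
    assert not waiter.done()     # still waiting: nobody has decided yet
    gate.decide(True)            # the reviewer approves
    return await waiter
```

Because nothing times the gate out, a run left unattended stays in this state until someone approves or rejects it.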

**`perform_backup` reports configuration drift and the workflow halts.**

The running config diverges from what Config Manager last knew about. Investigate the cause, for example someone applying a change out of band or an automated agent on the device modifying state. Once drift is reconciled, typically by running [Configuration Backup](/switch-infrastructure/config-manager/user-guides/configuration-deploy/configuration-backup) and accepting the new baseline or by reverting the out-of-band change, re-run the upgrade.

**`poll_image` times out or reports a version mismatch.**

The device rebooted but did not come up on the intended firmware. This is most often because the wrong image was uploaded to the ZTP service for the target version, or because the device has hardware-specific image quirks. Confirm the right image is on the ZTP service and that it matches the platform; if needed, use [Device Reprovision](/switch-infrastructure/config-manager/user-guides/lifecycle/device-reprovision) to recover.

**`poll_ztp_status` times out.**

The image install succeeded but ZTP did not complete within the polling window after reboot (10 minutes, plus a 5-minute buffer on the activity timeout). Use the [Monitoring DHCP and ZTP](/switch-infrastructure/config-manager/user-guides/new-site-bringup#monitoring-dhcp-and-ztp) section of New Site Bringup to investigate.
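
To reason about how the time budget is consumed, it can help to picture a generic poll-until-deadline loop. This is a sketch under stated assumptions, not the implementation of `poll_ztp_status`: the function name `poll_until`, the polling interval, and the injectable `clock` and `sleep` parameters are hypothetical. Only the 10 + 5 minute budget comes from this guide.

```python
import time
from typing import Callable

ZTP_BUDGET_S = (10 + 5) * 60  # 10-minute wait plus 5-minute buffer, in seconds


def poll_until(check: Callable[[], bool], budget_s: float, interval_s: float,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() repeatedly until it returns True or the budget runs out."""
    deadline = clock() + budget_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # budget exhausted without convergence
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

Passing `clock` and `sleep` as parameters lets the loop be exercised in tests without real waiting. In a real run, a `False` result corresponds to the timeout described above.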

**`FirmwareUpgradeException` is raised.**

A non-retryable failure in the upgrade activity. Read the error message — the activity surfaces it directly. Common causes: image download failure, image checksum mismatch, install-time error reported by the device. Resolve and re-run.
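
The effect of marking an exception non-retryable can be illustrated with a small retry helper. This is a minimal sketch, not the workflow's retry machinery: `run_with_retries` is a hypothetical function, the `FirmwareUpgradeException` class is a local stand-in for the one named in this guide, and no backoff is modeled because the guide does not describe one.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class FirmwareUpgradeException(Exception):
    """Local stand-in for the exception named in this guide."""


def run_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    non_retryable: Tuple[Type[BaseException], ...] = (FirmwareUpgradeException,),
) -> T:
    """Run fn up to max_attempts times, failing fast on non-retryable errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except non_retryable:
            raise  # fail immediately: retrying will not help
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

A transient error such as a timeout gets up to three tries, while a `FirmwareUpgradeException` ends the run on its first occurrence. That is why the guidance above is to read the message and fix the cause before re-running.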

## Related guides

* [Controlling Running Workflows](/switch-infrastructure/config-manager/user-guides/controlling-running-workflows) — approve, reject, retry, and terminate behavior, including how to recover if the upgrade is terminated mid-install.
* [NVLink Switch Firmware Upgrade](/switch-infrastructure/config-manager/user-guides/lifecycle/nv-link-firmware-upgrade) — the sibling workflow for NVLink switches.
* [Configuration Backup](/switch-infrastructure/config-manager/user-guides/configuration-deploy/configuration-backup) — child workflow used for the pre-upgrade snapshot.
* [Device Reprovision](/switch-infrastructure/config-manager/user-guides/lifecycle/device-reprovision) — recovery path when an upgrade leaves the device unrecoverable.
* [Network ZTP](/switch-infrastructure/config-manager/services/network-ztp/overview) — explains how images are served and how the device gets its post-reboot configuration.
* [New Site Bringup](/switch-infrastructure/config-manager/user-guides/new-site-bringup) — includes the canonical DHCP/ZTP troubleshooting paths.