Copyright © 2026, NVIDIA Corporation.

Switch OS Upgrade

The Switch OS Upgrade workflow upgrades a single Cumulus Linux switch to a new firmware version. It computes the version delta against the device’s intended firmware in Nautobot, pauses for a human to approve the upgrade, backs up the current configuration, updates device-context state, drives the image install, reboot, and ZTP cycle, and validates that the device comes back on the intended version. Several short-circuit paths exit safely when no work is required.

For NVLink switches, use NVLink Switch Firmware Upgrade instead.

Prerequisites

Before running, complete the pre-work that drives the upgrade. The workflow does nothing until the firmware target changes.

  1. Update the firmware target in the Config Context that uses the location-firmware-targets schema and is attached at the location level.
  2. Confirm that the re-render has propagated so the new target is visible to the diff. Without this, the workflow short-circuits with “intended matches running.”

Also confirm:

  • Device exists in Nautobot with a current intended configuration in the Config Store.
  • Device is on a supported platform — Cumulus Linux only. Other platforms (NVOS for NVLink, vendor OSes) are rejected at runtime.
  • Firmware image is uploaded to the ZTP service for the target Cumulus Linux version and the device’s platform. See Upload Images to the ZTP Server for the upload procedure.
  • Device is reachable and credentials are current. The upgrade flow drives the image install over the management network and waits for the device to complete ZTP and return to the Provisioned state.
  • Approval to take the device offline. The upgrade triggers a reboot — plan a maintenance window appropriate for the device’s role in the fabric.
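The checklist above can be approximated as a pre-flight function. This is a minimal sketch, not part of Config Manager: DeviceSnapshot, preflight_problems, the "cumulus" platform string, and the (version, hardware platform) image key are all hypothetical stand-ins for data you would read from Nautobot, the Config Store, and the ZTP service.

```python
from dataclasses import dataclass


# Hypothetical device snapshot; field names are illustrative, not Config Manager's data model.
@dataclass(frozen=True)
class DeviceSnapshot:
    name: str
    os_platform: str          # "cumulus" is the only value this sketch accepts
    hw_platform: str          # hardware platform used to select the image (assumed)
    running_firmware: str     # version read from the running configuration
    intended_firmware: str    # target from the location-level Config Context
    has_intended_config: bool
    reachable: bool


def preflight_problems(device: DeviceSnapshot, ztp_images: set[tuple[str, str]]) -> list[str]:
    """Return reasons the upgrade would not proceed; an empty list means all checks pass.

    ztp_images holds (version, hw_platform) pairs already uploaded to the ZTP service.
    """
    problems = []
    if device.os_platform != "cumulus":
        problems.append("unsupported platform: only Cumulus Linux is accepted")
    if not device.has_intended_config:
        problems.append("no current intended configuration in the Config Store")
    if device.intended_firmware == device.running_firmware:
        problems.append("intended matches running: the workflow will short-circuit")
    if (device.intended_firmware, device.hw_platform) not in ztp_images:
        problems.append("no ZTP image for the target version and platform")
    if not device.reachable:
        problems.append("device is not reachable over the management network")
    return problems
```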

Running the workflow

  1. Navigate to the Config Manager URL for your environment.
  2. Click + in the top-right corner and select SwitchOSUpgradeWorkflow.
  3. Fill in the form using the field reference below and submit.

| Field | Description | Required |
|---|---|---|
| Site | The site of the target device. Drives the device list below. | Yes |
| Tenant | Optional Nautobot tenant filter to narrow the device list. | No |
| Status | Optional device-status filter to narrow the device list. | No |
| Device | The target device. The list is filtered by the selections above. | Yes |
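The cascading form behavior can be modeled as a filter over the device inventory. The sketch below is illustrative only: InventoryDevice and device_choices are hypothetical names, and it assumes Tenant and Status are exact-match filters that only narrow the Site selection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InventoryDevice:
    # Hypothetical stand-in for a Nautobot device record.
    name: str
    site: str
    tenant: Optional[str]
    status: str


def device_choices(devices: list[InventoryDevice], site: str,
                   tenant: Optional[str] = None, status: Optional[str] = None) -> list[str]:
    """Names offered in the Device field: Site is required; Tenant and Status only narrow."""
    return [
        d.name
        for d in devices
        if d.site == site
        and (tenant is None or d.tenant == tenant)
        and (status is None or d.status == status)
    ]
```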

After submission, a status page appears showing the four stages. The workflow blocks at the approval stage until a human acts.

Execution stages

The workflow runs four stages in order. Only approve_upgrade requires approval — and even that gate can short-circuit when no upgrade is needed.

  1. approve_upgrade — Compute the delta and request approval.

    Reads the device’s current firmware from the running configuration and compares it with the intended firmware in Nautobot. There are three possible outcomes:

    • Unsupported platform — the workflow returns early, the approval stage is cleared, and downstream stages are marked UNREACHABLE.
    • Intended matches running — no upgrade needed. The stage flips requires_approval to false at runtime and short-circuits; downstream stages are marked UNREACHABLE.
    • Genuine version delta — the stage transitions to PENDING_APPROVAL and waits indefinitely on workflow.wait_condition until a reviewer approves or rejects. Rejection marks perform_backup, update_device_configuration, and perform_upgrade as UNREACHABLE; no change is made.
  2. perform_backup — Capture a pre-upgrade backup.

    Starts the Configuration Backup workflow as a child workflow with trigger=WORKFLOW. After it completes, the workflow runs check_recorded_config_drift: if the just-recorded backup differs from what was last known to Config Manager, the upgrade halts with the downstream stages marked UNREACHABLE. Drift means the device has been configured out of band; resolve the drift before re-running.

  3. update_device_configuration — Refresh device context for the upgrade.

    Calls validate_rendered_image_change (6-minute start-to-close timeout, 2-minute heartbeat) to ensure the rendered configuration is consistent with the upgrade target, and updates Config Manager’s internal state to track the new firmware.

  4. perform_upgrade — Install the image and wait for the device to come back.

    Pushes the image to the device via the ZTP service and reboots. Two long-poll activities then wait for the device to converge:

    • poll_image: 35-minute timeout (30 minutes plus a 5-minute buffer) and a 3-minute heartbeat. If the post-reboot image does not match the intended firmware, the activity raises ApplicationError.

    • poll_ztp_status: 15-minute timeout (10 minutes plus a 5-minute buffer). If ZTP does not complete in that window, the activity raises ApplicationError.

      Retry policy: up to 3 attempts; FirmwareUpgradeException is non-retryable.

The workflow returns True on full success and False on the early-exit branches (no upgrade needed, rejected approval, drift).
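The stage sequencing and return values described above can be summarized in a small control-flow model. This is a sketch under stated assumptions, not the workflow's code: the three callables stand in for the approval wait, the backup child workflow with its drift check, and the upgrade activities; status strings other than PENDING_APPROVAL and UNREACHABLE are invented labels; and returning False for the unsupported-platform branch is an assumption, since the text only says the workflow returns early.

```python
from typing import Callable

STAGES = ["approve_upgrade", "perform_backup", "update_device_configuration", "perform_upgrade"]


def run_switch_os_upgrade(
    platform: str,
    running: str,
    intended: str,
    approve: Callable[[], bool],           # the real workflow blocks here until a reviewer acts
    backup_has_drift: Callable[[], bool],  # stands in for the backup child workflow plus drift check
    upgrade: Callable[[], None],           # stands in for install, reboot, and polling; raises on failure
) -> tuple[bool, dict[str, str]]:
    status = {stage: "NOT_STARTED" for stage in STAGES}

    def stop_after(stage: str) -> tuple[bool, dict[str, str]]:
        # Every stage after `stage` becomes UNREACHABLE and the run reports False.
        for later in STAGES[STAGES.index(stage) + 1:]:
            status[later] = "UNREACHABLE"
        return False, status

    # approve_upgrade, outcome 1: unsupported platform (returning False is an assumption).
    if platform != "cumulus":
        status["approve_upgrade"] = "CLEARED"
        return stop_after("approve_upgrade")
    # Outcome 2: intended matches running, so requires_approval is flipped to false.
    if running == intended:
        status["approve_upgrade"] = "COMPLETED"
        return stop_after("approve_upgrade")
    # Outcome 3: genuine version delta; wait for a reviewer decision.
    status["approve_upgrade"] = "PENDING_APPROVAL"
    if not approve():
        status["approve_upgrade"] = "REJECTED"
        return stop_after("approve_upgrade")
    status["approve_upgrade"] = "COMPLETED"

    # perform_backup: drift halts the run before any change is made.
    if backup_has_drift():
        status["perform_backup"] = "DRIFT_DETECTED"
        return stop_after("perform_backup")
    status["perform_backup"] = "COMPLETED"

    # This model assumes the context update always succeeds.
    status["update_device_configuration"] = "COMPLETED"
    upgrade()
    status["perform_upgrade"] = "COMPLETED"
    return True, status
```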

Verifying outcomes

After the workflow reports success, confirm:

  • All four stages are green on the Config Manager run page. For the short-circuit paths, approve_upgrade is green and the downstream stages are UNREACHABLE.
  • nv show platform on the device reports the intended firmware version.
  • Nautobot device status is back to Provisioned (or Active), and the device passes a post-ZTP verification check (for example, cat /etc/os-release, or your environment’s equivalent).
  • A pre-upgrade backup is in the Config Store, tagged with the commit SHA the workflow used.
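The version and status checks above can be scripted. A minimal sketch, assuming you have already fetched the contents of /etc/os-release from the device and the device status from Nautobot; it further assumes the intended firmware in Nautobot uses the same string format as the os-release VERSION_ID field, which you should confirm for your environment. parse_os_release and verification_problems are hypothetical helpers.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines from os-release content, dropping one layer of matching quotes."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key] = value
    return fields


def verification_problems(os_release_text: str, intended_firmware: str, nautobot_status: str) -> list[str]:
    """Return post-upgrade problems; an empty list means version and status both check out."""
    problems = []
    version = parse_os_release(os_release_text).get("VERSION_ID")
    if version != intended_firmware:
        problems.append(f"device reports {version!r}, intended {intended_firmware!r}")
    if nautobot_status not in {"Provisioned", "Active"}:
        problems.append(f"Nautobot status is {nautobot_status!r}")
    return problems
```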

Rollback

Cumulus Linux does not have a native one-shot OS rollback. The recovery options are:

  • Set the intended firmware back to the previous version in Nautobot, re-render via the Render Service, and re-run Switch OS Upgrade. The workflow will install the older image and reboot the device back onto it.
  • Reprovision the device via Device Reprovision. The reprovision flow factory-resets and re-runs ZTP with whatever intended firmware is currently set in Nautobot — useful when the post-upgrade device state is unrecoverable.
  • Use the pre-upgrade backup from the Config Store to restore configuration after a successful rollback to the prior firmware.

Common issues

Stage is stuck on “Waiting for approval”.

approve_upgrade blocks indefinitely until a reviewer acts. Open the workflow page and approve or reject the upgrade.

perform_backup reports configuration drift and the workflow halts.

The running configuration differs from what Config Manager last recorded. Investigate the cause, for example a change someone applied out of band or an automated agent on the device modifying state. Once the drift is reconciled, typically by running Configuration Backup and accepting the new baseline or by reverting the out-of-band change, re-run the upgrade.
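To see what changed before deciding how to reconcile, compare the last known configuration with the new backup. The sketch below uses Python's difflib for a line-level diff; it is not the check_recorded_config_drift implementation, config_drift and normalize are hypothetical helpers, and the normalization rule (ignoring trailing whitespace and blank lines) is an assumption.

```python
import difflib


def normalize(config: str) -> list[str]:
    # Assumption: trailing whitespace and blank lines are cosmetic, not drift.
    return [line.rstrip() for line in config.splitlines() if line.strip()]


def config_drift(last_known: str, recorded: str) -> list[str]:
    """Unified diff from the last known config to the new backup; empty means no drift."""
    before, after = normalize(last_known), normalize(recorded)
    if before == after:
        return []
    return list(difflib.unified_diff(before, after, fromfile="last_known", tofile="recorded", lineterm=""))
```

Lines prefixed with + in the output were added out of band; lines prefixed with - were removed.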

poll_image times out or reports a version mismatch.

The device rebooted but did not come up on the intended firmware. The most common cause is that the wrong image was uploaded to the ZTP service for the target version; hardware-specific image issues on the device are another possibility. Confirm that the correct image is on the ZTP service and that it matches the device’s platform. If needed, use Device Reprovision to recover.
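The poll_image behavior described under perform_upgrade, a bounded wait that fails on either timeout or version mismatch, can be sketched as follows. The names workflow.wait_condition and ApplicationError suggest the workflow runs on the Temporal Python SDK, but this sketch does not depend on it: ApplicationError here is a local stand-in, and the poll interval, read_version callable, and heartbeat hook are hypothetical. poll_ztp_status would follow the same shape with a 15-minute window.

```python
import time
from typing import Callable, Optional


class ApplicationError(Exception):
    """Local stand-in for the workflow engine's ApplicationError."""


def poll_image(
    read_version: Callable[[], Optional[str]],   # returns None while the device is unreachable
    intended: str,
    timeout_s: float = (30 + 5) * 60,            # 30-minute window plus 5-minute buffer
    interval_s: float = 30.0,                    # hypothetical poll interval
    heartbeat: Callable[[], None] = lambda: None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    deadline = clock() + timeout_s
    while clock() < deadline:
        heartbeat()  # an activity would report liveness here so a stuck poll can be detected
        version = read_version()
        if version is not None:
            if version != intended:
                raise ApplicationError(f"device came back on {version}, expected {intended}")
            return version
        sleep(interval_s)
    raise ApplicationError(f"no version reported within {timeout_s / 60:g} minutes")
```

Injecting clock and sleep lets the timeout path be exercised without actually waiting 35 minutes.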

poll_ztp_status times out.

The image install succeeded, but ZTP did not complete within the poll window after reboot (10 minutes plus a 5-minute buffer). Use the Monitoring DHCP and ZTP section of New Site Bringup to investigate.

FirmwareUpgradeException is raised.

A non-retryable failure in the upgrade activity. Read the error message, which the activity surfaces directly. Common causes include an image download failure, an image checksum mismatch, or an install-time error reported by the device. Resolve the cause and re-run the workflow.
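The retry policy (up to 3 attempts, FirmwareUpgradeException non-retryable) means a transient error is retried while this exception surfaces on the first failure. A minimal stdlib sketch of those semantics, with run_with_retries and the local FirmwareUpgradeException as hypothetical stand-ins; it omits backoff between attempts. If the workflow is built on Temporal, as its naming suggests, the equivalent is normally expressed declaratively through the maximum_attempts and non_retryable_error_types fields of RetryPolicy; the source does not confirm how it is configured.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FirmwareUpgradeException(Exception):
    """Local stand-in for the workflow's non-retryable upgrade failure."""


def run_with_retries(
    activity: Callable[[int], T],
    max_attempts: int = 3,
    non_retryable: tuple[type[BaseException], ...] = (FirmwareUpgradeException,),
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return activity(attempt)
        except non_retryable:
            raise                   # surface immediately; no further attempts
        except Exception as exc:    # anything else is treated as transient
            last_error = exc
    assert last_error is not None
    raise last_error
```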

Related guides

  • Controlling Running Workflows — approve, reject, retry, and terminate behavior, including how to recover if the upgrade is terminated mid-install.
  • NVLink Switch Firmware Upgrade — the sibling workflow for NVLink switches.
  • Configuration Backup — child workflow used for the pre-upgrade snapshot.
  • Device Reprovision — recovery path when an upgrade leaves the device unrecoverable.
  • Network ZTP — explains how images are served and how the device gets its post-reboot configuration.
  • New Site Bringup — includes the canonical DHCP/ZTP troubleshooting paths.