This section provides step-by-step instructions for manually checking and updating the software and firmware on the GB200-NVL NVLink switch tray, so that the system runs the latest software and firmware versions.

Note: This guide applies only to upgrades between systems of the same type (e.g., QS to QS).

It is highly recommended to use the nvfwupd tool to update GB200-NVL automatically (see NVIDIA Firmware Update Tool: NVOnline Document ID 1107320).

For CPLD only: Unpack the bundles using the fwpkg-unpack CLI (see Firmware Package Unpacking Tool: NVOnline Document ID 1090243).

| Document Title | NVOnline Document ID |
|---|---|
| GB200 NVL72 Rack System Specification and Integration Guide | 1117886 |
| GB200 NVL Technical Overview | 1114121 |
| NVLink Switch Tray | |
| NVIDIA NVLink GB200 Switch Systems User Manual | 1115337 |
| NVIDIA NVOS User Manual | 1114436 |
| Tools | |
| NVIDIA Firmware Update Tool | 1107320 |
| Firmware Package Unpacking Tool | 1090243 |

| Bundle | Content | File Type | Estimated File Size | Estimated Update Time |
|---|---|---|---|---|
| BMC, FPGA, and ERoT | BMC | .fwpkg | 65 MB | 12 minutes |
| | FPGA | .fwpkg | 11 MB | 2 minutes |
| | ERoT | .fwpkg | 225 KB | 15 seconds |
| BIOS and ERoT | BIOS | .fwpkg | 17 MB | 4 minutes |
| | ERoT | .fwpkg | 225 KB | 15 seconds |
| CPLD | CPLD | .vme / .bin | 1.6 MB | 10 minutes |
| NVOS (not part of the bundle) | | .bin | 2.3 GB | 10 minutes |
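When planning the maintenance window, it can help to add up the estimated update times from the table above. The following is a minimal sketch; the dictionary and function names are hypothetical, and the total covers only the listed update times, not file transfer, reboots, or the final power cycle.

```python
# Estimated update times copied from the table above, in seconds.
# The keys are illustrative labels, not NVOS component IDs.
ESTIMATED_UPDATE_SECONDS = {
    "BMC": 12 * 60,
    "FPGA": 2 * 60,
    "ERoT (BMC bundle)": 15,
    "BIOS": 4 * 60,
    "ERoT (BIOS bundle)": 15,
    "CPLD": 10 * 60,
    "NVOS": 10 * 60,
}


def estimated_total_minutes(components):
    """Sum the estimated update times of the given components, in minutes."""
    return sum(ESTIMATED_UPDATE_SECONDS[name] for name in components) / 60


if __name__ == "__main__":
    total = estimated_total_minutes(ESTIMATED_UPDATE_SECONDS)
    print(f"Estimated update time for all components: {total:.1f} minutes")
```

Because these are estimates, the actual maintenance window should leave extra margin.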

Note: NVOS is not part of the bundle. NVSwitch5 firmware is part of NVOS.

Note: These updates are not required in every release. See the NVIDIA NVOS Release Notes for the versions that should be used. A power cycle is needed at the end of the upgrade process, and the upgrade requires a maintenance window. If necessary, retrieve logs for customer support using the command "nv action generate system tech-support".

Firmware updates can be performed with NVOS CLI commands. CLI commands are blocking, meaning each command must finish before the next one can start.

There are two stages to upgrade each component:

1. Fetch a file from the unpacked bundle:

    admin@nvos:~$ nv action fetch platform firmware <component-id> <remote-url>

   For details, see nv action fetch platform firmware.

2. Install the component:

    admin@nvos:~$ nv action install platform firmware <component-id> files <file-name>

   For details, see nv action install platform firmware files.

To save time, it is recommended to update the components one by one and to perform a single power cycle at the end.

Note: <component-id> can be one of the following: ASIC, BMC, BIOS, CPLD1, ERoT and FPGA. Upgrading any one CPLD upgrades all other CPLDs as well.
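The two-stage flow can be sketched as a small helper that validates the component ID and returns the fetch and install command strings. This is only an illustration: the function name is hypothetical, the helper does not run anything on the switch, and matching the component ID case-insensitively is an assumption (the note above lists ERoT, while the example commands in this guide use EROT).

```python
# Component IDs listed in the note above.
VALID_COMPONENT_IDS = ("ASIC", "BMC", "BIOS", "CPLD1", "ERoT", "FPGA")


def build_upgrade_commands(component_id, remote_url, file_name):
    """Return the fetch and install CLI commands for one component, in order."""
    known = {name.upper() for name in VALID_COMPONENT_IDS}
    # Assumption: the ID is compared case-insensitively; the caller's spelling is kept.
    if component_id.upper() not in known:
        raise ValueError(f"unknown component-id: {component_id!r}")
    return [
        f"nv action fetch platform firmware {component_id} {remote_url}",
        f"nv action install platform firmware {component_id} files {file_name}",
    ]


if __name__ == "__main__":
    # The URL and file name below are placeholders, not real paths.
    for command in build_upgrade_commands("BMC", "<remote-url>", "<file-name>"):
        print(command)
```

Since the CLI is blocking, the second command would be issued only after the first one returns.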

If a power cycle was selected as part of the install command, it is triggered automatically. If a manual power cycle is required, run the following:

    admin@nvos:~$ nv action power-cycle system

To verify firmware versions after the power cycle, run the following:

    admin@nvos:~$ nv show platform firmware

For details, see nv show platform firmware.

1. Fetch a file from the unpacked bundle:

    admin@nvos:~$ nv action fetch platform firmware transceiver <file-path>

   For details, see nv action fetch platform firmware.

2. Install the transceiver firmware:

    admin@nvos:~$ nv action install platform transceiver <transceiver-id> firmware files <file-name>

   For details, see nv action install platform transceiver firmware files.

In order to activate the transceiver firmware, NVOS will reset the transceiver as part of the install action.

To verify firmware version, run the following:

    admin@nvos:~$ nv show platform transceiver <transceiver-id> firmware

For details, see nv show platform transceiver firmware.

The REST API can be used from a remote server to perform operations on the switch.

The REST API is non-blocking, meaning a command can be sent before the previous one has finished. To handle this, each command returns a task ID; use the task ID to query for the result between commands. A state of "action_success" means the operation ended successfully.

Upgrades consist of fetch, install, and power cycle at the end of the entire process.

Query command (run it between commands to check the result of the previous task):

    admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request GET 'https://<switch-ip>/nvue_v1/action/<task-id>'
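Because the REST API is non-blocking, an automation script typically repeats the query command above until the task reports "action_success". The following is a minimal sketch of such a polling loop. The function name, the get_task callable, and the polling limits are hypothetical, and it assumes the query response is a JSON object with a top-level "state" field. The get_task callable stands in for whatever HTTP client performs the GET request.

```python
import time


def wait_for_task(get_task, task_id, interval_s=5.0, max_polls=360, sleep=time.sleep):
    """Poll a task until its state is "action_success".

    get_task(task_id) must return the parsed JSON response of the query command.
    Raises TimeoutError if the state is not reached within max_polls queries.
    """
    for attempt in range(max_polls):
        response = get_task(task_id)
        # Assumption: the response carries the task state in a top-level "state" key.
        if response.get("state") == "action_success":
            return response
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(
        f"task {task_id} did not reach action_success after {max_polls} polls"
    )


if __name__ == "__main__":
    # Canned replies used only to demonstrate the loop; they are not real switch output.
    replies = iter([{"state": "pending"}, {"state": "action_success"}])
    result = wait_for_task(lambda _task_id: next(replies), "1", sleep=lambda _s: None)
    print(result["state"])
```

The loop only recognizes the success state documented here; a production script would also need to stop early on whatever failure states the switch reports.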

1. Fetch the component image file:

    admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request POST 'https://<switch-ip>/nvue_v1/platform/firmware/<component>' -H 'Content-Type: application/json' -d '{"@fetch": {"state": "start", "parameters": {"remote-url": "scp://<server-user>:<server-password>@<PATH_TO_FILE>"}}}'

2. Install the component file:

    admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request POST 'https://<switch-ip>/nvue_v1/platform/firmware/<component>/files/<file-name>' -H 'Content-Type: application/json' -d '{"@install": {"state": "start", "parameters": {"force": false}}}'

3. Power cycle:

    admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request POST 'https://<switch-ip>/nvue_v1/system' -H 'Content-Type: application/json' -d '{"@power-cycle": {"state": "start", "parameters": {"force": true}}}'

4. After the power cycle, check the firmware version:

    admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request GET 'https://<switch-ip>/nvue_v1/platform/firmware'
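For scripted use, the URLs and JSON bodies of the fetch, install, and power-cycle requests above can be built by small helper functions. This sketch only constructs the request pieces and sends nothing; the function names are hypothetical, and percent-encoding the file name in the URL is an added precaution rather than something this guide specifies.

```python
import json
from urllib.parse import quote


def _base(switch_ip):
    """Base URL of the REST API, as used in the curl examples above."""
    return f"https://{switch_ip}/nvue_v1"


def fetch_request(switch_ip, component, remote_url):
    """Return (method, url, json_body) for fetching a component image file."""
    body = {"@fetch": {"state": "start", "parameters": {"remote-url": remote_url}}}
    return "POST", f"{_base(switch_ip)}/platform/firmware/{component}", json.dumps(body)


def install_request(switch_ip, component, file_name, force=False):
    """Return (method, url, json_body) for installing a fetched file."""
    body = {"@install": {"state": "start", "parameters": {"force": force}}}
    # Assumption: percent-encoding the file name is harmless for plain names.
    url = f"{_base(switch_ip)}/platform/firmware/{component}/files/{quote(file_name, safe='')}"
    return "POST", url, json.dumps(body)


def power_cycle_request(switch_ip, force=True):
    """Return (method, url, json_body) for the final power cycle."""
    body = {"@power-cycle": {"state": "start", "parameters": {"force": force}}}
    return "POST", f"{_base(switch_ip)}/system", json.dumps(body)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(install_request("<switch-ip>", "BMC", "bmc.fwpkg"))
```

Each request returns a task ID, so a script would query the task status after every call before sending the next one.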

1. Fetch and install BMC:

    admin@nvos:~$ nv action fetch platform firmware BMC <remote-url-to-BMC-bundle>
    admin@nvos:~$ nv action install platform firmware BMC files <fetched-file-name> skip-reboot

2. Fetch and install FPGA:

    admin@nvos:~$ nv action fetch platform firmware FPGA <remote-url-to-FPGA-bundle>
    admin@nvos:~$ nv action install platform firmware FPGA files <fetched-file-name> skip-reboot

3. Fetch and install ERoT:

    admin@nvos:~$ nv action fetch platform firmware EROT <remote-url-to-EROT-bundle>
    admin@nvos:~$ nv action install platform firmware EROT files <fetched-file-name> skip-reboot

4. Fetch and install CPLD. For CPLD only: Unpack the bundles using the fwpkg-unpack CLI (see Firmware Package Unpacking Tool: NVOnline Document ID 1090243).

    admin@nvos:~$ nv action fetch platform firmware CPLD1 <remote-url-to-unpacked-CPLD-file>
    admin@nvos:~$ nv action install platform firmware CPLD1 files <fetched-file-name> skip-reboot

5. Fetch and install BIOS:

    admin@nvos:~$ nv action fetch platform firmware BIOS <remote-url-to-BIOS-bundle>
    admin@nvos:~$ nv action install platform firmware BIOS files <fetched-file-name> skip-reboot

6. Fetch and install NVOS, and reboot the system:

    admin@nvos:~$ nv action fetch system image files <remote-url-to-NVOS-file>
    admin@nvos:~$ nv action install system image files <fetched-file-name> reboot no
    admin@nvos:~$ nv action reboot system

Note: You can force a reboot of the system upon NVOS installation by using the "force" option:

    admin@nvos:~$ nv action install system image files <fetched-file-name> force

For automation scripts that use an NVOS SSH connection, it is recommended to set the SSH inactivity timeout to at least 20 minutes before starting the upgrade process, so that the SSH session is less likely to drop during long operations:

    admin@nvos:~$ nv set system ssh-server inactivity-timeout 20
    admin@nvos:~$ nv config apply
    admin@nvos:~$ nv config save

Every NVUE CLI has an equivalent REST API call. Please refer to the documentation for the relevant commands.

The system will reboot twice if the "nv action reboot system immediate" or "nv action power-cycle system" command is used after NVOS installation. For optimized reboot time, use "nv action reboot system".

Use the table below to identify the errors and their meaning.

Bundles List

BMC

| Scenario | Error |
|---|---|
| Selected file for installation doesn't exist | Failed to install BMC firmware file: No such firmware |
| Bad or corrupted file | Invalid file: /host/fw-images/bmc/bad_file.fwpkg |
| BMC is not accessible | Error: Timed out ... |
| Failed to login to BMC | Error: Timed out ... |
| Curl returns any other error when sending post request for BMC image installation | Error: X (being X the error returned by Curl) |
| During the installation process got responses in invalid format (responses should be in json format) | Error: Invalid JSON format |
| BMC returned an error code when triggering installation process (json response for installation command contained 'error' field) | Error returned by BMC |
| BMC returned not ok task status on installation response | Error: Return status is {status} |
| BMC response does not include task status | Error: Missing 'TaskStatus' field |
| BMC response task status is not OK during polling for installation | Error: Fail to execute the task - Taskstatus={status} |
| Error detected during installation process | Error: {err_msg} |
| Installation process was aborted (on BMC side) | Error: The task has been aborted |

EROT (same errors as BMC)

| Scenario | Error |
|---|---|
| Selected file for installation doesn't exist | Failed to install EROT firmware file: No such firmware |
| Bad or corrupted file | Invalid file: /host/fw-images/erot/bad_file.fwpkg |
| BMC is not accessible | Error: Timed out ... |
| Failed to login to BMC | Error: Timed out ... |
| Curl returns any other error when sending post request for BMC image installation | Error: X (being X the error returned by Curl) |
| During the installation process got responses in invalid format (responses should be in json format) | Error: Invalid JSON format |
| BMC returned an error code when triggering installation process (json response for installation command contained 'error' field) | Error returned by BMC |
| BMC returned not ok task status on installation command response | Error: Return status is {status} |
| BMC response doesn't include task status | Error: Missing 'TaskStatus' field |
| BMC response task status is not OK during polling for installation completion | Error: Fail to execute the task - Taskstatus={status} |
| Error detected during installation process | Error: {err_msg} |
| Installation process was aborted (on BMC side) | Error: The task has been aborted |
| Installation did not finish in 30 minutes | Wait task completion timeout |

FPGA (same errors as BMC)

| Scenario | Error |
|---|---|
| Selected file for installation doesn't exist | Failed to install EROT firmware file: No such firmware |
| Bad or corrupted file | Invalid file: /host/fw-images/fpga/bad_file.fwpkg |
| BMC is not accessible | Error: Timed out ... |
| Failed to login to BMC | Error: Timed out ... |
| Curl returns any other error when sending post request for BMC image installation | Error: X (being X the error returned by Curl) |
| During the installation process got responses in invalid format (responses should be in json format) | Error: Invalid JSON format |
| BMC returned an error code when triggering installation process (json response for installation command contained 'error' field) | Error returned by BMC |
| BMC returned not ok task status on installation command response | Error: Return status is {status} |
| BMC response doesn't include task status | Error: Missing 'TaskStatus' field |
| BMC response task status is not OK during polling for installation completion | Error: Fail to execute the task - Taskstatus={status} |
| Error detected during installation process | Error: {err_msg} |
| Installation process was aborted (on BMC side) | Error: The task has been aborted |
| Installation didn't finish in 30 minutes | Wait task completion timeout |
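In an automation script, the error strings listed above can be mapped back to their scenarios with a simple lookup. The sketch below is illustrative only: the names are hypothetical, the patterns are taken from the fixed parts of the listed messages, and the free-form messages ("Error: X", "Error: {err_msg}", and errors returned by the BMC) cannot be recognized this way, so they fall through to None.

```python
import re

# (pattern, scenario) pairs derived from the fixed parts of the messages listed above.
ERROR_PATTERNS = [
    (r"No such firmware", "Selected file for installation doesn't exist"),
    (r"^Invalid file:", "Bad or corrupted file"),
    (r"Timed out", "BMC is not accessible, or login to BMC failed"),
    (r"Invalid JSON format", "Responses during installation were not valid JSON"),
    (r"Return status is", "BMC returned a not-OK task status on the installation response"),
    (r"Missing 'TaskStatus' field", "BMC response does not include task status"),
    (r"Fail to execute the task", "BMC task status was not OK while polling for installation"),
    (r"The task has been aborted", "Installation process was aborted (on BMC side)"),
    (r"Wait task completion timeout", "Installation did not finish in 30 minutes"),
]


def explain_error(message):
    """Return the matching scenario for an error message, or None if unrecognized."""
    for pattern, scenario in ERROR_PATTERNS:
        if re.search(pattern, message):
            return scenario
    return None


if __name__ == "__main__":
    print(explain_error("Failed to install BMC firmware file: No such firmware"))
```

Note that "Error: Timed out ..." covers two scenarios in the tables, so the lookup cannot tell an unreachable BMC from a failed login.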