Upgrading System Firmware
Each NVOS software package includes a default switch firmware version. When you update NVOS to a new version, NVOS attempts to update the switch firmware automatically. This process is described below.
To have NVOS automatically update the switch to a different firmware version without changing the NVOS version, import and install the firmware package as described below. NVOS sets it as the new default firmware and performs the firmware update automatically.
Default Firmware Change
Display the firmware that is currently available.
admin@nvos:~$ nv show platform firmware ASIC
                 operational                applied
---------------  -------------------------  -------
part-number      920-9K36F-00MV-JS0_IPN_Ax
actual-firmware  35.2014.1482
auto-update      enabled                    enabled
fw-source        default                    default
Import the firmware image (.mfa file) to the switch.
admin@nvos:~$ nv action fetch platform firmware ASIC /path/to/fw-image.mfa
Alternatively, you can upload the firmware file from the host to the switch:
Note: The firmware file must be copied to the predefined directory /host/fw-images/asic.
user@host:~$ scp <path-to-fw-file> <switch-admin-username>@<switch-ip-address>:/host/fw-images/asic/<desired-name>.mfa
Set the firmware source to the user-provided (custom) firmware.
admin@nvos:~$ nv set platform firmware ASIC fw-source custom
Display system firmware component information.
admin@nvos:~$ nv show platform firmware ASIC
                 operational                applied
---------------  -------------------------  -------
part-number      920-9K36F-00MV-JS0_IPN_Ax
actual-firmware  35.2014.1482
auto-update      enabled                    enabled
fw-source        default                    default
Apply the configuration.
admin@nvos:~$ nv config apply
Save the configuration.
admin@nvos:~$ nv config save
Install the firmware image.
admin@nvos:~$ nv action install platform firmware ASIC files fw-image.mfa
The operation will initiate a component firmware update.
Type [y] to install the firmware and reboot afterwards.
Type [N] to install the firmware without reboot.
Do you want to reboot? [y/N]
Pressing 'y' reboots the system to install and activate the firmware.
Pressing 'N' installs the firmware without rebooting; you must reboot the system manually later:
admin@nvos:~$ nv action reboot system
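The procedure above can be summarized as an ordered list of commands. The following is a minimal sketch in POSIX shell; the function name print_default_fw_change_plan is hypothetical and not part of NVOS. It only prints the commands from this section for review (it runs nothing), and it assumes, as in the example above, that the install step takes the base name of the fetched file.

```shell
# Hypothetical helper (not part of NVOS): prints the Default Firmware Change
# commands in order so they can be reviewed. It does not run them.
# $1 = image location given to "nv action fetch" (e.g. /path/to/fw-image.mfa)
print_default_fw_change_plan() {
    local src file
    src=$1
    if [ -z "$src" ]; then
        echo "usage: print_default_fw_change_plan <fw-image.mfa>" >&2
        return 1
    fi
    # The install step takes only the file name, as in the example above.
    file=$(basename "$src")
    case "$file" in
        *.mfa) ;;
        *) echo "expected an .mfa image, got: $file" >&2; return 1 ;;
    esac
    printf '%s\n' \
        "nv show platform firmware ASIC" \
        "nv action fetch platform firmware ASIC $src" \
        "nv set platform firmware ASIC fw-source custom" \
        "nv show platform firmware ASIC" \
        "nv config apply" \
        "nv config save" \
        "nv action install platform firmware ASIC files $file"
    # The install step prompts [y/N]; after answering N, activate the
    # firmware later with: nv action reboot system
}
```

For example, print_default_fw_change_plan /path/to/fw-image.mfa prints seven commands, ending with nv action install platform firmware ASIC files fw-image.mfa. Printing instead of executing leaves the interactive [y/N] prompt of the install step under the operator's control.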
This section provides step-by-step instructions for manually checking and updating the software and firmware on the GB200 NVL NVLink switch tray, so that the system runs the latest software and firmware versions.
This guide applies only to upgrades within the same system type (e.g., QS to QS).
It is highly recommended to use the nvfwupd tool to perform an automatic update of GB200 NVL systems (see NVIDIA Firmware Update Tool: NVOnline Document ID 1107320).
For CPLD only: unpack the bundles using the fwpkg-unpack CLI (see Firmware Package Unpacking Tool: NVOnline Document ID 1090243).
References
Document Title | NVOnline Document ID |
GB200 NVL72 Rack System Specification and Integration Guide | 1117886 |
GB200 NVL Technical Overview | 1114121 |
NVLink Switch Tray | |
NVIDIA NVLink GB200 Switch Systems User Manual | 1115337 |
NVIDIA NVOS User Manual | 1114436 |
Tools | |
NVIDIA Firmware Update Tool | 1107320 |
Firmware Package Unpacking Tool | 1090243 |
Bundle Contents and Upgrade Sequence
Bundles List
Bundle | Content | File type | Estimated file size | Estimated update time |
BMC, FPGA and ERoT | BMC | .fwpkg | 65 MB | 12 minutes |
BMC, FPGA and ERoT | FPGA | .fwpkg | 11 MB | 2 minutes |
BMC, FPGA and ERoT | ERoT | .fwpkg | 225 KB | 15 seconds |
BIOS and ERoT | BIOS | .fwpkg | 17 MB | 4 minutes |
BIOS and ERoT | ERoT | .fwpkg | 225 KB | 15 seconds |
CPLD | CPLD | .vme / .bin | 1.6 MB | 10 minutes |
NVOS (not part of the bundle) | NVOS | .bin | 2.3 GB | 10 minutes |
NVOS is not part of the bundle. NVSwitch5 firmware is part of NVOS.
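As a worked example of the estimates in the Bundles List, the sketch below sums the per-image update times: 720 + 120 + 15 + 240 + 15 + 600 + 600 = 2310 seconds, or 38 minutes 30 seconds. It counts both ERoT rows as listed and excludes fetch time and the final power cycle, so treat the result as a rough planning figure for the maintenance window rather than a measured value. The function names are hypothetical.

```shell
# Worked example: sum of the estimated update times in the Bundles List,
# in seconds. Both ERoT rows are counted, as listed. Fetch time and the
# final power cycle are not included.
estimated_update_seconds() {
    local total t
    total=0
    # BMC=12 min, FPGA=2 min, ERoT=15 s, BIOS=4 min, ERoT=15 s,
    # CPLD=10 min, NVOS=10 min
    for t in 720 120 15 240 15 600 600; do
        total=$((total + t))
    done
    echo "$total"
}

# Formats a number of seconds as "<m> min <s> s".
format_duration() {
    echo "$(($1 / 60)) min $(($1 % 60)) s"
}

format_duration "$(estimated_update_seconds)"   # prints: 38 min 30 s
```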
Upgrade Sequence
BMC
FPGA
ERoT
CPLD
BIOS
NVOS
Not every component is updated in every release. See the NVIDIA NVOS Release Notes for the versions to use.
A power cycle is required at the end of the upgrade process.
The upgrade process requires a maintenance window.
If necessary, retrieve logs for customer support using the command "nv action generate system tech-support".
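Because not every component is updated in every release, a script may receive only a subset of components, which still need to follow the Upgrade Sequence above. The hypothetical helper below (not an NVOS command) sorts the requested names into the documented order BMC, FPGA, ERoT, CPLD, BIOS, NVOS and rejects any name it does not recognize.

```shell
# Hypothetical helper (not part of NVOS): prints the requested components
# in the documented upgrade order, one per line.
order_components() {
    local c step
    for c in "$@"; do
        case "$c" in
            BMC|FPGA|ERoT|CPLD|BIOS|NVOS) ;;
            *) echo "unknown component: $c" >&2; return 1 ;;
        esac
    done
    # Walk the documented sequence and emit each requested step once.
    for step in BMC FPGA ERoT CPLD BIOS NVOS; do
        for c in "$@"; do
            if [ "$c" = "$step" ]; then
                echo "$step"
                break
            fi
        done
    done
}

order_components BIOS BMC CPLD   # prints BMC, CPLD, BIOS (one per line)
```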
Component Image Update Using NVOS CLI
Firmware updates can be performed using NVOS CLI commands. CLI commands are blocking: each command must finish before the next one can run.
Upgrading each component has two stages:
Fetching a file from the unpacked bundle:
admin@nvos:~$ nv action fetch platform firmware <component-id> <remote-url>
For details, see nv action fetch platform firmware.
Installing a component:
admin@nvos:~$ nv action install platform firmware <component-id> files <file-name>
For details, see nv action install platform firmware files.
To save time, it is recommended to update the components one by one and to choose a power cycle only after the last component is installed.
<component-id> can be one of the following: ASIC, BMC, BIOS, CPLD1, ERoT, and FPGA.
Upgrading one CPLD also upgrades all the other CPLDs.
If you chose a power cycle after the install command, it is triggered automatically. If a manual power cycle is required, run the following:
admin@nvos:~$ nv action power-cycle system
To verify the firmware versions after the power cycle, run the following:
admin@nvos:~$ nv show platform firmware
For details, see nv show platform firmware.
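The two-stage flow above, with one power cycle at the end, can be sketched as a helper that prints the CLI commands for a list of <component-id>=<remote-url> pairs. This is a hypothetical sketch, not an NVOS feature: it checks each ID against the list of valid component IDs, assumes the fetched file keeps the last path segment of the URL as its <file-name>, and prints rather than executes the commands because the install step may prompt for a power cycle.

```shell
# Hypothetical helper (not part of NVOS): prints the CLI commands for each
# <component-id>=<remote-url> argument, then one power cycle at the end.
print_cli_upgrade_plan() {
    local pair id url file
    if [ "$#" -eq 0 ]; then
        echo "usage: print_cli_upgrade_plan <component-id>=<remote-url>..." >&2
        return 1
    fi
    # Validate every argument first, so nothing is printed for a bad list.
    for pair in "$@"; do
        case "$pair" in
            ASIC=?*|BMC=?*|BIOS=?*|CPLD1=?*|ERoT=?*|FPGA=?*) ;;
            *) echo "invalid argument: $pair" >&2; return 1 ;;
        esac
    done
    for pair in "$@"; do
        id=${pair%%=*}
        url=${pair#*=}
        # Assumption: the fetched file keeps the last path segment of the URL.
        file=${url##*/}
        echo "nv action fetch platform firmware $id $url"
        # If the install offers a power cycle, decline it until the last one.
        echo "nv action install platform firmware $id files $file"
    done
    echo "nv action power-cycle system"
}
```

Validating all arguments before printing anything avoids emitting a partial plan when one component ID is mistyped (for example CPLD instead of CPLD1).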
Transceiver Firmware Upgrade
Firmware updates can be performed using NVOS CLI commands. CLI commands are blocking: each command must finish before the next one can run.
Upgrading transceiver firmware has two stages:
Fetching a file from the unpacked bundle:
admin@nvos:~$ nv action fetch platform firmware transceiver <file-path>
For details, see nv action fetch platform firmware.
Installing transceiver firmware:
admin@nvos:~$ nv action install platform transceiver <transceiver-id> firmware files <file-name>
For details, see nv action install platform transceiver firmware files.
To activate the transceiver firmware, NVOS resets the transceiver as part of the install action.
To verify the firmware version, run the following:
admin@nvos:~$ nv show platform transceiver <transceiver-id> firmware
For details, see nv show platform transceiver firmware.
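Since the install action resets the transceiver to activate its firmware, one way to keep the impact contained is to install and verify one transceiver at a time. The hypothetical helper below prints, for a file that has already been fetched, the install and verification commands from this section for each <transceiver-id>; IDs are passed through unchanged because their format is not described here.

```shell
# Hypothetical helper (not part of NVOS): prints install + verify commands
# per transceiver. $1 = <file-name> already fetched; rest = <transceiver-id>s.
print_transceiver_plan() {
    local file id
    if [ "$#" -lt 2 ]; then
        echo "usage: print_transceiver_plan <file-name> <transceiver-id>..." >&2
        return 1
    fi
    file=$1
    shift
    for id in "$@"; do
        # The install action resets the transceiver to activate the firmware.
        echo "nv action install platform transceiver $id firmware files $file"
        echo "nv show platform transceiver $id firmware"
    done
}
```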
Component Image Update Using REST API
The REST API can be used from a remote server to perform operations on the switch.
The REST API is not blocking: a command can be sent before the previous one has finished. To handle this, each command returns a task ID; use the task ID to query for the result between commands. A state of "action_success" means that the operation ended successfully.
An upgrade consists of fetch and install operations for each component, followed by a single power cycle at the end of the entire process.
REST API Commands
Query command (run it between commands to check the result of the previous one):
admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request GET 'https://<switch-ip>/nvue_v1/action/<task-id>'
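Because REST calls return before the operation completes, a script typically polls the query endpoint until the task reaches a final state. The following minimal POSIX shell sketch assumes that the query response is JSON with a top-level "state" field. Only action_success is documented here; treating any state that contains "error" or "fail" as a failure is an assumption. The function names and the NVOS_USER, NVOS_PASS, SWITCH_IP, POLL_MAX and POLL_INTERVAL variables are hypothetical, and the sed-based parsing is a crude illustration (a real JSON parser such as jq is more robust).

```shell
# Hypothetical polling helpers. Expects NVOS_USER, NVOS_PASS and SWITCH_IP.

# Fetches the raw JSON for one task ID (same request as the query command).
get_task_json() {
    curl -sk --user "$NVOS_USER:$NVOS_PASS" --request GET \
        "https://$SWITCH_IP/nvue_v1/action/$1"
}

# Extracts a "state" string value from JSON on stdin (crude, illustration only).
extract_state() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Returns 0 on action_success; returns 1 on a failure-looking state or after
# POLL_MAX polls (default 120) spaced POLL_INTERVAL seconds apart (default 10).
wait_for_task() {
    local max interval i state
    max=${POLL_MAX:-120}
    interval=${POLL_INTERVAL:-10}
    i=0
    state=""
    while [ "$i" -lt "$max" ]; do
        state=$(get_task_json "$1" | extract_state)
        case "$state" in
            action_success) return 0 ;;
            *error*|*fail*) echo "task $1 ended in state: $state" >&2; return 1 ;;
        esac
        i=$((i + 1))
        sleep "$interval"
    done
    echo "task $1 still in state '$state' after $max polls" >&2
    return 1
}
```

Keeping the HTTP request in its own function makes it easy to swap in a different client or a stub when testing the polling logic.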
Fetch the component image file:
admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request POST 'https://<switch-ip>/nvue_v1/platform/firmware/<component>' \
  -H 'Content-Type: application/json' \
  -d '{"@fetch": {"state": "start", "parameters": {"remote-url": "scp://<server-user>:<server-password>@<PATH_TO_FILE>"}}}'
Install the component file:
admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request POST 'https://<switch-ip>/nvue_v1/platform/firmware/<component>/files/<file-name>' \
  -H 'Content-Type: application/json' \
  -d '{"@install": {"state": "start", "parameters": {"force": false}}}'
Power cycle:
admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request POST 'https://<switch-ip>/nvue_v1/system' \
  -H 'Content-Type: application/json' \
  -d '{"@power-cycle": {"state": "start", "parameters": {"force": true}}}'
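The three POST bodies above share one shape: a single "@<action>" key whose value holds "state": "start" and an action-specific "parameters" object. The hypothetical helper below builds that body from an action name and a parameters object. It performs no JSON escaping, so it assumes both arguments are already valid JSON fragments.

```shell
# Hypothetical helper: prints {"@<action>": {"state": "start", "parameters": <params>}}
# $1 = action name (fetch, install, power-cycle); $2 = parameters JSON object.
nvue_action_body() {
    if [ "$#" -ne 2 ]; then
        echo "usage: nvue_action_body <action> <parameters-json>" >&2
        return 1
    fi
    printf '{"@%s": {"state": "start", "parameters": %s}}\n' "$1" "$2"
}

# Example use with the power cycle request (placeholders as above):
# curl -k --user '<nvos-user>:<nvos-password>' --request POST \
#     'https://<switch-ip>/nvue_v1/system' \
#     -H 'Content-Type: application/json' \
#     -d "$(nvue_action_body power-cycle '{"force": true}')"
```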
After the power cycle, check the firmware versions:
admin@nvos:~$ curl -k --user <nvos-user>:<nvos-password> --request GET 'https://<switch-ip>/nvue_v1/platform/firmware'
Error Status Catalog
Use the tables below to identify errors and their meanings.
BMC | |
Scenario | Error |
Selected file for installation doesn't exist | Failed to install BMC firmware file: No such firmware |
Bad or corrupted file | Invalid file: /host/fw-images/bmc/bad_file.fwpkg |
BMC is not accessible | Error: Timed out ... |
Failed to login to BMC | Error: Timed out ... |
curl returns any other error when sending the POST request for BMC image installation | Error: X (where X is the error returned by curl) |
Responses received during the installation process have an invalid format (responses should be in JSON format) | Error: Invalid JSON format |
BMC returned an error code when triggering the installation process (the JSON response to the installation command contained an 'error' field) | Error returned by BMC |
BMC returned a non-OK task status in the installation command response | Error: Return status is {status} |
BMC response does not include a task status | Error: Missing 'TaskStatus' field |
BMC task status is not OK while polling for installation completion | Error: Fail to execute the task - Taskstatus={status} |
Error detected during installation process | Error: {err_msg} |
Installation process was aborted (on BMC side) | Error: The task has been aborted |
EROT (same errors as BMC) | |
Scenario | Error |
Selected file for installation doesn't exist | Failed to install EROT firmware file: No such firmware |
Bad or corrupted file | Invalid file: /host/fw-images/erot/bad_file.fwpkg |
BMC is not accessible | Error: Timed out ... |
Failed to login to BMC | Error: Timed out ... |
curl returns any other error when sending the POST request for BMC image installation | Error: X (where X is the error returned by curl) |
Responses received during the installation process have an invalid format (responses should be in JSON format) | Error: Invalid JSON format |
BMC returned an error code when triggering the installation process (the JSON response to the installation command contained an 'error' field) | Error returned by BMC |
BMC returned a non-OK task status in the installation command response | Error: Return status is {status} |
BMC response does not include a task status | Error: Missing 'TaskStatus' field |
BMC task status is not OK while polling for installation completion | Error: Fail to execute the task - Taskstatus={status} |
Error detected during installation process | Error: {err_msg} |
Installation process was aborted (on BMC side) | Error: The task has been aborted |
Installation did not finish in 30 minutes | Wait task completion timeout |
FPGA (same errors as BMC) | |
Scenario | Error |
Selected file for installation doesn't exist | Failed to install FPGA firmware file: No such firmware |
Bad or corrupted file | Invalid file: /host/fw-images/fpga/bad_file.fwpkg |
BMC is not accessible | Error: Timed out ... |
Failed to login to BMC | Error: Timed out ... |
curl returns any other error when sending the POST request for BMC image installation | Error: X (where X is the error returned by curl) |
Responses received during the installation process have an invalid format (responses should be in JSON format) | Error: Invalid JSON format |
BMC returned an error code when triggering the installation process (the JSON response to the installation command contained an 'error' field) | Error returned by BMC |
BMC returned a non-OK task status in the installation command response | Error: Return status is {status} |
BMC response does not include a task status | Error: Missing 'TaskStatus' field |
BMC task status is not OK while polling for installation completion | Error: Fail to execute the task - Taskstatus={status} |
Error detected during installation process | Error: {err_msg} |
Installation process was aborted (on BMC side) | Error: The task has been aborted |
Installation didn't finish in 30 minutes | Wait task completion timeout |
BIOS | |
Scenario | Error |
Selected file for installation doesn't exist | Failed to install BIOS firmware file: No such firmware |
Bad or corrupted file | Invalid file: /host/fw-images/bios/bad_file.cab |
Bad ONIE version | ERROR: ONIE {} or later is required |
Failed to enable ONIE firmware update mode | ERROR: failed to enable ONIE firmware update mode |
Failed to disable ONIE firmware update mode | ERROR: failed to disable ONIE firmware update mode |
Installation script was interrupted by signal | WARNING: Interrupted by ${_sig}: disable ONIE firmware update mode |
CPLD | |
Scenario | Error |
Selected file for installation doesn't exist | Failed to install CPLD firmware file: No such firmware |
Bad or corrupted file | Invalid file: /host/fw-images/cpld/bad_file.vme |
MST service not started (never started or failed to start) | ERROR: mst driver is not loaded |
MST device path doesn't exist or could not be retrieved | ERROR: Failed to get mst device: pattern={}, devices={} |
CPLD update command failed | ERROR: Failed to update {} firmware: {} |
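When scripting upgrades, it can help to map an install error back to the likely scenario in the tables above. The sketch below is a hypothetical helper that covers only a subset of the messages and matches on message prefixes, since placeholders such as {status} vary. Some messages map to more than one scenario; for example, "Error: Timed out ..." can mean either that the BMC is not accessible or that login to the BMC failed.

```shell
# Hypothetical helper: prints the likely scenario for an error message from
# the Error Status Catalog. Covers a subset of entries; returns 1 if unknown.
explain_fw_error() {
    case "$1" in
        *"firmware file: No such firmware"*)
            echo "Selected file for installation does not exist" ;;
        "Invalid file: "*)
            echo "Bad or corrupted file" ;;
        "Error: Timed out"*)
            echo "BMC is not accessible, or login to the BMC failed" ;;
        "Error: Invalid JSON format"*)
            echo "Response during installation was not valid JSON" ;;
        "Error: Missing 'TaskStatus' field"*)
            echo "BMC response does not include a task status" ;;
        "Error: The task has been aborted"*)
            echo "Installation was aborted on the BMC side" ;;
        "Wait task completion timeout"*)
            echo "Installation did not finish within 30 minutes" ;;
        "ERROR: mst driver is not loaded"*)
            echo "MST service not started (CPLD)" ;;
        "ERROR: ONIE "*" or later is required"*)
            echo "ONIE version too old (BIOS)" ;;
        *)
            echo "unknown error; see the Error Status Catalog"
            return 1 ;;
    esac
}
```

Generic messages such as "Error: X" (curl) and "Error: {err_msg}" deliberately fall through to the unknown branch, since their text depends on the underlying failure.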