Troubleshooting
To reset a forgotten password for the default user accounts, see the Reset Local Users' Passwords section.
To reset the BMC root user password, use the nv action reset platform bmc-password command.
If the system encounters issues after an image upgrade, the user can switch back to the old partition.
Check the current partition.
admin@nvos:~$ nv show system image
            operational
----------  ---------------
current     nvos-25.02.1500
next        nvos-25.02.1500
partition1  nvos-25.02.1500
partition2  nvos-25.02.1400
Change to the other partition.
admin@nvos:~$ nv action boot-next system image partition2
admin@nvos:~$ nv show system image
            operational
----------  ---------------
current     nvos-25.02.1500
next        nvos-25.02.1400
partition1  nvos-25.02.1500
partition2  nvos-25.02.1400
Reboot the system.
admin@nvos:~$ nv action reboot system
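The partition to pass to nv action boot-next system image can be read from the nv show system image table. The following is a minimal Python sketch, not an NVOS tool: the helper names parse_image_table and other_partition are hypothetical, and the sketch assumes the table layout shown in the example above and that the two partitions hold different image versions.

```python
# Hypothetical helper (not part of NVOS): picks the partition to pass to
# "nv action boot-next system image <partition>" by parsing the table
# printed by "nv show system image". Assumes the layout in the example
# above and that the two partitions hold different image versions.


def parse_image_table(output: str) -> dict[str, str]:
    """Map each row name (current, next, partition1, ...) to its image."""
    rows: dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly two columns: a row name and an image version.
        if len(parts) == 2 and parts[1].startswith("nvos-"):
            rows[parts[0]] = parts[1]
    return rows


def other_partition(output: str) -> str:
    """Return the partition that does not hold the currently running image."""
    rows = parse_image_table(output)
    current = rows["current"]
    candidates = [
        name
        for name, image in rows.items()
        if name.startswith("partition") and image != current
    ]
    if len(candidates) != 1:
        raise ValueError("cannot tell which partition holds the previous image")
    return candidates[0]


SAMPLE = """\
            operational
----------  ---------------
current     nvos-25.02.1500
next        nvos-25.02.1500
partition1  nvos-25.02.1500
partition2  nvos-25.02.1400
"""

if __name__ == "__main__":
    # Build the boot-next command for the partition holding the older image.
    print(f"nv action boot-next system image {other_partition(SAMPLE)}")
```

If both partitions hold the same image, the sketch refuses to guess rather than picking one arbitrarily.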
The system has a mechanism to detect whether the ASIC has encountered a health or firmware-burn issue and to attempt recovery from it.
Events are also raised during fatal-state detection and recovery. For more information, see ASIC-Related Events in the Event Management section.
Detecting a Fatal State
The system’s fatal state is indicated in the CLI prompt and in the output of the nv show system health command.
Example:
[System_Fatal_State]admin@nvos:~$ nv show system health
            operational  applied
----------  -----------  -------
status      FATAL
status-led  amber

Health issues
=============
Component    Status information
-----------  ---------------------------
ASIC-HEALTH  Switch ASIC in fatal state.
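Both indicators described above can be checked from captured CLI text. This is a minimal Python sketch under stated assumptions: the functions prompt_is_fatal, health_status, and health_issues are hypothetical, and the column layout is assumed to match the example output.

```python
# Hypothetical checks (not part of NVOS) for the two fatal-state indicators:
# the [System_Fatal_State] prompt prefix and the "status FATAL" row of
# "nv show system health". The column layout is assumed to match the example.
from typing import Optional

FATAL_PROMPT_PREFIX = "[System_Fatal_State]"


def prompt_is_fatal(prompt: str) -> bool:
    """True if the CLI prompt carries the fatal-state prefix."""
    return prompt.startswith(FATAL_PROMPT_PREFIX)


def health_status(output: str) -> Optional[str]:
    """Return the operational value of the 'status' row, or None if absent."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "status":
            return parts[1]
    return None


def health_issues(output: str) -> dict[str, str]:
    """Map component names to their text in the 'Health issues' table."""
    issues: dict[str, str] = {}
    lines = output.splitlines()
    header = [i for i, line in enumerate(lines) if line.split()[:1] == ["Component"]]
    if not header:
        return issues
    # Skip the "Component" header row and its dashed underline.
    for line in lines[header[0] + 2:]:
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            issues[parts[0]] = parts[1]
    return issues


SAMPLE_HEALTH = """\
            operational  applied
----------  -----------  -------
status      FATAL
status-led  amber

Health issues
=============
Component    Status information
-----------  ---------------------------
ASIC-HEALTH  Switch ASIC in fatal state.
"""

if __name__ == "__main__":
    fatal = health_status(SAMPLE_HEALTH) == "FATAL"
    print("fatal:", fatal, health_issues(SAMPLE_HEALTH))
```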
Automatic Recovery Mechanism
The system has an internal mechanism to recover from a fatal state without user intervention. The recovery process involves the following steps:
The system restarts its ASICs.
If restarting the ASICs does not resolve the issue, the system attempts to recover by rebooting.
If the system still encounters ASIC issues after the reboot, it performs a second reboot.
After the second reboot, the system starts without configuring the ASICs, leaving all ports down. NVOS keeps running, so logs can still be collected for analysis.
To try to revive the switch, perform a power cycle by running the nv action power-cycle system command.
If the system enters the fatal state again, contact NVIDIA's support team.
Any reboot or power cycle initiated by the user also resets the system’s fatal detection and recovery mechanism, so the recovery steps start again from the beginning.
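The escalation order above can be summarized as a small state model. This Python sketch is a conceptual illustration only, not NVOS code: the RecoveryStep and RecoveryModel names are hypothetical, and the behavior after the final step (repeating that step until the user intervenes) is an assumption.

```python
# Conceptual model of the documented escalation order; not NVOS code.
# RecoveryStep and RecoveryModel are hypothetical names.
from enum import Enum


class RecoveryStep(Enum):
    RESTART_ASICS = "restart the ASICs"
    FIRST_REBOOT = "reboot the system"
    SECOND_REBOOT = "reboot the system again"
    START_WITHOUT_ASICS = "start without configuring the ASICs (all ports down)"


# Enum iteration follows definition order, which matches the documented steps.
ESCALATION = list(RecoveryStep)


class RecoveryModel:
    def __init__(self) -> None:
        self.attempts = 0  # recovery steps taken since the last reset

    def on_fatal(self) -> RecoveryStep:
        """Return the step taken when an ASIC fatal state is detected."""
        # Assumption: once the steps are exhausted, the model stays on the
        # final step; the document says a user power cycle is needed then.
        step = ESCALATION[min(self.attempts, len(ESCALATION) - 1)]
        self.attempts += 1
        return step

    def on_user_reboot_or_power_cycle(self) -> None:
        """A user-initiated reboot or power cycle restarts the sequence."""
        self.attempts = 0


if __name__ == "__main__":
    model = RecoveryModel()
    for _ in range(len(ESCALATION)):
        print(model.on_fatal().value)
```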
Recovery Timeframe
After the recovery steps complete, the system exits the fatal state once it has remained operational for 10 minutes without any health issues.
During this 10-minute observation period, the system may still appear to be in the fatal state, as reflected in the CLI prompt and the nv show system health output.
Once the system exits the fatal state, the CLI prompt and the nv show system health output confirm the recovery.
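The observation rule can be expressed as a simple timer. This is a minimal Python sketch of the stated 10-minute rule only: FatalStateTracker is a hypothetical name, timestamps are plain seconds supplied by the caller, and starting the window at the first healthy sample after a failure is an assumption, since the document does not describe how NVOS measures the period internally.

```python
# Hypothetical model (not NVOS code) of the 10-minute observation window.
# Timestamps are seconds supplied by the caller. Assumption: the window
# starts at the first healthy sample after the system was marked fatal.
from typing import Optional

OBSERVATION_WINDOW_S = 10 * 60  # 10 minutes, per the text above


class FatalStateTracker:
    def __init__(self) -> None:
        self.fatal = False
        self.healthy_since: Optional[float] = None

    def report(self, now: float, healthy: bool) -> bool:
        """Record a health sample and return whether the system is still fatal."""
        if not healthy:
            # Any health issue keeps the system fatal and restarts the window.
            self.fatal = True
            self.healthy_since = None
        elif self.fatal:
            if self.healthy_since is None:
                self.healthy_since = now
            elif now - self.healthy_since >= OBSERVATION_WINDOW_S:
                self.fatal = False
                self.healthy_since = None
        return self.fatal
```

A health issue during the window resets it, which matches the requirement that the system stay operational for the full 10 minutes.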