Front Fan Module Replacement

Front Fan Module Replacement Overview

This is a high-level overview of the steps needed to replace the front fan modules.

  1. Identify the failed front fan module through the BMC or the fan module LED, and submit a service ticket.

  2. Get a replacement from NVIDIA Enterprise Support.

  3. Remove the failed fan module.

  4. Insert the new fan module.

  5. Confirm that the new fan module is working correctly through the BMC or the operating system tools.

  6. Return the failed unit to NVIDIA Enterprise Support using the packaging provided.

Identifying a Failed Fan Module

You can identify a failed fan module using any of the following methods:

  • Remove the system bezel and visually inspect the fan module LEDs.

  • Run the nvsm show fans command and view the command output.

  • Access the BMC web user interface and view the fan sensor data. If a fan is running at an abnormal speed, the fan module that contains it needs to be replaced.

Viewing the Fan Module LEDs

  1. Remove the bezel to expose the fan modules. Refer to Removing and Attaching the Bezel.

    After you remove the bezel, the system looks like the following figure.

    _images/dgx-h100-front-view-fans.png
  2. Identify the failed fan using the fan module fault LED as shown in the following figure.

    _images/fan-module.png
  3. Look for the lit fault LED in the upper-right corner of the faulty fan module, as shown in the following figure.

    _images/fan-led.png

Running the Show Fans Command

  • From the operating system, run:

    sudo nvsm show fans
    

    View the command output for any alerts, failures, or an unhealthy status.
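
The manual scan of the command output can be partly automated. The following Python sketch is illustrative only: it runs the command shown above and prints any line that mentions an alert, a failure, or an unhealthy status. The keyword list and the helper names (flag_suspect_lines, run_show_fans) are assumptions made for this example; the exact layout of the nvsm output is not assumed, so treat any match as a prompt to read the full output rather than as a diagnosis.

```python
import subprocess
from typing import List

# Assumption: these keywords mirror the guidance above (alerts, failures,
# unhealthy status). They are not a documented nvsm output format.
SUSPECT_KEYWORDS = ("alert", "fail", "unhealthy")


def flag_suspect_lines(output: str) -> List[str]:
    """Return the stripped lines of `output` that contain a suspect keyword."""
    flagged = []
    for line in output.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in SUSPECT_KEYWORDS):
            flagged.append(line.strip())
    return flagged


def run_show_fans() -> str:
    """Run `sudo nvsm show fans` and return its standard output as text."""
    result = subprocess.run(
        ["sudo", "nvsm", "show", "fans"],
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit; the caller inspects the text
    )
    return result.stdout


if __name__ == "__main__":
    for suspect in flag_suspect_lines(run_show_fans()):
        print(suspect)
```

Simple substring matching can produce false positives (for example, a line that only says no alerts are present), so the sketch narrows down where to look and does not replace reading the output.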

Viewing Fan Modules from the BMC web user interface

  1. Log on to the BMC.

  2. Identify the faulty fan module using the BMC dashboard.

  3. Click Sensor from the left navigation menu.

  4. Review the Normal Sensors section.

  5. Look for abnormal fan speeds in the right column.

    _images/front-fan-module.png

    There are two fans in the fan module, identified by SPD_FAN_SYSn_F and SPD_FAN_SYSn_R, where n is the module ID. If either fan fails, then the entire module must be replaced.

  6. Use the nvsm command to confirm the fan issue.

    sudo nvsm show fans
    

    View the output and confirm that the status is unhealthy for the same fan.
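
The sensor naming described in this procedure maps each reading back to a module. The following Python sketch is illustrative only: it assumes the fan readings have already been copied from the BMC Sensor page into a dictionary, and the RPM limits are hypothetical values chosen for the example, not NVIDIA thresholds. The function name modules_to_replace is also an assumption. It applies the rule that if either fan (F or R) in a module reads abnormally, the whole module is flagged.

```python
import re
from typing import Dict, List

# Matches the sensor names described above: SPD_FAN_SYSn_F and SPD_FAN_SYSn_R,
# where n is the module ID.
SENSOR_PATTERN = re.compile(r"^SPD_FAN_SYS(\d+)_([FR])$")


def modules_to_replace(
    readings: Dict[str, float], min_rpm: float, max_rpm: float
) -> List[int]:
    """Return the sorted IDs of modules with at least one out-of-range fan.

    `min_rpm` and `max_rpm` are caller-supplied limits (hypothetical here).
    Sensors whose names do not match the fan pattern are ignored.
    """
    flagged = set()
    for name, rpm in readings.items():
        match = SENSOR_PATTERN.match(name)
        if match is None:
            continue  # not a front fan speed sensor
        if not (min_rpm <= rpm <= max_rpm):
            # Either fan failing means the entire module is replaced.
            flagged.add(int(match.group(1)))
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical readings for illustration only.
    example = {
        "SPD_FAN_SYS0_F": 9000,
        "SPD_FAN_SYS0_R": 0,
        "SPD_FAN_SYS1_F": 9100,
        "SPD_FAN_SYS1_R": 8900,
    }
    print(modules_to_replace(example, min_rpm=2000, max_rpm=20000))
```

Keeping the limits as parameters avoids hard-coding a definition of "abnormal"; use whatever range the BMC reports as normal for your system.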

Replacing and Returning the Front Fan Module

  1. Remove the new fan module from its packaging and have it ready to install.

    Important

    Replace the old fan module with the new one within 30 seconds to prevent the system components from overheating.

  2. Remove the bezel to expose the fan modules. Refer to Removing and Attaching the Bezel.

  3. Unlock the fan module by pressing the release button, as shown in the following figure.

    _images/dgx-h100-fan-release-button.png
  4. Replace the failed fan module with the new one.

    _images/dgx-h100-fan-replace.png
  5. Confirm that the fan module is healthy and working properly by performing the following actions:

    • Check the fan status in the BMC web user interface.

    • Verify that the amber LED on the fan module is extinguished.

    • Run the sudo nvsm show fans command.

  6. Install the bezel as described in Removing and Attaching the Bezel.