Front Fan Module Replacement

Front Fan Module Replacement Overview

This is a high-level overview of the procedure to replace the front fan modules on the NVIDIA DGX™ B200 system.

  1. Identify the failed front fan module through the BMC or by checking the fan module LED, and then submit a service ticket.

  2. Get a replacement from NVIDIA Enterprise Support.

  3. Remove the failed fan module.

  4. Insert the new fan module.

  5. Confirm that the new fan module works correctly through the BMC or the operating system tools.

  6. Return the failed unit to NVIDIA Enterprise Support using the packaging provided.

Identifying a Failed Fan Module

You can identify a failed fan module using one of the following methods:

  • Remove the system bezel and visually inspect the fan module LEDs.

  • Run the nvsm show fans command and view the command output.

  • Access the BMC Web User Interface and view the sensor data from the fans. If a fan runs at an abnormal speed, the fan module that contains it needs to be replaced.

Viewing the Fan Module LEDs

  1. Expose the fan modules following the instructions in Removing and Attaching the Bezel.

    After you remove the bezel, the system looks like the following figure.

    _images/b200-front-view-fans.png
  2. Identify the failed fan using the fan module fault LED, as shown in the following figure.

    _images/fan-module.png
  3. Look for a lit fault LED in the upper-right corner of a fan module, as shown in the following figure. A lit fault LED identifies the faulty module.

    _images/fan-led.png

Running the nvsm Command

  • From the operating system, run:

    sudo nvsm show fans
    

    View the command output for alerts, failures, or an unhealthy status.
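When the output covers many fans, a small filter can pick out the entries that are not healthy. The following is a minimal sketch, not an NVIDIA-provided tool. The function name unhealthy_fans is hypothetical, and the sketch assumes that each fan entry in the output starts with a target path line beginning with / and contains a property line of the form Status_Health = <value>. Check these assumptions against the output on your system before relying on it.

```shell
# Hypothetical helper: reads `nvsm show fans` output on stdin and prints
# every fan entry whose Status_Health value is anything other than OK.
# Assumed (unverified) format: a target path line starting with "/",
# followed by property lines such as "    Status_Health = OK".
unhealthy_fans() {
    awk '
        /^\//  { target = $0 }                  # remember the current fan entry
        $1 == "Status_Health" && $3 != "OK" {   # fields: name, "=", value
            print target ": " $3
        }
    '
}
```

For example, sudo nvsm show fans | unhealthy_fans prints nothing when every fan reports OK under these assumptions.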

Viewing Fan Modules from the BMC Web User Interface

  1. Log on to the BMC.

  2. Use the BMC dashboard to identify the faulty fan module, as described in the following steps.

  3. Select Sensor from the left-side navigation menu.

  4. Review the Normal Sensors section.

  5. Look for abnormal fan speeds in the right column.

    _images/b200-fan-speed.png

    The fan module has two fans, identified by SPD_FAN_SYSn_F and SPD_FAN_SYSn_R, where n is the module ID. If either fan fails, the entire module must be replaced.

  6. Use the nvsm command to confirm the fan issue.

    sudo nvsm show fans
    

    View the output and confirm that the status is unhealthy for the same fan.
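Because both fans in a module share the module ID in their sensor names, either sensor name is enough to tell which module to replace. The following is a minimal sketch of that mapping, based only on the SPD_FAN_SYSn_F and SPD_FAN_SYSn_R naming described above. The function name fan_module_id is hypothetical and is not part of nvsm or the BMC.

```shell
# Hypothetical helper: prints the module ID n for a sensor named
# SPD_FAN_SYSn_F or SPD_FAN_SYSn_R, and fails for any other name.
fan_module_id() {
    case "$1" in
        SPD_FAN_SYS*_F|SPD_FAN_SYS*_R)
            id=${1#SPD_FAN_SYS}   # drop the common prefix, leaving "n_F" or "n_R"
            id=${id%_?}           # drop the "_F" or "_R" suffix, leaving "n"
            printf '%s\n' "$id"
            ;;
        *)
            return 1              # not a fan speed sensor name
            ;;
    esac
}
```

With this sketch, fan_module_id SPD_FAN_SYS3_R and fan_module_id SPD_FAN_SYS3_F both print 3, which reflects that a failure of either fan means replacing the same module.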

Replacing and Returning the Front Fan Module

  1. Remove the new fan module from its packaging so that it is ready to install immediately.

  2. Expose the fan modules following the instructions in Removing and Attaching the Bezel.

  3. Remove the failed fan module: press the release button to unlock the module, and then pull it out of the chassis.

    _images/b200-fan-release-button.png
  4. Replace the failed fan module with the new one.

    Important

    Replace the old fan module with the new one within 30 seconds to prevent the system from overheating.

    _images/b200-fan-replace.png
  5. Confirm that the fan module is healthy and working correctly by performing the following tasks:

    • Check the fan speed sensors in the BMC Web User Interface.

    • Verify that the amber LED on the fan module is off.

    • Run the sudo nvsm show fans command.

  6. Install the bezel as described in Removing and Attaching the Bezel.

  7. Return the failed fan module to NVIDIA Enterprise Support using the packaging from the new fan module.