Dual-port ConnectX-5 PCI Card/PCI Riser Replacement

The system comes with a dual-port Mellanox ConnectX-5 card that is configured to work in Ethernet mode. The card is installed in a PCI riser assembly. The following steps outline how to replace either the card alone, or the entire PCI riser assembly.

Dual-port ConnectX-5 Card Replacement Overview

This is a high-level overview of the procedure to replace the dual-port Mellanox ConnectX-5 PCI card or PCI riser assembly on the DGX-2 System.
  1. Use the nvsm show health command to verify an issue with the dual-port ConnectX-5 card.
  2. Obtain the replacement part, either the dual-port ConnectX-5 card or the PCI riser assembly, from NVIDIA Enterprise Support.
  3. Shut down the system.
  4. Label all motherboard tray cables and unplug them.
  5. Remove the motherboard tray and place it on a solid, flat work surface.
  6. Remove the right-side PCI card riser.
  7. Replace the PCI card if you are only replacing the card itself.
  8. Reinstall the right-side PCI card riser.
  9. Insert the motherboard tray into the system.
  10. Plug in all cables using the labels as a reference.
  11. Power on the system.
  12. Verify that the ConnectX-5 card is working.

Replacing the Dual-Port ConnectX-5 PCI Card

CAUTION: Static Sensitive Devices. Be sure to observe best practices for electrostatic discharge (ESD) protection. This includes making sure personnel and equipment are connected to a common ground, such as by wearing a wrist strap connected to the chassis ground, and placing components on static-free work surfaces.

  1. Identify the failed card by running nvsm.
    $ sudo nvsm show health
  2. If the failed component is the Mellanox dual-port card located at PCIe bus 86:00, obtain a replacement part from NVIDIA Enterprise Support.
  3. If replacing the card alone, unpack it upon receipt and confirm that it comes with a low-profile bracket.
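Before powering down, it can help to record which network interfaces belong to the card at 86:00, so the same interfaces can be rechecked after the replacement. The following is a minimal sketch, not part of the official procedure. It assumes PCI domain 0000 and the standard Linux sysfs layout /sys/bus/pci/devices/<address>/net/<interface>; the function name list_port_interfaces is hypothetical.

```shell
# Sketch: list the network interfaces that belong to the two ConnectX-5 ports.
# Assumptions (not from the procedure): PCI domain 0000 and the standard
# Linux sysfs layout /sys/bus/pci/devices/<address>/net/<interface>.

# list_port_interfaces [SYSFS_PCI_DIR]
# Prints "<pci-address> <interface>" for each port function of the card.
list_port_interfaces() {
    root="${1:-/sys/bus/pci/devices}"
    for addr in 0000:86:00.0 0000:86:00.1; do
        netdir="$root/$addr/net"
        if [ ! -d "$netdir" ]; then
            # The function is missing or has no network driver bound to it.
            echo "$addr (no network interface found)"
            continue
        fi
        for ifpath in "$netdir"/*; do
            [ -e "$ifpath" ] || continue
            echo "$addr $(basename "$ifpath")"
        done
    done
}
```

Running list_port_interfaces with no argument reads the live sysfs tree. Redirecting the output to a file before shutdown gives a simple record to compare against once the system is powered back on.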

Replacement Instructions

  1. Power down the system.
  2. Label all cables connected to the motherboard tray for easy identification when reconnecting.
  3. Unplug the cables.
  4. Remove the motherboard tray.

    Refer to the instructions in the section Removing the Motherboard Tray.

  5. Remove the right PCI card riser.
    1. Release the right PCI card riser by turning the right black screw.

    2. Remove the right PCI card riser from the motherboard tray.

  6. Replace the dual-port PCI card (if applicable).
    1. Loosen and remove the screw that secures the PCI card to the riser.

    2. Pull the old card out of the riser and install the new card into the riser.

    3. Replace and tighten the screw that secures the PCI card to the riser.

  7. Reinstall the right PCI card riser.
    1. Seat the right PCI card riser on the motherboard tray.

    2. Tighten the black screw on the right PCI card riser.

  8. Reinstall the motherboard tray.

    Refer to the instructions in the section Installing the Motherboard Tray.

  9. Connect all the cables to the motherboard tray.
  10. Apply power to the system.
  11. Confirm that the PCI card is visible from the system.
    $ sudo lspci | grep 86:00
    86:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
    86:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
  12. Confirm that the system is healthy.
    $ sudo nvsm show health
  13. Verify basic connectivity to the network.

    If any file systems are mounted over the ConnectX-5 card, verify that their mount points are available.

    Consult the DGX-2 User Guide for instructions on reconfiguring network interfaces, if necessary.
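
The lspci check in step 11 can be scripted so that it fails clearly when only one port, or neither, is visible. This is a minimal sketch under stated assumptions: lspci prints addresses without a domain prefix (its default on single-domain systems), and both functions report the string ConnectX-5 as in the sample output above. The function names count_cx5_functions and check_cx5_ports are hypothetical.

```shell
# count_cx5_functions: reads `lspci` output on stdin and prints how many
# PCI functions at address 86:00 report as ConnectX-5.
count_cx5_functions() {
    # grep -c exits non-zero when the count is 0; mask that so the
    # function always prints a number and returns success.
    grep -c '^86:00\.[0-7] .*ConnectX-5' || true
}

# check_cx5_ports: reads `lspci` output on stdin and succeeds only when
# both functions of the dual-port card are visible.
check_cx5_ports() {
    count=$(count_cx5_functions)
    if [ "$count" -eq 2 ]; then
        echo "OK: both ConnectX-5 ports are visible"
    else
        echo "FAIL: expected 2 ConnectX-5 functions at 86:00, found $count"
        return 1
    fi
}

# Usage on the system:
#   sudo lspci | check_cx5_ports
```

Because both functions read from standard input, they can also be run against saved lspci output, which is convenient when comparing the state before and after the replacement.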