ConnectX-7 I/O Replacement

ConnectX-7 I/O Card Replacement Overview

  1. Identify the failed card

  2. Get a replacement ConnectX-7 IO card from NVIDIA Enterprise Support

  3. Make sure the system is shut down

  4. If cables don’t reach, label all cables and unplug them from the motherboard tray

  5. Slide motherboard out until it locks in place

  6. Open rear compartment

  7. Pull out the card directly above the failed ConnectX-7 to make room for the procedure

  8. Pull out the ConnectX-7 IO card

  9. Remove the IPEX cables from the old card

  10. Install the IPEX cables to the new card

  11. Install the new ConnectX-7 IO card

  12. Install the card that goes over the ConnectX-7 card

  13. Close the rear motherboard compartment

  14. Slide the motherboard back into the system

  15. Plug in all cables using the labels as a reference

  16. Power on the system

  17. Update the firmware if necessary and test the ConnectX-7 IO card

  18. Ship the failed unit back to NVIDIA Enterprise Support using the packaging provided

Prepare the System for Replacement

  1. Identify which I/O card to replace. Use the nvsm command or network tools to determine which card failed, then contact NVIDIA Enterprise Support to get a replacement.

  2. When the card arrives, power off the system.

  3. Based on the output from nvsm, identify whether the card to replace is in slot 1 or slot 2:

    _images/case-rear.png
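
Before powering off, it can help to record which ConnectX devices the operating system currently sees, so that the nvsm output can be cross-checked against the slot diagram. The following is a minimal sketch, not an NVIDIA-provided procedure. It assumes that lspci is installed and that the cards enumerate under the Mellanox PCI vendor ID 15b3. The function name list_connectx and the file name connectx-before.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only PCI devices whose vendor ID is 15b3
# (Mellanox / NVIDIA networking). It reads `lspci -nn` style text on stdin,
# so it also works on output that was saved earlier.
list_connectx() {
    grep -i '\[15b3:[0-9a-f]\{4\}\]'
}

# Example use on a live system (assumes lspci is available):
#   lspci -nn | list_connectx | tee connectx-before.txt
```

The saved file is only a convenience for comparing the device list before and after the replacement; it does not replace the nvsm output as the basis for choosing the slot.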

Remove the I/O Card above the ConnectX Card to Be Replaced

  1. Pull out the motherboard tray and access the IO door. Refer to Motherboard Tray - Opening and Closing the IO door for information about accessing the IO door.

  2. Remove the I/O card that is above the ConnectX card. The card can be the M.2 boot drive assembly or a network interface card.

    • Refer to M.2 Boot Drive Assembly Replacement to remove the M.2 boot drive carrier.

      The images at the preceding link show how to remove the boot drive carrier on the right, above the ConnectX card in slot 2. If you need to replace the ConnectX card in slot 2, follow the instructions, but use the thumbscrew on the left side of the motherboard tray.

    • Refer to Network Interface Card Replacement to remove the Ethernet NIC.

Remove the ConnectX Card

  1. Pull the card out of the slot:

    _images/dgx-h100-cx7-remove-card.png
  2. Before you pull the card all the way out, disconnect the white and black IPEX cables from the card.

    The white cable connects on the top of the card and the black cable connects on the bottom (heatsink side) of the card:

    _images/dgx-h100-cx7-ipex.png
  3. Follow the instructions in the following sections to remove and insert the IPEX cables.

Remove an IPEX Cable

Repeat this procedure for both the white and the black cable.

  1. Lift the locking door:

    _images/ipex-cable-2.png
  2. Push the cable away from the connector:

    _images/ipex-cable-3.png

Insert an IPEX Cable

  1. Align the IPEX cable to the connector:

    _images/ipex-cable-4.png
  2. Press the cable into the connector:

    _images/ipex-cable-5.png
  3. Confirm the cable is in the connector:

    _images/ipex-cable-6.png
  4. Close the latching mechanism:

    _images/ipex-cable-7.png
  5. Make sure the cable is locked to the connector on the board:

    _images/ipex-cable-8.png

Install the ConnectX Card

  1. After you connect the IPEX cables, install the new card in the slot:

    _images/connectx-card-new.png
  2. Confirm the card is in place and that the cables are connected:

    _images/connectx-card-installed.png

Install the I/O Card above the ConnectX Card

  1. Reinstall the I/O card that is above the ConnectX card. Refer to one of the following procedures:

    • M.2 Boot Drive Assembly Replacement

    • Network Interface Card Replacement

  2. Close the motherboard tray IO door and insert the motherboard tray. Refer to Motherboard Tray - Opening and Closing the IO door for more information.

Power on the System and Confirm the Replacement

  1. Power on and boot the system.

  2. Update the firmware on the card if necessary. Refer to the NVIDIA ConnectX-7 User Guide.

  3. Use the nvsm command to confirm that the system is working correctly:

    sudo nvsm show health
    
  4. Use the packaging from the new component to ship the failed one back to NVIDIA Enterprise Support.
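
As an extra check alongside nvsm, you can confirm that the replaced card is visible on the PCI bus again. The following is a minimal sketch under stated assumptions: lspci is installed, the cards enumerate under the Mellanox PCI vendor ID 15b3, and you know how many ConnectX PCI functions your system normally shows. The function name check_connectx_count is hypothetical, and the expected count of 2 in the usage comment is a placeholder, not a documented value.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin (`lspci -nn` text) lists at least
# the expected number of PCI functions with the Mellanox vendor ID 15b3.
check_connectx_count() {
    expected=$1
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
    # keeps that case from aborting scripts that run with "set -e".
    found=$(grep -ci '\[15b3:[0-9a-f]\{4\}\]' || true)
    if [ "$found" -ge "$expected" ]; then
        echo "ok: found $found ConnectX PCI functions"
    else
        echo "missing: found $found, expected at least $expected"
        return 1
    fi
}

# Example on a live system (the count 2 is a placeholder, adjust as needed):
#   lspci -nn | check_connectx_count 2 && sudo nvsm show health
```

A passing count only shows that the card enumerates on the bus; the nvsm health report remains the confirmation that the system is working correctly.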