Monitoring

The adapter card incorporates the ConnectX-5 IC, which operates at temperatures between 0°C and 105°C.

There are three thermal threshold definitions that impact the overall system operation state:

  • Warning – 105°C: On managed systems only. When the device crosses the 105°C threshold, the management software issues a Warning Threshold message, notifying the system administrator that the card has crossed the Warning threshold. Note that this temperature threshold neither requires nor leads to any action by hardware (such as adapter card shutdown).

  • Critical – 115°C: When the device crosses this temperature, the firmware will automatically shut down the device.

  • Emergency – 130°C: If the firmware fails to shut down the device upon crossing the Critical threshold, the device will auto-shutdown upon crossing the Emergency (130°C) threshold.
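The three thresholds above can be expressed as a simple lookup for monitoring scripts. The following is a minimal sketch, not part of any NVIDIA tool: the function name `classify_temperature` and the state labels are hypothetical, and it assumes that "crossing" a threshold means the reading is at or above that value.

```python
# Threshold values taken from the list above (degrees Celsius).
WARNING_C = 105.0
CRITICAL_C = 115.0
EMERGENCY_C = 130.0


def classify_temperature(temp_c: float) -> str:
    """Map an IC temperature reading to a threshold state.

    Assumption: a threshold counts as crossed when the reading is
    greater than or equal to it. The state names are illustrative.
    """
    if temp_c >= EMERGENCY_C:
        return "emergency"  # hardware auto-shutdown expected
    if temp_c >= CRITICAL_C:
        return "critical"   # firmware shutdown expected
    if temp_c >= WARNING_C:
        return "warning"    # message only, no hardware action
    return "normal"


if __name__ == "__main__":
    for reading in (70.0, 105.0, 118.5, 131.0):
        print(reading, classify_temperature(reading))
```

A monitoring script could call this function on each readout and raise an alert as soon as the state leaves "normal".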

The card's thermal sensors can be read through the system's SMBus. The user can read these sensors and adjust the system airflow according to the readouts and the IC thermal requirements described above.
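As an illustration of adapting airflow to the readouts, the sketch below maps a temperature reading to a fan duty cycle by linear interpolation. Everything in it is an assumption made for the example: the function name `fan_duty_percent`, the 60°C and 100°C ramp points, and the 30% minimum duty are hypothetical and should be replaced with values appropriate to the actual chassis. How the reading is obtained over SMBus is platform specific and is not shown.

```python
def fan_duty_percent(
    temp_c: float,
    low_c: float = 60.0,     # hypothetical: below this, run fans at min_duty
    high_c: float = 100.0,   # hypothetical: at or above this, run fans at max_duty
    min_duty: float = 30.0,  # hypothetical minimum duty cycle (%)
    max_duty: float = 100.0,
) -> float:
    """Return a fan duty cycle (%) for a given IC temperature reading.

    The duty rises linearly from min_duty at low_c to max_duty at high_c,
    so the fans reach full speed before the IC leaves its operating range.
    """
    if temp_c <= low_c:
        return min_duty
    if temp_c >= high_c:
        return max_duty
    fraction = (temp_c - low_c) / (high_c - low_c)
    return min_duty + fraction * (max_duty - min_duty)


if __name__ == "__main__":
    for reading in (45.0, 80.0, 102.0):
        print(reading, fan_duty_percent(reading))
```

The upper ramp point is kept below 105°C on purpose, so that full airflow is applied before the IC reaches the top of its operating range.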

A heatsink is attached to the ConnectX-5 IC to dissipate its heat. It is attached either with four spring-loaded push pins that insert into four mounting holes, or with screws.

The ConnectX-5 IC has a thermal shutdown safety mechanism that automatically shuts down the ConnectX-5 card in the event of high temperature, improper thermal coupling, or heatsink removal.

© Copyright 2023, NVIDIA. Last updated on Jan 14, 2024.