Appendix – AHX Monitoring

NVIDIA UFM-SDN Appliance User Manual v4.9.0

AHX monitoring tracks the cooling devices (AHX) of HDR director switches and sends events to UFM. When the monitoring utility detects an issue, it triggers an event on the switch associated with the affected cooling device.

The monitoring utility runs periodically and communicates with the AHX devices over the Modbus protocol (TCP port 502).
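Because the utility depends on Modbus over TCP port 502, it can help to confirm that the UFM host can open a TCP connection to each AHX device before enabling monitoring. The following is a minimal sketch, not part of the product: the function name `modbus_port_reachable` is hypothetical, and the check only verifies that the port accepts a TCP connection. It does not send any Modbus request.

```python
import socket

MODBUS_TCP_PORT = 502  # port stated in this appendix


def modbus_port_reachable(host: str, port: int = MODBUS_TCP_PORT,
                          timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only; it checks TCP reachability
    and does not exchange any Modbus frames with the device.
    """
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (the address is taken from the sample output in this appendix):
#   modbus_port_reachable("10.10.1.11")
```

A `False` result usually points to a firewall rule, a routing problem, or a device that is powered off, all of which would also prevent the monitoring utility from polling the device.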

  1. Enable AHX monitoring. Run:

    ufmapl [ mgmt-ha-active ] (config) # ib managed-switch ahx-monitor enable

  2. Set the AHX monitoring interval. Run:

    ufmapl [ mgmt-ha-active ] (config) # ib managed-switch ahx-monitor interval

  3. Add AHX devices for monitoring. Run:

    ufmapl [ mgmt-ha-active ] (config) # ib managed-switch ahx-monitor device

  4. [Optional] Review the settings. Run:

    ufmapl [ mgmt-ha-active ] (config) # show ib managed-switch ahx-monitor
    AHX Monitoring:
      Enabled : Yes
      Interval: 1m

    AHX Devices:
      Switch name         : switch-01
      Primary IP address  : 10.10.1.11
      Secondary IP address: 10.11.1.11
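If the review output needs to be checked from a script, it can be parsed as simple "key : value" text. The sketch below assumes the output is captured with one field per line, as in the `SAMPLE` string; that layout, the function name `parse_ahx_show`, and the returned dictionary shape are assumptions made for illustration and are not defined by the appliance.

```python
# Assumed capture of the "show ib managed-switch ahx-monitor" output,
# one field per line. The exact layout on a real appliance may differ.
SAMPLE = """\
ufmapl [ mgmt-ha-active ] (config) # show ib managed-switch ahx-monitor
AHX Monitoring:
  Enabled : Yes
  Interval: 1m

AHX Devices:
  Switch name         : switch-01
  Primary IP address  : 10.10.1.11
  Secondary IP address: 10.11.1.11
"""


def parse_ahx_show(text: str) -> dict:
    """Split the output into global settings and a list of devices."""
    settings = {}
    devices = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # prompt line or blank line
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            continue  # section header such as "AHX Monitoring:"
        if key == "Switch name":
            devices.append({key: value})   # each device starts with its name
        elif devices:
            devices[-1][key] = value       # field of the current device
        else:
            settings[key] = value          # field before any device
    return {"settings": settings, "devices": devices}


parsed = parse_ahx_show(SAMPLE)
```

With the sample above, `parsed["settings"]` holds the `Enabled` and `Interval` values and `parsed["devices"]` holds one entry for `switch-01` with its primary and secondary IP addresses.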

Alarm ID  Alarm Name                     To Log  Alarm  Severity  Threshold / TTL  Related Object  Category  Message
--------  -----------------------------  ------  -----  --------  ---------------  --------------  --------  ----------------------------------
1400      COOLING_DEV_HIGH_AMBIENT_TEMP  1       1      Warning   86400            Switch          Hardware  High Ambient Temperature
1401      COOLING_DEV_HIGH_FLUID_TEMP    1       1      Warning   86400            Switch          Hardware  High Fluid Temperature
1402      COOLING_DEV_LOW_FLUID_LEVEL    1       1      Warning   86400            Switch          Hardware  Low Fluid Level
1403      COOLING_DEV_LOW_SUPPLY_PRESS   1       1      Warning   86400            Switch          Hardware  Low Supply Pressure
1404      COOLING_DEV_HIGH_SUPPLY_PRESS  1       1      Warning   86400            Switch          Hardware  High Supply Pressure
1405      COOLING_DEV_LOW_RETURN_PRESS   1       1      Warning   86400            Switch          Hardware  Low Return Pressure
1406      COOLING_DEV_HIGH_RETURN_PRESS  1       1      Warning   86400            Switch          Hardware  High Return Pressure
1407      COOLING_DEV_HIGH_DIFF_PRESS    1       1      Warning   86400            Switch          Hardware  High Differential Pressure
1408      COOLING_DEV_LOW_DIFF_PRESS     1       1      Warning   86400            Switch          Hardware  Low Differential Pressure
1409      COOLING_DEV_SYSTEM_FAIL_SAFE   1       1      Warning   86400            Switch          Hardware  System Fail Safe
1410      COOLING_DEV_FAULT_CRITICAL     1       1      Critical  86400            Switch          Hardware  Fault Critical
1411      COOLING_DEV_FAULT_PUMP1        1       1      Critical  86400            Switch          Hardware  Fault Pump1
1412      COOLING_DEV_FAULT_PUMP2        1       1      Critical  86400            Switch          Hardware  Fault Pump2
1413      COOLING_DEV_FLUID_LEVEL_CRIT   1       1      Critical  86400            Switch          Hardware  Fault Fluid Level Critical
1414      COOLING_DEV_FLUID_OVERTEMP     1       1      Critical  86400            Switch          Hardware  Fault Fluid Over Temperature
1415      COOLING_DEV_FAULT_PRIMARY_DC   1       1      Critical  86400            Switch          Hardware  Fault Primary DC
1416      COOLING_DEV_FAULT_REDUND_DC    1       1      Critical  86400            Switch          Hardware  Fault Redundant DC
1417      COOLING_DEV_FAULT_FLUID_LEAK   1       1      Critical  86400            Switch          Hardware  Fault Fluid Leak
1418      COOLING_DEV_SENSOR_FAILURE     1       1      Critical  86400            Switch          Hardware  Fault Sensor Failure
1419      COOLING_DEV_MONITOR_ERROR      1       0      Critical  1                Grid            Hardware  Cooling Device Monitoring Error
1420      COOLING_DEV_COMM_ERROR         1       1      Critical  86400            Switch          Hardware  Cooling Device Communication Error
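When filtering UFM events in a script, the alarm IDs listed for AHX monitoring can be mapped to their listed severity: IDs 1400 to 1409 are Warning and IDs 1410 to 1420 are Critical. The sketch below encodes only that mapping; the function name `cooling_alarm_severity` is hypothetical, and the result reflects the values listed in this appendix rather than any severity reconfigured on a running system.

```python
from typing import Optional

# Alarm ID ranges as listed in this appendix.
WARNING_IDS = range(1400, 1410)    # 1400-1409
CRITICAL_IDS = range(1410, 1421)   # 1410-1420


def cooling_alarm_severity(alarm_id: int) -> Optional[str]:
    """Return the listed severity of an AHX alarm ID, or None if the ID
    is not one of the cooling device alarms (hypothetical helper)."""
    if alarm_id in WARNING_IDS:
        return "Warning"
    if alarm_id in CRITICAL_IDS:
        return "Critical"
    return None
```

Returning `None` for other IDs lets a caller skip events that are unrelated to cooling devices without raising an error.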

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.