Supported Traps and Events
Device events are listed as VDM or CDM in the Source column of the Events table in the UFM Web UI. For information about defining event policy, see Events Policy.
Alarm ID | Alarm Name | To Log | Alarm | Default Severity | Default Threshold | Default TTL | Related Object | Category | Description/Message |
116 | Port Xmit Discards | 1 | 1 | Minor | 200 | 300 | Port | Communication Error | Total number of outbound packets discarded by the port because the port is down or congested. |
117 | Port Xmit Constraint Errors | 1 | 1 | Minor | 200 | 300 | Port | Communication Error | Total number of packets not transmitted from the switch physical port because they failed outbound constraint checks. |
120 | Excessive Buffer Overrun Errors | 1 | 1 | Minor | 100 | 300 | Port | Communication Error | The number of times that OverrunErrors consecutive flow control update periods occurred, each having at least one overrun error. Message: ExcessiveBufferOverrunErrors counter threshold exceeded. Threshold is %d, received value is %d. |
121 | VL15 Dropped | 1 | 1 | Minor | 50 | 300 | Port | Communication Error | Number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) in the port. Message: VL15Dropped counter threshold exceeded. Threshold is %d, received value is %d. |
118 | Port Receive Constraint Errors | 1 | 1 | Minor | 200 | 300 | Port | Communication Error | Total number of packets received on the switch physical port that are discarded because they failed inbound constraint checks. |
145 | System Image GUID changed | 1 | 0 | Info | 1 | 300 | Port | Communication Error | System image GUID has changed for the specific LID |
115 | Port Receive Switch Relay Errors | 1 | 1 | Minor | 9999 | 300 | Port | Fabric Configuration | Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay. |
256 | Bad M_Key | 1 | 0 | Minor | 1 | 300 | Port | Fabric Configuration | Found a bad M_Key. Check your HCA driver or partition settings. SM Trap. Management Key (M_Key): Enforces the control of a master subnet manager. Administered by the subnet manager and used in certain subnet management packets. Message: Bad M_Key: port1(lid %(lid)d, #%(portn)d) %(pkey)08x, port2(lid%(lid2)d #%(portn2)d) |
257 | Bad P_Key | 1 | 0 | Minor | 1 | 300 | Port | Fabric Configuration | Found a bad P_Key. Check your partitioning settings. SM Trap. Partition Key (P_Key): Enforces membership. Administered through the subnet manager by the partition manager (PM). Message: Bad P_Key: port1(lid %(lid)d, #%(portn)d) %(pkey)08x, port2(lid%(lid2)d #%(portn2)d) |
258 | Bad Q_Key | 1 | 0 | Minor | 1 | 300 | Port | Fabric Configuration | Found a bad Q_Key. Security error. SM Trap. Queue Key (Q_Key): Enforces access rights for reliable and unreliable datagram service (RAW datagram service type not included). Message: Bad Q_Key: port1(lid %(lid)d, #%(portn)d) %(pkey)08x, port2(lid%(lid2)d #%(portn2)d) |
259 | Bad P_Key Switch External Port | 1 | 0 | Critical | 1 | 300 | Port | Fabric Configuration | Found a bad P_Key. Check your partitioning settings. SM Trap. Partition Key (P_Key): Enforces membership. Administered through the subnet manager by the partition manager (PM). Message: Bad P_Key switch external port: port1(lid %(lid)d, #%(portn)d) %(pkey)08x, port2(lid%(lid2)d #%(portn2)d) |
64 | GID Address In Service | 1 | 0 | Info | 1 | 300 | Port | Fabric Notification | New GID is connected to the Fabric |
65 | GID Address Out of Service | 1 | 0 | Warning | 1 | 300 | Port | Fabric Notification | Existing GID is disconnected from the Fabric |
66 | New MCast Group Created | 1 | 0 | Info | 1 | 300 | Port | Fabric Notification | New Multicast Group is created in SM |
67 | MCast Group Deleted | 1 | 0 | Info | 1 | 300 | Port | Fabric Notification | Multicast Group is removed from SM. |
328 | Link is Up | 1 | 0 | Info | 1 | 0 | Link | Fabric Topology | Event is sent upon discovery of a new link |
329 | Link is Down | 1 | 0 | Warning | 1 | 0 | Link | Fabric Topology | Event is sent when an existing link is removed |
144 | Capability Mask Modified | 0 | 0 | Info | 1 | 300 | Port | Fabric Notification | Capability Mask of the specific LID is modified |
602 | UFM Server Failover | 1 | 1 | Critical | 1 | 0 | Site | Fabric Notification | Failover in UFM Server (in HA mode) |
391 | Switch Module Removed | 1 | 0 | Info | 1 | 0 | Switch | Fabric Notification | Module (line card, fan, or power supply) is removed from the switch |
331 | Node is Down | 1 | 0 | Warning | 1 | 0 | Site | Fabric Topology | Node is disconnected or down |
332 | Node is Up | 1 | 0 | Info | 1 | 300 | Site | Fabric Topology | Node is connected or up |
907 | Switch is Down | 1 | 1 | Critical | 1 | 0 | Site | Fabric Topology | Switch is disconnected or down |
908 | Switch is Up | 1 | 1 | Info | 1 | 300 | Site | Fabric Topology | Switch is connected or up |
370 | Gateway Ethernet Link State Changed | 1 | 0 | Warning | 1 | 0 | Gateway | Gateway | Gateway Ethernet Physical link has changed state |
371 | Gateway Re-register Event Received | 1 | 0 | Warning | 1 | 0 | Gateway | Gateway | 10GbE Gateway received a re-register event from the SM. |
372 | Number of Gateways is Changed | 1 | 0 | Warning | 1 | 0 | Gateway | Gateway | Change in the number of 10GbE Gateways has been detected |
373 | Gateway will be Rebooted | 1 | 0 | Warning | 1 | 0 | Gateway | Gateway | 10GbE Gateway is about to reboot |
374 | Gateway Reloading Finished | 1 | 0 | Info | 1 | 0 | Gateway | Gateway | 10GbE Gateway has finished reloading. |
110 | Symbol Error | 1 | 1 | Warning | 200 | 300 | Port | Hardware | Total number of minor link errors detected on one or more physical lanes |
111 | Link Error Recovery | 1 | 1 | Minor | 1 | 300 | Port | Hardware | Total number of times the Port Training state machine has successfully completed the link error recovery process |
112 | Link Downed | 1 | 1 | Critical | 1 | 300 | Port | Hardware | Total number of times the Port Training state machine has failed the link error recovery process and downed the link. |
113 | Port Receive Errors | 1 | 1 | Minor | 5 | 300 | Port | Hardware | Total number of packets containing an error that were received on a port. |
114 | Port Receive Remote Physical Errors | 1 | 0 | Minor | 5 | 300 | Port | Hardware | Total number of packets marked with the EBP delimiter received on the port |
119 | Local Link Integrity Errors | 1 | 1 | Minor | 5 | 300 | Port | Hardware | The number of times that the frequency of packets containing local physical errors has exceeded LocalPhyErrors. Message: LocalLinkIntegrityErrors counter threshold exceeded. Threshold is %d, received value is %d |
122 | Congested Bandwidth (%) Threshold Reached | 1 | 1 | Minor | 10 | 300 | Port | Hardware | The percentage of congested bandwidth has exceeded the defined threshold. Note: A different threshold can be set specifically for Tier 4 ports. |
131 | Non-optimal link width (1X instead of 4X) | 1 | 1 | Minor | 1 | 0 | Port | Hardware | 4X link operates as 1X link |
132 | Non-optimal link width (1X or 4X instead of 12X) | 1 | 1 | Minor | 1 | 0 | Port | Hardware | 12X link operates as 4X or 1X link |
701 | Non-optimal Link Speed | 1 | 1 | Minor | 1 | 0 | Port | Hardware | Link operates at a lower speed than expected: a DDR link operates as SDR; a QDR link operates as DDR or SDR; an FDR link operates as QDR, DDR, or SDR; or an EDR link operates as FDR, QDR, DDR, or SDR |
140 | Excessive Buffer Overrun Threshold Reached | 1 | 0 | Minor | 1 | 300 | Port | Hardware | SM Trap. This error is detected when the number of consecutive flow control update periods with at least one overrun error in each period exceeds the OverrunErrors threshold given in the PortInfo attribute. Message: Excessive Buffer Overrun Threshold is reached: lid %(lid)d, port #%(portn)d |
141 | Flow Control Update Watchdog Timer Expired | 1 | 0 | Warning | 1 | 300 | Port | Hardware | SM Trap. The error indicates a failure of the flow control machine at the other end of the link. If the timer expires without receiving an update, a flow control update error has occurred. Message: Flow Control Update watchdog timer has expired: lid %(lid)d, port #%(portn)d |
392 | Module Temperature Threshold Reached | 1 | 0 | Info | 40 | 0 | Module | Hardware | The temperature detected by the module sensor is too high and has exceeded the defined threshold. |
350 | Environment Added | 1 | 0 | Info | 1 | 0 | Env | Logical Model | New Logical Environment is created |
351 | Environment Removed | 1 | 0 | Info | 1 | 0 | Env | Logical Model | Logical Environment is deleted |
306 | Logical Server Added | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | New Logical Server or Logical Servers Group is created |
307 | Logical Server Removed | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | Logical Server or Logical Servers Group is deleted |
352 | Network Added | 1 | 0 | Info | 1 | 0 | Network | Logical Model | New Network is created |
353 | Network Removed | 1 | 0 | Info | 1 | 0 | Network | Logical Model | Network is deleted |
340 | Network Interface Added | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | New Network Interface is created |
341 | Network Interface Removed | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | Network Interface is deleted |
313 | Compute Resource Allocated | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | A resource is allocated to the Logical Server |
312 | Compute Resource Released | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | A resource is released from the Logical Server |
317 | Logical Server Compute Resource is Up | 1 | 1 | Warning | 1 | 0 | Logical Server | Logical Model | An allocated resource is Up or connected again |
316 | Logical Server Compute Resource is Down | 1 | 1 | Critical | 1 | 0 | Logical Server | Logical Model | An allocated resource is Down or disconnected |
301 | Logical Server State Changed | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | Logical Server state is changed |
302 | Logical Server State Change Failed | 1 | 0 | Minor | 1 | 0 | Logical Server | Logical Model | Logical Server failed to change its state. RM (Resource Manager) Event. Indicates an error in the Logical Server state change, which might be caused by any error condition related to Logical Server resource allocation. Message: Logical Server changed state from %s to %s |
308 | Logical Server Resources Allocated | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | New resources are allocated to the Logical Server |
314 | Logical Server Additional Resources Allocated | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | Additional resources are allocated to the Logical Server |
315 | Logical Server Resources Released | 1 | 0 | Info | 1 | 0 | Logical Server | Logical Model | Resources were released from the Logical Server |
336 | Port Action Succeeded | 1 | 0 | Info | 1 | 0 | Port | Maintenance | Port Management Action (reset, disable) succeeded |
337 | Port Action Failed | 1 | 0 | Minor | 1 | 0 | Port | Maintenance | Port Management Action (reset, disable) failed |
338 | Device Action Succeeded | 1 | 0 | Info | 1 | 0 | Port | Maintenance | Device Management Action succeeded |
339 | Device Action Failed | 1 | 0 | Minor | 1 | 0 | Port | Maintenance | Device Management Action failed |
385 | Switch FW Upgrade Started | 1 | 0 | Info | 1 | 0 | Switch | Maintenance | Switch FW Upgrade process has started |
386 | Switch SW Upgrade Started | 1 | 0 | Info | 1 | 0 | Switch | Maintenance | Switch SW Upgrade process has started |
381 | Switch Upgrade Failed | 1 | 0 | Info | 1 | 0 | Switch | Maintenance | Switch SW or FW Upgrade process failed |
388 | Host FW Upgrade Started | 1 | 0 | Info | 1 | 0 | Computer | Maintenance | Host FW Upgrade process has started |
389 | Host SW Upgrade Started | 1 | 0 | Info | 1 | 0 | Computer | Maintenance | Host SW Upgrade process has started |
383 | Host Upgrade Failed | 1 | 0 | Info | 1 | 0 | Computer | Maintenance | Host SW or FW Upgrade process failed |
502 | Device Upgrade Finished | 1 | 0 | Info | 1 | 300 | Device | Maintenance | Device SW or FW Upgrade has finished |
909 | Director Switch is Down | 1 | 1 | Critical | 1 | 300 | Site | Fabric Topology | Director Switch is disconnected or down |
910 | Director Switch is Up | 1 | 1 | Info | 1 | 0 | Site | Fabric Topology | Director Switch is connected or up |
911 | Module Temperature Low Threshold Reached | 1 | 1 | Warning | 60 | 300 | Module | Hardware | The temperature detected by the module sensor is too high and has exceeded the low threshold |
912 | Module Temperature High Threshold Reached | 1 | 1 | Critical | 60 | 300 | Module | Hardware | The temperature detected by the module sensor is too high and has exceeded the high threshold |
913 | Module High Voltage | 1 | 1 | Warning | 10 | 420 | Switch | Module Status | Sensor Voltage Threshold Exceeded |
914 | Module High Current | 1 | 1 | Warning | 10 | 420 | Switch | Module Status | Sensor Current Threshold Exceeded |
394 | Module Status FAULT | 1 | 1 | Critical | 1 | 420 | Switch | Module Status | Module Status FAULT |
545 | SM is not responding | 1 | 1 | Critical | 1 | 300 | Grid | Maintenance | SM is not responding |
915 | BER_ERROR | 1 | 1 | Critical | 1e-8 | 420 | Port | Hardware | Effective BER Error on port exceeded the threshold |
916 | BER_WARNING | 1 | 1 | Warning | 1e-13 | 420 | Port | Hardware | Effective BER Warning on port exceeded the threshold |
1300 | SM_SAKEY_VIOLATION | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | "SA Key Violation Committed" |
1301 | SM_SGID_SPOOFED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | "SGID spoofed by VPort/port" |
1302 | SM_RATE_LIMIT_EXCEEDED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | "Rate Limit Exceeded" |
1303 | SM_MULTICAST_GROUPS_LIMIT_EXCEEDED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | "Multicast Groups Limit Exceeded" |
1304 | SM_SERVICES_LIMIT_EXCEEDED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | "Services Limit Exceeded" |
1305 | SM_EVENT_SUBSCRIPTION_LIMIT_EXCEEDED | 1 | 1 | Warning | 5 | 300 | Port | Fabric Notification | "Event Subscription Limit Exceeded" |
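The Message text of the SM trap events in the table (for example, events 256 and 140) uses %-style placeholders that are looked up by name, such as `%(lid)d` and `%(pkey)08x`. This is the same syntax as Python's `%` operator applied to a mapping. The following is a minimal Python sketch of how such a template expands, using hypothetical LID, port, and key values; the `render` helper is illustrative and not part of UFM.

```python
# Templates copied verbatim from the Message text of events 256 (Bad M_Key)
# and 140 (Excessive Buffer Overrun Threshold Reached).
BAD_M_KEY = ("Bad M_Key: port1(lid %(lid)d, #%(portn)d) %(pkey)08x, "
             "port2(lid%(lid2)d #%(portn2)d)")
EXCESSIVE_OVERRUN = ("Excessive Buffer Overrun Threshold is reached: "
                     "lid %(lid)d, port #%(portn)d")


def render(template: str, **fields: int) -> str:
    """Fill a %-style template whose placeholders are looked up by name."""
    return template % fields


# Hypothetical sample values, not taken from a real fabric.
print(render(BAD_M_KEY, lid=5, portn=1, pkey=0x7FFF, lid2=12, portn2=3))
# Bad M_Key: port1(lid 5, #1) 00007fff, port2(lid12 #3)
print(render(EXCESSIVE_OVERRUN, lid=5, portn=1))
# Excessive Buffer Overrun Threshold is reached: lid 5, port #1
```

The `%(pkey)08x` field prints the key as eight zero-padded hexadecimal digits, and the template as written has no space between `lid` and the second LID.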
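For the counter-based port events, the messages of events 119, 120, and 121 indicate that an alarm is raised when a counter's received value exceeds the event's threshold. The following is a minimal Python sketch of that comparison using the default thresholds from the table. It assumes "exceeded" means strictly greater than, and it treats the received value as a plain number without modeling how UFM samples or accumulates counters. `COUNTER_THRESHOLDS` and `check_counter` are hypothetical names.

```python
from typing import Optional

# Default thresholds copied from the table: Alarm ID -> (Alarm Name, threshold).
COUNTER_THRESHOLDS = {
    110: ("Symbol Error", 200),
    113: ("Port Receive Errors", 5),
    114: ("Port Receive Remote Physical Errors", 5),
    115: ("Port Receive Switch Relay Errors", 9999),
    116: ("Port Xmit Discards", 200),
    117: ("Port Xmit Constraint Errors", 200),
    118: ("Port Receive Constraint Errors", 200),
    119: ("Local Link Integrity Errors", 5),
    120: ("Excessive Buffer Overrun Errors", 100),
    121: ("VL15 Dropped", 50),
}


def check_counter(alarm_id: int, received: int) -> Optional[str]:
    """Return an alarm message if `received` exceeds the default threshold.

    Assumption: the alarm fires only when the value is strictly greater than
    the threshold; the table says "exceeded" but does not define equality.
    """
    name, threshold = COUNTER_THRESHOLDS[alarm_id]
    if received <= threshold:
        return None
    # Message shape modeled on the Message text of events 119, 120, and 121.
    return (f"{name} counter threshold exceeded. "
            f"Threshold is {threshold}, received value is {received}.")


print(check_counter(121, 50))  # None: equal to the threshold, not above it
print(check_counter(121, 73))
# VL15 Dropped counter threshold exceeded. Threshold is 50, received value is 73.
```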
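Events 131, 132, 701, 915, and 916 compare a port's actual state with an expected one. The following is a minimal Python sketch of those checks. It assumes the expected speed and width are already known (for example, the highest values both ends of the link support), that "exceeded" means strictly greater than, and that only the more severe BER event is reported when both thresholds are crossed. The function names are hypothetical and not part of any UFM API.

```python
from typing import Optional, Tuple

# InfiniBand link speeds named in event 701, slowest to fastest.
SPEED_ORDER = ["SDR", "DDR", "QDR", "FDR", "EDR"]

# Default thresholds of events 915 (BER_ERROR) and 916 (BER_WARNING).
BER_ERROR_THRESHOLD = 1e-8
BER_WARNING_THRESHOLD = 1e-13


def speed_event(expected: str, active: str) -> Optional[int]:
    """Return 701 if the link runs slower than the expected speed."""
    if SPEED_ORDER.index(active) < SPEED_ORDER.index(expected):
        return 701
    return None


def width_event(expected: str, active: str) -> Optional[int]:
    """Return 131 or 132 for the width degradations listed in the table."""
    if expected == "4X" and active == "1X":
        return 131
    if expected == "12X" and active in ("1X", "4X"):
        return 132
    return None


def ber_event(effective_ber: float) -> Optional[Tuple[int, str]]:
    """Map an effective BER to (Alarm ID, default severity), if any.

    Assumptions: "exceeded" means strictly greater than, and only the
    more severe event is reported when both thresholds are crossed.
    """
    if effective_ber > BER_ERROR_THRESHOLD:
        return (915, "Critical")
    if effective_ber > BER_WARNING_THRESHOLD:
        return (916, "Warning")
    return None


print(speed_event("EDR", "FDR"))  # 701
print(width_event("12X", "4X"))   # 132
print(ber_event(1e-10))           # (916, 'Warning')
```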