NVIDIA SHARP Traps
SHARP traps are notifications that the switches send to sharp_am to alert it about SHARP-related events. Some traps indicate a possible error in the client application logic; others indicate a potential error in the switch and require investigation by NVIDIA experts.
All traps received by sharp_am appear in the sharp_am log file and in UFM events.
All traps are reported with the relevant switch LID. Some traps provide additional information, such as the relevant job, relevant QPs and a syndrome that gives a more precise reason for the trap.
List of traps
| Trap Name | Trap Number | Description | Possible root cause |
| --- | --- | --- | --- |
| AMKeyViolation | 257 | A MAD was sent with the wrong AM-KEY. This is a security warning: libsharp uses the correct AM-KEY and Job-Key, so the sender of the MADs should be checked. | |
| QPError | 132 | ||
| QPAllocationTimeout | 133 | The switch received a QP Allocation request but did not receive the corresponding QP Allocation Confirmation before the timeout expired. | If the client application terminates abnormally during the initialization phase, there is a small chance that it sent the QP Allocation MAD but terminated before sending the Confirmation MAD. |
| SharpInvalidRequest | 134 | An error occurred while trying to aggregate the received data. | Depending on the syndrome, this error can result from incorrect logic in the client application. Syndrome values that can hint at a problem in the client application: 2 - Invalid Opcode: The clients of the application are not using the same aggregation logic. 3 - Invalid Vector Size / Invalid Payload Size: The clients of the application are not using the same buffer size. 8 - Child not in group: A request arrived from a non-member of the group. 9 - Bad Target HDR: TBD? 14 - Sharp Payload not Aligned: TBD? 15 - ANDR request on non-SAT: A request was made to perform ReduceScatter, but the SHARP job was requested without SAT support. 16 - Group context doesn't exist: TBD? |
| SharpError | 135 | ||
| FlushComplete | 136 | | |
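
When triaging traps from the sharp_am log or UFM events, it can help to keep the trap numbers and SharpInvalidRequest syndromes from the table above in a small lookup. The following is a minimal sketch only: `TRAPS`, `INVALID_REQUEST_SYNDROMES`, and `describe_trap` are hypothetical names chosen for illustration and are not part of sharp_am, libsharp, or UFM. The sketch also assumes that the trap number and syndrome have already been extracted from the log by some other means.

```python
# Illustrative lookup tables transcribed from the trap list above.
# All names here are hypothetical helpers, not a SHARP or UFM API.

TRAPS = {
    257: "AMKeyViolation",
    132: "QPError",
    133: "QPAllocationTimeout",
    134: "SharpInvalidRequest",
    135: "SharpError",
    136: "FlushComplete",
}

# Syndromes listed for SharpInvalidRequest (trap 134) only.
INVALID_REQUEST_SYNDROMES = {
    2: "Invalid Opcode",
    3: "Invalid Vector Size / Invalid Payload Size",
    8: "Child not in group",
    9: "Bad Target HDR",
    14: "Sharp Payload not Aligned",
    15: "ANDR request on non-SAT",
    16: "Group context doesn't exist",
}


def describe_trap(trap_number, syndrome=None):
    """Return a short human-readable label for a trap number and optional syndrome."""
    name = TRAPS.get(trap_number, f"UnknownTrap({trap_number})")
    # Only trap 134 has documented syndrome values in the table.
    if trap_number == 134 and syndrome is not None:
        reason = INVALID_REQUEST_SYNDROMES.get(
            syndrome, f"unlisted syndrome {syndrome}"
        )
        return f"{name}: {reason}"
    return name


if __name__ == "__main__":
    print(describe_trap(134, 3))
    print(describe_trap(133))
```

Syndromes that the table does not list fall through to an "unlisted syndrome" label rather than raising, since the list above covers only the values that hint at a client application problem.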