Introduction

What is an Xid Message

The Xid message is an error report from the NVIDIA driver that is printed to the operating system’s kernel log or event log. Xid messages indicate that a general GPU error occurred, most often due to the driver programming the GPU incorrectly or to corruption of the commands sent to the GPU. The messages can be indicative of a hardware problem, an NVIDIA software problem, or a user application problem.

These messages provide diagnostic information that can be used by both users and NVIDIA to aid in debugging reported problems.

The meaning of each message is consistent across driver versions.
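Xids are written as ordinary kernel log lines, so a common first triage step is to extract them from the log and count them per GPU. The sketch below assumes the Linux line shape `NVRM: Xid (PCI:<bus id>): <code>, <detail>`. The text after the code varies by driver and by Xid, and the sample lines are illustrative, not real driver output. The function names `parse_xid_lines` and `count_xids` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Assumed Xid line shape (illustrative), e.g. as it might appear in dmesg output:
#   [ 1234.567890] NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, <detail text>
XID_PATTERN = re.compile(
    r"NVRM: Xid \((?P<device>PCI:[0-9a-fA-F:.]+)\): (?P<code>\d+),\s*(?P<detail>.*)"
)


def parse_xid_lines(lines):
    """Yield (device, xid_code, detail) for every line that matches the assumed Xid shape."""
    for line in lines:
        match = XID_PATTERN.search(line)
        if match:
            yield match.group("device"), int(match.group("code")), match.group("detail").strip()


def count_xids(lines):
    """Count occurrences of each (device, xid_code) pair; non-Xid lines are ignored."""
    return Counter((device, code) for device, code, _ in parse_xid_lines(lines))


if __name__ == "__main__":
    # Illustrative sample lines, not captured from a real system.
    sample_log = [
        "[ 1234.567890] NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, example detail text",
        "[ 1240.000001] NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, example detail text",
        "[ 1300.123456] NVRM: Xid (PCI:0000:86:00): 13, pid=5678, example detail text",
        "[ 1301.000000] eth0: link up",
    ]
    for (device, code), n in sorted(count_xids(sample_log).items()):
        print(f"{device}: Xid {code} seen {n} time(s)")
```

In practice the input lines would come from the kernel log (for example the output of `dmesg` or `journalctl -k`). Keying on the numeric code is reasonable here because the meaning of each Xid is consistent across driver versions. The per-GPU counts only indicate where to look first; they do not identify a root cause.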

What is an SXid Message

NVIDIA drivers for NVSwitch report error conditions relating to NVSwitch hardware in kernel logs through a mechanism similar to Xids. These “Switch Xids” (SXids), along with guidelines for their use, are documented separately in the Fabric Manager User Guide.

Note

SXid messages apply only to Hopper and earlier generation GPUs.

How to Use Xid Messages

Xid messages are intended to be used as debugging guides. Because many problems have multiple possible root causes, it is not always possible to determine the cause of an issue from the Xid value alone.

For example, an Xid error might indicate that a user program tried to access invalid memory. In principle, however, memory corruption caused by PCIe or frame buffer (“FB”) problems could corrupt any command and thus cause almost any error. For this reason, the Xid classifications listed below are best used as a starting point for further investigation of each problem.

The GPU Debug Guidelines manual provides additional guidance for debugging GPU problems, including advice on interpreting Xids and recommended next steps for handling common Xids.