NVIDIA SecureAI Attestation Advisory: HBM3 Resiliency Impact on Driver Versions r550.0-r550.90.12
Background
NVIDIA has implemented an HBM3 memory channel repair mechanism on certain Hopper GPUs to improve reliability and reduce unnecessary RMAs (Return Merchandise Authorizations). This mechanism, which permanently deactivates faulty memory channels through fuse programming during pre-RMA diagnostics, allows affected GPUs to continue operating normally.
Attestation failures that result from this change do not indicate a security vulnerability. They result from hardware configuration changes that NVIDIA intentionally implemented to improve device reliability.
Technical Impact
HBM channel repair alters the device's physical measurements that are verified during attestation. Drivers earlier than r550.127.05 do not recognize these legitimate hardware changes as valid, so attestation fails with error messages that indicate a measurement mismatch.
An attestation failure on a Hopper GPU might be caused by this memory resiliency feature rather than a security compromise.
Affected Drivers
The following driver versions might experience attestation failures when used with Hopper GPUs that have undergone HBM channel repair:
- NV_GPU_DRIVER_GH100_550.54.14
- NV_GPU_DRIVER_GH100_550.54.15
- NV_GPU_DRIVER_GH100_550.90.07
- NV_GPU_DRIVER_GH100_550.90.12
- NV_GPU_DRIVER_GH100_550.113
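As a rough triage aid, the list above can be checked programmatically. The following Python sketch is an illustration only, not an NVIDIA tool: the helper names (`listed_versions`, `is_listed_affected`) are hypothetical, and it assumes the version string has already been obtained from the system in dotted form, optionally with a leading "r". The identifiers are copied exactly as they appear in the list.

```python
# Identifiers copied verbatim from the "Affected Drivers" list in this advisory.
AFFECTED_DRIVER_IDS = [
    "NV_GPU_DRIVER_GH100_550.54.14",
    "NV_GPU_DRIVER_GH100_550.54.15",
    "NV_GPU_DRIVER_GH100_550.90.07",
    "NV_GPU_DRIVER_GH100_550.90.12",
    "NV_GPU_DRIVER_GH100_550.113",
]

# Common prefix of the identifiers above; the remainder is the driver version.
ID_PREFIX = "NV_GPU_DRIVER_GH100_"


def listed_versions(ids=AFFECTED_DRIVER_IDS):
    """Return the set of version strings embedded in the listed identifiers."""
    return {i[len(ID_PREFIX):] for i in ids if i.startswith(ID_PREFIX)}


def is_listed_affected(driver_version: str) -> bool:
    """Return True if the version (e.g. "550.90.12" or "r550.90.12") is listed."""
    version = driver_version.strip().lstrip("rR")
    return version in listed_versions()


if __name__ == "__main__":
    for candidate in ("550.90.12", "r550.54.15", "550.127.05"):
        print(candidate, is_listed_affected(candidate))
```

An exact-match check like this only covers the versions named in the list. A driver that is absent from the list is not necessarily unaffected.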
Recommendation
Upgrade to r550.127.05 or later, which correctly handles GPUs with repaired HBM channels.
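A minimal Python sketch of the version comparison behind this recommendation follows. It assumes r550-branch version strings made of dot-separated integers, optionally prefixed with "r"; how other driver branches relate to this fix is not covered by this advisory. The names `MIN_FIXED`, `parse_version`, and `predates_fix` are hypothetical.

```python
# First r550 driver that, per this advisory, handles repaired HBM channels.
MIN_FIXED = (550, 127, 5)


def parse_version(text: str) -> tuple:
    """Turn "550.90.12" or "r550.90.12" into a tuple of ints, e.g. (550, 90, 12)."""
    cleaned = text.strip().lstrip("rR")
    return tuple(int(part) for part in cleaned.split("."))


def predates_fix(driver_version: str) -> bool:
    """Return True if the driver is older than r550.127.05.

    Assumption: the input is an r550-branch version. Tuples compare
    element by element, so "550.90.12" sorts before "550.127.05" even
    though a plain string comparison would order them the other way.
    """
    return parse_version(driver_version) < MIN_FIXED


if __name__ == "__main__":
    for candidate in ("550.54.14", "550.90.12", "550.127.05"):
        status = "upgrade recommended" if predates_fix(candidate) else "has the fix"
        print(candidate, status)
```

On a system with the driver installed, the version string can typically be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `predates_fix`.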
If you experience attestation failures before upgrading, cite this advisory (NVIDIA SecureAI Attestation Advisory: HBM3 Resiliency Impact on Driver Versions r550.0-r550.90.12) when communicating with your security teams to confirm that the failures are caused by a known hardware configuration change and not by a security breach.