Gradient spikes and the SOAP preconditioner
A set of executed-notebook walkthroughs of how different optimizers respond to a sudden gradient spike, and how that exposed a TF32 precision bug in the SOAP/REKLS KL-Shampoo preconditioner. Read in order:
1. Optimizer update comparison: per-step update magnitude and spike-recovery behavior of AdamW, LaProp, Muon, SOAP, and REKLS.
2. Preconditioner eigenbasis rotation: how the SOAP/REKLS eigenbasis Q_L rotates around a spike, and how it depends on fp32_matmul_prec.
3. TF32 eigenvalue precision loss: a standalone demo of the underlying diag(Qᵀ L Q) precision failure that drives the rotation.
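For readers without TF32-capable hardware, the kind of precision loss the third notebook covers can be approximated on a CPU. The following is a minimal sketch, not the notebooks' code: it assumes TF32 can be modeled as rounding each matmul input to a 10-bit mantissa while accumulating at higher precision, and the helper names (`round_to_tf32`, `matmul`, `diag_qt_l_q`) and the test matrix are hypothetical. Real tensor-core behavior may differ in detail.

```python
import numpy as np


def round_to_tf32(x):
    """Emulate TF32 input rounding: keep 10 of float32's 23 mantissa bits."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    # Add half of the dropped range, then clear the low 13 bits (round half up).
    rounded = (bits + np.uint32(0x1000)) & np.uint32(0xFFFFE000)
    return rounded.view(np.float32)


def matmul(a, b, emulate_tf32):
    """float32 matmul; optionally round the inputs to TF32 first (assumed model)."""
    if emulate_tf32:
        a, b = round_to_tf32(a), round_to_tf32(b)
    # Accumulate in float64 so only the input rounding differs between modes.
    return (a.astype(np.float64) @ b.astype(np.float64)).astype(np.float32)


def diag_qt_l_q(L, Q, emulate_tf32):
    """Eigenvalue estimates diag(Q^T L Q), computed as two chained matmuls."""
    return np.diag(matmul(Q.T, matmul(L, Q, emulate_tf32), emulate_tf32))


# Hypothetical test case: a symmetric matrix with eigenvalues from 1 down to 1e-6.
rng = np.random.default_rng(0)
n = 32
true_eigs = np.logspace(0, -6, n)
Q64, _ = np.linalg.qr(rng.standard_normal((n, n)))  # exact eigenbasis
L = ((Q64 * true_eigs) @ Q64.T).astype(np.float32)  # L = Q diag(eigs) Q^T
Q = Q64.astype(np.float32)

err_fp32 = np.max(np.abs(diag_qt_l_q(L, Q, False) - true_eigs))
err_tf32 = np.max(np.abs(diag_qt_l_q(L, Q, True) - true_eigs))
print(f"smallest true eigenvalue:           {true_eigs[-1]:.1e}")
print(f"max abs error, float32 inputs:      {err_fp32:.1e}")
print(f"max abs error, TF32-rounded inputs: {err_tf32:.1e}")
```

The script prints both absolute errors next to the smallest true eigenvalue. The point of the comparison is that rounding error in the inputs scales with the largest entries of L and Q, so it is an absolute error that can be large relative to the small end of the spectrum.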