emerging_optimizers.psgd#
- emerging_optimizers.psgd.procrustes_step.procrustes_step(Q, max_step_size=0.125, eps=1e-08, order=2)[source]#
One step of an online solver for the orthogonal Procrustes problem.
The orthogonal Procrustes problem is \(\min_U \| U Q - I \|_F\) s.t. \(U^H U = I\) by rotating Q as \(\exp(a R) Q\), where \(R = Q^H - Q\) is the generator and \(\|a R\| < 1\).
If using 2nd order expansion, max_step_size should be less than \(1/4\) as we only expand \(\exp(a R)\) to its 2nd order term. If using 3rd order expansion, max_step_size should be less than \(5/8\). This method is an expansion of a Lie algebra parametrized rotation that uses a simple approximate line search to find the optimal step size, from Xi-Lin Li.
- Parameters:
Q (Tensor) – Tensor of shape (n, n), general square matrix to orthogonalize.
max_step_size (float) – Maximum step size for the line search. Default is 1/8. (0.125)
eps (float) – Small number for numerical stability.
order (Literal[2, 3]) – Order of the Taylor expansion. Must be 2 or 3. Default is 2.
- Return type:
- emerging_optimizers.psgd.psgd_kron_contractions.apply_kronecker_factors(Q_list, X)[source]#
Apply all Kronecker factors once to tensor \(X\), each to its corresponding dimension.
This applies each \(Q\) factor once, for example in 2D case: \(Q_1 X Q_2^T\).
- emerging_optimizers.psgd.psgd_kron_contractions.apply_preconditioner(Q_list, X)[source]#
Apply the full PSGD preconditioner to X.
This is the full Kronecker product of PSGD’s kronecker factors Q^T Q, applied to X.
\(P X = (Q_1^T Q_1) X (Q_2^T Q_2)\)
This applies each factor followed by its transpose for the full preconditioner effect.
- emerging_optimizers.psgd.psgd_kron_contractions.partial_contraction(G1, G2, axis)[source]#
Compute the partial contraction of G1 and G2 along axis axis. This is the contraction of the two tensors, but with all axes except axis contracted.
- emerging_optimizers.psgd.psgd_utils.norm_lower_bound_skew(A, k=32, half_iters=2, eps=1e-08)[source]#
A cheap lower bound on the spectral norm (largest eigenvalue) of skew-symmetric matrix.
Note: For skew-symmetric matrices, all diagonal entries are zero and \(A^T = -A\). From Xi-Lin Li.
- Parameters:
- Returns:
A scalar Tensor giving a lower bound on \(\|A\|_2\).
- Return type:
- emerging_optimizers.psgd.psgd_utils.norm_lower_bound_spd(A, k=32, half_iters=2, eps=1e-08)[source]#
A cheap lower bound for the spectral norm of a symmetric positive definite matrix.
- Parameters:
- Returns:
A scalar giving a lower bound on \(\\|A\\|_2\).
- Return type:
- emerging_optimizers.psgd.psgd_utils.uniformize_q_in_place(Q_list)[source]#
Balance the dynamic ranges of kronecker factors in place to prevent numerical underflow or overflow.
Each tensor in Q_list is rescaled so that its maximum absolute entry becomes the geometric mean of all factors original maxima. This preserves the overall product of norms (and thus the scale of the Kronecker product) while avoiding numerical underflow or overflow when factors have widely differing magnitudes.
Given tensors \(Q_1, Q_2, \ldots, Q_n\):
Compute max-absolute norms: \(\|Q_i\|_\infty = \max(|Q_i|)\) for \(i = 1, \ldots, n\)
Compute geometric mean: \(g = \left(\prod_{i=1}^{n} \|Q_i\|_\infty \right)^{1/n}\)
Rescale each tensor: \(Q_i \leftarrow Q_i \cdot \frac{g}{\|Q_i\|_\infty}\)
This ensures \(\|Q_i\|_\infty = g\) for all \(i\), while preserving the norm of the Kronecker product \(Q_1 \otimes Q_2 \otimes \cdots \otimes Q_n\).