MatmulQuantizationScales#

class nvmath.linalg.advanced.MatmulQuantizationScales(
a: float | None = None,
b: float | None = None,
c: float | None = None,
d: float | None = None,
)[source]#

A data class for providing the quantization_scales argument to the Matmul constructor and the wrapper function matmul().

Scales can only be set for narrow-precision (FP8 and lower) matrices.

When MatmulOptions.block_scaling=False, each scale can either be a scalar (integer or float) or a single-element tensor of shape () or (1,).
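The following is a minimal sketch of tensor-wide scaling (MatmulOptions.block_scaling=False). It assumes PyTorch tensors on a GPU with FP8 support; the torch.float8_e4m3fn dtype, the tensor shapes, and the transposed layout of the second operand are assumptions here, and additional MatmulOptions may be required depending on the nvmath version and hardware:

import torch
from nvmath.linalg.advanced import MatmulQuantizationScales, matmul

m, n, k = 64, 64, 32
# FP8 operands (dtype is an assumption); cuBLAS typically expects the
# second FP8 operand in a transposed layout.
a = torch.rand(m, k, device="cuda").to(torch.float8_e4m3fn)
b = torch.rand(n, k, device="cuda").to(torch.float8_e4m3fn).T

# Scalar scales; single-element tensors of shape () or (1,) also work.
scales = MatmulQuantizationScales(a=0.5, b=2.0, d=1.0)

# Pass the scales via the quantization_scales argument.
result = matmul(a, b, quantization_scales=scales)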

When MatmulOptions.block_scaling=True, each scale should be a 1D uint8 tensor with a layout matching the requirements of the cuBLAS MXFP8 scaling tensor. Values in the tensor are interpreted as UE8M0 values, meaning that a value \(x\) in the scaling tensor causes cuBLAS to multiply the corresponding block by \(2^{x-127}\).
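To illustrate the UE8M0 interpretation, the short sketch below decodes a few uint8 scale values into the multiplicative factors cuBLAS would apply to the corresponding blocks; the exact layout of the full scaling tensor must follow the cuBLAS MXFP8 requirements and is not shown here:

# Each UE8M0 entry x corresponds to a factor of 2**(x - 127).
ue8m0_values = [125, 126, 127, 128, 129]
factors = [2.0 ** (x - 127) for x in ue8m0_values]
print(factors)  # [0.25, 0.5, 1.0, 2.0, 4.0]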

a#

Scale for matrix A.

Type: float or Tensor

b#

Scale for matrix B.

Type: float or Tensor

c#

Scale for matrix C.

Type: float or Tensor

d#

Scale for matrix D.

Type: float or Tensor

See also

Matmul, matmul()