MatmulQuantizationScales
class nvmath.linalg.advanced.MatmulQuantizationScales()
A data class for providing quantization_scales to the Matmul constructor and the wrapper function matmul(). Scales can only be set for narrow-precision (FP8 and lower) matrices.
When MatmulOptions.block_scaling=False, each scale can be either a scalar (integer or float) or a single-element tensor of shape () or (1,).
When MatmulOptions.block_scaling=True, each scale should be a 1D uint8 tensor with a layout matching the requirements of the cuBLAS MXFP8 scaling tensor. Values in the tensor will be interpreted as UE8M0 values. This means that a value \(x\) in the scaling tensor will cause cuBLAS to multiply the respective block by \(2^{x-127}\).
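As an illustration of the UE8M0 interpretation described above, the following minimal sketch converts between a stored uint8 value \(x\) and the multiplier \(2^{x-127}\). The helper names ue8m0_to_scale and scale_to_ue8m0 are hypothetical and are not part of nvmath. The sketch applies the formula literally, does not model any special encodings the format may reserve, and does not build the cuBLAS tensor layout.

```python
import math

# Exponent bias implied by the 2^(x - 127) formula in the text above.
UE8M0_BIAS = 127


def ue8m0_to_scale(x: int) -> float:
    """Decode a stored uint8 value x into the multiplier 2^(x - 127).

    Hypothetical helper for illustration; not an nvmath function.
    """
    if not 0 <= x <= 255:
        raise ValueError("x must fit in an unsigned 8-bit integer")
    # ldexp(1.0, e) computes 1.0 * 2**e exactly.
    return math.ldexp(1.0, x - UE8M0_BIAS)


def scale_to_ue8m0(scale: float) -> int:
    """Encode a power-of-two multiplier as the uint8 value x.

    Hypothetical helper for illustration; not an nvmath function.
    Only exact powers of two are representable, so anything else is rejected.
    """
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2**exponent
    if mantissa != 0.5:
        raise ValueError("scale must be an exact power of two")
    x = (exponent - 1) + UE8M0_BIAS
    if not 0 <= x <= 255:
        raise ValueError("scale is outside the representable range")
    return x


# A stored value of 127 leaves the block unscaled; 130 multiplies it by 8.
unscaled = ue8m0_to_scale(127)
times_eight = ue8m0_to_scale(130)
```

Because the stored value is only an exponent, every representable scale is a power of two, which is why the encoder rejects inputs such as 3.0 instead of rounding them.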