MatmulQuantizationScales
class nvmath.linalg.advanced.MatmulQuantizationScales()
A data class for providing quantization_scales to the Matmul constructor and the wrapper function matmul(). Scales can only be set for narrow-precision (FP8 and lower) matrices.
When MatmulOptions.block_scaling=False, each scale can be either a scalar (integer or float) or a single-element tensor of shape () or (1,).
When MatmulOptions.block_scaling=True, each scale should be a 1D uint8 tensor whose layout matches the cuBLAS requirements for an MXFP8 scaling tensor. Values in the tensor are interpreted as UE8M0 values: a value \(x\) in the scaling tensor causes cuBLAS to multiply the corresponding block by \(2^{x-127}\).
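The UE8M0 interpretation described above can be illustrated with a minimal NumPy sketch. The helper names ue8m0_to_scale and scale_to_ue8m0 are hypothetical and are not part of nvmath. The sketch only converts between byte values and the factors \(2^{x-127}\); it does not arrange the values in the layout that cuBLAS expects. It also assumes that the byte value 255 is reserved and therefore refuses to produce it.

```python
import numpy as np

UE8M0_BIAS = 127  # the bias in the 2**(x - 127) rule stated above


def ue8m0_to_scale(x):
    """Decode UE8M0 bytes into the multiplicative factors 2**(x - 127)."""
    x = np.asarray(x, dtype=np.uint8)
    # Widen before subtracting the bias so uint8 arithmetic cannot wrap around.
    return np.exp2(x.astype(np.int32) - UE8M0_BIAS)


def scale_to_ue8m0(exponent):
    """Encode integer exponents k (meaning a factor of 2**k) as UE8M0 bytes."""
    biased = np.asarray(exponent, dtype=np.int32) + UE8M0_BIAS
    # Assumption: 255 is treated as reserved, so only 0..254 is accepted here.
    if np.any(biased < 0) or np.any(biased > 254):
        raise ValueError("exponent must lie in [-127, 127]")
    return biased.astype(np.uint8)


# A stored value of 127 leaves the block unchanged; 128 doubles it; 126 halves it.
factors = ue8m0_to_scale([126, 127, 128])
encoded = scale_to_ue8m0([-1, 0, 1])
```

Because UE8M0 stores only an exponent, every representable scale is a power of two, so a desired scale has to be rounded to a power of two before it is encoded.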