MatmulQuantizationScales#

class nvmath.linalg.advanced.MatmulQuantizationScales(
a: float | None = None,
b: float | None = None,
c: float | None = None,
d: float | None = None,
)[source]#

A data class for providing the quantization_scales argument to the Matmul constructor and the wrapper function matmul().

Scales can only be set for narrow-precision (FP8 and lower) matrices.

When MatmulOptions.block_scaling=False, each scale can either be a scalar (integer or float) or a single-element tensor of shape () or (1,).
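The following is a minimal sketch of tensor-wide scaling (MatmulOptions.block_scaling=False). It assumes PyTorch tensors on a GPU with FP8 support; the torch.float8_e4m3fn dtype, the tensor shapes, and the transposed layout of the second operand are assumptions here, and additional MatmulOptions may be required depending on the nvmath version and hardware:

import torch
from nvmath.linalg.advanced import MatmulQuantizationScales, matmul

m, n, k = 64, 64, 32
# FP8 operands (dtype is an assumption); cuBLAS typically expects the
# second FP8 operand in a transposed layout.
a = torch.rand(m, k, device="cuda").to(torch.float8_e4m3fn)
b = torch.rand(n, k, device="cuda").to(torch.float8_e4m3fn).T

# Scalar scales; single-element tensors of shape () or (1,) also work.
scales = MatmulQuantizationScales(a=0.5, b=2.0, d=1.0)

# Pass the scales via the quantization_scales argument.
result = matmul(a, b, quantization_scales=scales)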

When MatmulOptions.block_scaling=True, each scale should be a 1D uint8 tensor with a layout matching the requirements of the cuBLAS MXFP8 scaling tensor. Values in the tensor are interpreted as UE8M0 values, meaning that a value \(x\) in the scaling tensor causes cuBLAS to multiply the corresponding block by \(2^{x-127}\).
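To illustrate the UE8M0 interpretation, the short sketch below decodes a few uint8 scale values into the multiplicative factors cuBLAS would apply to the corresponding blocks; the exact layout of the full scaling tensor must follow the cuBLAS MXFP8 requirements and is not shown here:

# Each UE8M0 entry x corresponds to a factor of 2**(x - 127).
ue8m0_values = [125, 126, 127, 128, 129]
factors = [2.0 ** (x - 127) for x in ue8m0_values]
print(factors)  # [0.25, 0.5, 1.0, 2.0, 4.0]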

a#

Scale for matrix A.

Type: float or Tensor

b#

Scale for matrix B.

Type: float or Tensor

c#

Scale for matrix C.

Type: float or Tensor

d#

Scale for matrix D.

Type: float or Tensor

See also

Matmul, matmul()