swizzle.h

Functions

void nvte_swizzle_scaling_factors(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Swizzle scaling factors into the interleaved layout required by GEMM.

Requirements:

  • scale_inv is stored in row-major order.

  • scale_inv dimensions are padded to multiples of 128x4 for row-scale and 4x128 for col-scale.

  • data is quantized along the K-dimension, i.e. each 1D scaling block lies along the K-dimension.
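
The padding and row-major requirements above can be illustrated with a small host-side sketch. It reads the padding requirement as rounding each scale_inv dimension up to the next multiple of the tile size (128 and 4), which is an assumption. The names ScaleShape, round_up, padded_row_scale_shape, padded_col_scale_shape and row_major_offset are hypothetical helpers for illustration and are not part of swizzle.h.

```cpp
#include <cstddef>

// Hypothetical helper type; not part of swizzle.h.
struct ScaleShape {
  std::size_t rows;
  std::size_t cols;
};

// Round x up to the next multiple of m (m must be > 0).
constexpr std::size_t round_up(std::size_t x, std::size_t m) {
  return (x + m - 1) / m * m;
}

// Assumed padding for row-scale: rows to a multiple of 128, columns to a multiple of 4.
constexpr ScaleShape padded_row_scale_shape(std::size_t rows, std::size_t cols) {
  return {round_up(rows, 128), round_up(cols, 4)};
}

// Assumed padding for col-scale: rows to a multiple of 4, columns to a multiple of 128.
constexpr ScaleShape padded_col_scale_shape(std::size_t rows, std::size_t cols) {
  return {round_up(rows, 4), round_up(cols, 128)};
}

// Offset of element (r, c) in a row-major buffer with padded_cols columns per row.
constexpr std::size_t row_major_offset(std::size_t r, std::size_t c,
                                       std::size_t padded_cols) {
  return r * padded_cols + c;
}
```

Under this reading, a row-scale scale_inv with 200 rows and 6 columns would be allocated as 256x8, and a col-scale scale_inv with 6 rows and 200 columns as 8x256. The input tensor passed to nvte_swizzle_scaling_factors would need its scale_inv buffer sized accordingly before the call.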

Parameters:
  • input[in] Input tensor with non-swizzled scale_inv.

  • output[inout] Output tensor that holds the swizzled scale_inv.

  • stream[in] CUDA stream used for the operation.