swizzle.h
Functions
void nvte_swizzle_scaling_factors(const NVTETensor input, NVTETensor output, cudaStream_t stream)
Swizzle the scaling factors into the interleaved layout required for GEMM.
Requirements:
scale_inv is stored in row-major order.
The scale_inv size is padded to 128x4 for row-scale and 4x128 for col-scale.
Data is quantized along the K-dimension, i.e. each 1D scaling block lies along the K-dimension.
Parameters:
input – [in] Input tensor with non-swizzled scale_inv.
output – [inout] Output tensor which hosts swizzled scale_inv.
stream – [in] CUDA stream used for the operation.
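The padding requirement above can be illustrated with a small host-side sketch. It assumes that "padded to 128x4" means each scale_inv dimension is rounded up to the next multiple of 128 and 4 respectively (and 4 and 128 for col-scale); that reading is an interpretation, not something this header states. The helper names round_up and padded_scale_inv_shape are hypothetical and are not part of swizzle.h.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Round n up to the nearest multiple of m.
inline std::size_t round_up(std::size_t n, std::size_t m) {
  assert(m > 0);
  return (n + m - 1) / m * m;
}

// Hypothetical helper (not part of swizzle.h): given the unpadded
// scale_inv shape (rows, cols), return the padded shape.
// Assumption: row-scale pads to multiples of 128 x 4,
// col-scale pads to multiples of 4 x 128.
inline std::pair<std::size_t, std::size_t> padded_scale_inv_shape(
    std::size_t rows, std::size_t cols, bool row_scale) {
  const std::size_t row_multiple = row_scale ? 128 : 4;
  const std::size_t col_multiple = row_scale ? 4 : 128;
  return {round_up(rows, row_multiple), round_up(cols, col_multiple)};
}
```

Under that assumption, a 200x6 row-scale scale_inv would be allocated as 256x8, and a 6x200 col-scale scale_inv as 8x256, before being passed as input.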