swizzle.h
Functions
-
void nvte_swizzle_scaling_factors(const NVTETensor input, NVTETensor output, cudaStream_t stream)
Swizzles scaling factors into the interleaved layout required for GEMM.
Requirements:
scale_inv is stored in row-major.
scale_inv size is padded to 128x4 for row-scale and 4x128 for col-scale.
data is quantized along the K-dimension, i.e. each 1D scaling block lies along the K-dimension.
- Parameters:
input – [in] Input tensor with non-swizzled scale_inv.
output – [inout] Output tensor that holds the swizzled scale_inv.
stream – [in] CUDA stream used for the operation.
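The padding requirement above can be read as "each dimension of scale_inv is rounded up to a multiple of 128 or 4". The sketch below computes padded shapes under that reading. The helper names (round_up, padded_row_scale_shape, padded_col_scale_shape) and the block length of 32 elements per scale are assumptions made for illustration; none of them are declared in swizzle.h.

```cpp
#include <cstddef>

// Round x up to the nearest multiple of m (m must be > 0).
constexpr std::size_t round_up(std::size_t x, std::size_t m) {
  return (x + m - 1) / m * m;
}

// Shape of a 2D scale_inv buffer (hypothetical type, not part of swizzle.h).
struct ScaleShape {
  std::size_t rows;
  std::size_t cols;
};

// Row-scale: one scale per `block` elements along K, for each of `rows` rows.
// Assumed reading of "padded to 128x4": rows -> multiple of 128, cols -> multiple of 4.
constexpr ScaleShape padded_row_scale_shape(std::size_t rows, std::size_t k,
                                            std::size_t block) {
  return {round_up(rows, 128), round_up((k + block - 1) / block, 4)};
}

// Col-scale: one scale per `block` elements along K, for each of `cols` columns.
// Assumed reading of "padded to 4x128": rows -> multiple of 4, cols -> multiple of 128.
constexpr ScaleShape padded_col_scale_shape(std::size_t k, std::size_t cols,
                                            std::size_t block) {
  return {round_up((k + block - 1) / block, 4), round_up(cols, 128)};
}

// Compile-time checks for a 200 x 96 operand with an assumed block length of 32:
// 96 / 32 = 3 scales along K, padded to 4; 200 padded to 256.
static_assert(padded_row_scale_shape(200, 96, 32).rows == 256, "rows padded to 128");
static_assert(padded_row_scale_shape(200, 96, 32).cols == 4, "cols padded to 4");
static_assert(padded_col_scale_shape(96, 200, 32).rows == 4, "rows padded to 4");
static_assert(padded_col_scale_shape(96, 200, 32).cols == 256, "cols padded to 128");
```

A shape that is already aligned, such as 128 x 128 with a block length of 32, is left unchanged at 128 x 4.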
-
void nvte_multi_tensor_swizzle_scaling_factors(const NVTETensor *inputs, NVTETensor *outputs, const size_t num_tensors, cudaStream_t stream)
Swizzles the scaling factors of multiple tensors into the interleaved layout required for GEMM.
Requirements:
scale_inv is stored in row-major.
scale_inv size is padded to 128x4 for row-scale and 4x128 for col-scale.
data is quantized along the K-dimension, i.e. each 1D scaling block lies along the K-dimension.
- Parameters:
inputs – [in] Input tensors with non-swizzled scale_inv.
outputs – [inout] Output tensors that hold the swizzled scale_inv.
num_tensors – [in] Number of tensors in inputs and outputs.
stream – [in] CUDA stream used for the operation.
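Both functions expect scale_inv to already be padded and stored in row-major order. As a rough host-side illustration, the sketch below copies a compact row-major scale matrix into a padded row-major buffer. The helper pad_row_major is hypothetical and not part of swizzle.h. The element type is left generic, and the fill value for the padding region defaults to a value-initialized element, which is an assumption since the header does not state what the padding must contain.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of swizzle.h.
// Copies a compact row-major (rows x cols) matrix into a row-major buffer whose
// dimensions are rounded up to multiples of row_mult and col_mult.
// Elements outside the original matrix are set to pad_value (assumed fill).
template <typename T>
std::vector<T> pad_row_major(const std::vector<T>& src, std::size_t rows,
                             std::size_t cols, std::size_t row_mult,
                             std::size_t col_mult, T pad_value = T{}) {
  const std::size_t padded_rows = (rows + row_mult - 1) / row_mult * row_mult;
  const std::size_t padded_cols = (cols + col_mult - 1) / col_mult * col_mult;

  std::vector<T> dst(padded_rows * padded_cols, pad_value);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Row-major addressing: the row stride changes from cols to padded_cols.
      dst[r * padded_cols + c] = src[r * cols + c];
    }
  }
  return dst;
}
```

For a row-scale matrix one would pass row_mult = 128 and col_mult = 4, and for a col-scale matrix row_mult = 4 and col_mult = 128, following the padding requirement listed above.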