core.inference.quantization.mxfp8_tensor#
Module Contents#
Classes#
| `MXFP8Tensor` | MXFP8 tensor wrapper class. |
API#
- class core.inference.quantization.mxfp8_tensor.MXFP8Tensor#
MXFP8 tensor wrapper class.
- data: torch.Tensor#
The quantized FP8 data values.
- scale: torch.Tensor#
The per-group scale factors.
- size(idx: Optional[int] = None)#
Wrapper around `self.data.size()`; when `idx` is given, returns the size of that dimension only.
- classmethod from_bf16(x: torch.Tensor, group_size: int = 32)#
Quantize BF16 tensor to MXFP8 format using FlashInfer.
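To make the group-wise scheme concrete, here is a minimal, hypothetical sketch of MXFP8-style quantization in plain Python (not the FlashInfer kernel this class actually calls): each group of `group_size` values shares one power-of-two scale, and the scaled values are kept within the FP8 E4M3 representable range. The function name and the float-based representation are illustrative only; real kernels emit packed FP8 bytes.

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def mxfp8_quantize_sketch(values, group_size=32):
    """Reference sketch (not the FlashInfer implementation).

    Splits `values` into groups of `group_size` and gives each group one
    power-of-two scale, chosen so the group's max magnitude fits within
    the E4M3 range. Returns (quantized_groups, scales).
    """
    scales, out = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Power-of-two scale: ceil so amax / scale never exceeds E4M3_MAX.
        exp = math.ceil(math.log2(amax / E4M3_MAX)) if amax > 0 else 0
        scale = 2.0 ** exp
        scales.append(scale)
        out.append([v / scale for v in group])
    return out, scales

q, s = mxfp8_quantize_sketch([0.5] * 32 + [100.0] * 32)
```

Dequantization is the inverse: multiply each group element-wise by its scale, which is why the wrapper keeps `data` and `scale` as a pair.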