core.inference.quantization.mxfp8_tensor#

Module Contents#

Classes#

MXFP8Tensor

MXFP8 tensor wrapper class.

API#

class core.inference.quantization.mxfp8_tensor.MXFP8Tensor#

MXFP8 tensor wrapper class.

data: torch.Tensor#

The quantized MXFP8 data tensor.

scale: torch.Tensor#

Per-group scale factors for the quantized data.
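A minimal sketch of how `data` and `scale` relate at dequantization time, assuming one scale per group of 32 elements as in the MXFP8 microscaling format (the helper name and pure-Python representation are illustrative, not part of this API):

```python
def mxfp8_dequantize(data, scales, group_size=32):
    """Illustrative dequantization: each group of `group_size` quantized
    values shares a single scale factor (hypothetical pure-Python sketch,
    not the actual tensor implementation)."""
    return [v * scales[i // group_size] for i, v in enumerate(data)]
```

In the real class both fields are `torch.Tensor`s and the multiply is a broadcasted tensor operation rather than a Python loop.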

size(idx: Optional[int] = None)#

Wrapper around self.data.size(); returns the shape of the underlying data tensor.

classmethod from_bf16(x: torch.Tensor, group_size: int = 32)#

Quantize a BF16 tensor to MXFP8 format using FlashInfer, producing a quantized data tensor with one scale per group of group_size elements.
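To make the per-group scaling concrete, here is a pure-Python sketch of the quantization step, assuming MXFP8's convention of a power-of-two (E8M0-style) scale chosen so each group's maximum magnitude fits the FP8 E4M3 range. The helper name and logic are illustrative; the actual method delegates to a FlashInfer kernel and stores FP8-encoded tensors:

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def mxfp8_quantize_group(values, group_size=32):
    """Illustrative per-group MXFP8-style quantization (hypothetical helper,
    not the FlashInfer kernel): pick a power-of-two scale per group so the
    group's max magnitude maps into the E4M3 range, then scale the values."""
    assert len(values) % group_size == 0
    scales, quantized = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        # E8M0 scales are powers of two; choose the smallest exponent that
        # keeps amax / scale within the representable E4M3 range.
        exp = max(math.ceil(math.log2(amax / E4M3_MAX)), -127) if amax > 0 else 0
        scale = 2.0 ** exp
        scales.append(scale)
        quantized.append([v / scale for v in group])
    return quantized, scales
```

The real implementation would additionally round each scaled value to the nearest representable E4M3 number and store the scale as its 8-bit exponent; this sketch keeps full-precision floats to stay self-contained.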