MatmulOptions#

class nvmath.distributed.linalg.advanced.MatmulOptions(inplace: bool = False, compute_type: int | None = None, scale_type: int | None = None, result_type: int | None = None, algo_type: int | None = None, result_amax: bool = False, block_scaling: bool = False, sm_count_communication: int | None = None, logger: Logger | None = None, blocking: Literal[True, 'auto'] = 'auto')#

A data class for providing options to the Matmul object and the wrapper function matmul().
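As a rough illustration of how the options fit together, the sketch below collects the documented field names and defaults in a plain dictionary and rejects unknown keys. The helper `build_options` is hypothetical and not part of nvmath-python; it exists only so the example runs without a GPU or the library installed.

```python
# Field names and defaults copied from the documented MatmulOptions signature.
DOCUMENTED_DEFAULTS = {
    "inplace": False,
    "compute_type": None,
    "scale_type": None,
    "result_type": None,
    "algo_type": None,
    "result_amax": False,
    "block_scaling": False,
    "sm_count_communication": None,
    "logger": None,
    "blocking": "auto",
}


def build_options(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


opts = build_options(inplace=True, blocking=True)
print(opts["inplace"], opts["blocking"], opts["result_amax"])

# With nvmath-python installed, the same keywords would go to the real class:
#   from nvmath.distributed.linalg.advanced import MatmulOptions
#   options = MatmulOptions(**opts)
```

Only the options that differ from the defaults need to be spelled out; everything else keeps the default shown in the signature.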

inplace#

Whether the matrix multiplication is performed in-place (operand C is overwritten). The default is inplace=False.

Type:: bool

compute_type#

CUDA compute type. A suitable compute type will be selected if not specified.

Type:: nvmath.distributed.linalg.ComputeType

scale_type#

CUDA data type. A suitable data type consistent with the compute type will be selected if not specified.

Type:: nvmath.CudaDataType

result_type#

CUDA data type requested for the result. If not specified, the result type is determined from the input types. Non-default result types are supported only for narrow-precision (FP8 and lower) operations.

Type:: nvmath.CudaDataType

algo_type#

Hints at the algorithm type to use. If the requested type is not supported, cuBLASMp will fall back to the default algorithm.

Type:: nvmath.distributed.linalg.advanced.MatmulAlgoType

result_amax#

If set, the absolute maximum (amax) of the result will be returned in the auxiliary output tensor. Only supported for narrow-precision (FP8 and lower) operations.

Type:: bool
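To make the term concrete, here is a minimal pure-Python sketch of what the absolute maximum of a result matrix is. It only illustrates the quantity; it does not show how the library computes or returns it, and the function name `amax` is an assumption chosen for this example.

```python
def amax(matrix):
    """Largest absolute value over all entries of a 2-D list (illustrative only)."""
    return max(abs(value) for row in matrix for value in row)


result = [[1.5, -4.0], [0.25, 3.0]]
print(amax(result))  # the entry -4.0 has the largest magnitude
```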

block_scaling#

If set, block scaling (MXFP8) will be used instead of tensor-wide scaling for FP8 operations. If the result is a narrow-precision (FP8 and lower) data type, the scales used for result quantization will be returned in the auxiliary output tensor as "d_out_scale" in UE8M0 format. For more information on the UE8M0 format, see the documentation of MatmulQuantizationScales. This option is only supported for narrow-precision (FP8 and lower) operations.

Type:: bool
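The following sketch decodes a single UE8M0 scale byte, assuming the convention of the OCP Microscaling (MX) formats: 8 unsigned exponent bits, no mantissa, a bias of 127, and 0xFF reserved for NaN. The function `decode_ue8m0` is a hypothetical helper written for this illustration; the MatmulQuantizationScales documentation remains the authoritative description of the format.

```python
import math


def decode_ue8m0(byte: int) -> float:
    """Decode one UE8M0 byte as 2**(byte - 127) (assumed OCP MX convention)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("UE8M0 values must fit in one unsigned byte")
    if byte == 0xFF:
        # Assumption: the all-ones pattern encodes NaN, as in the OCP MX spec.
        return math.nan
    return 2.0 ** (byte - 127)


for raw in (126, 127, 130):
    print(raw, decode_ue8m0(raw))
```

Because the format stores only an exponent, every representable scale is a power of two.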

sm_count_communication#

The number of SMs to use for communication. This is relevant only for some algorithms; consult the cuBLASMp documentation for details.

Type:: int

logger#

Python Logger object. The root logger will be used if a logger object is not provided.

Type:: logging.Logger
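A dedicated logger can be prepared with the standard library and then supplied through this option. The sketch below uses only the `logging` module; the logger name "nvmath_matmul_demo" and the format string are arbitrary choices for illustration.

```python
import logging

# Create a named logger instead of relying on the root logger.
logger = logging.getLogger("nvmath_matmul_demo")
logger.setLevel(logging.DEBUG)

# Send records to stderr with a compact format.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("logger configured")

# With nvmath-python installed, the logger would be passed as an option:
#   from nvmath.distributed.linalg.advanced import MatmulOptions
#   options = MatmulOptions(logger=logger)
```

Using a named logger keeps the library's messages separate from the rest of the application's logging configuration.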

blocking#

A flag specifying the behavior of the execution functions and methods, such as matmul() and Matmul.execute(). When blocking is True, the execution methods do not return until the operation is complete. When blocking is "auto", the methods return immediately when the inputs are on the GPU. The execution methods always block when the operands are on the CPU to ensure that the user doesn’t inadvertently use the result before it becomes available. The default is "auto".

Type:: Literal[True, 'auto']
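The rule described for blocking can be summarized as a small decision function. This is a sketch of the documented behavior only, not the library's implementation; `may_return_early` is a hypothetical name.

```python
def may_return_early(blocking, operands_on_gpu: bool) -> bool:
    """Return True if an execution call may return before the operation completes."""
    if blocking is True:
        # Always wait for completion.
        return False
    if blocking == "auto":
        # Return immediately for GPU operands; always block for CPU operands.
        return operands_on_gpu
    raise ValueError("blocking must be True or 'auto'")


print(may_return_early("auto", operands_on_gpu=True))
print(may_return_early("auto", operands_on_gpu=False))
print(may_return_early(True, operands_on_gpu=True))
```

In other words, asynchronous return is possible only with blocking="auto" and GPU operands.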