core.inference.moe

Submodules

Package Contents

Classes

InferenceGroupedGemmBackend

Backend for grouped GEMM operations during inference.

API

class core.inference.moe.InferenceGroupedGemmBackend(*args, **kwds)

Bases: enum.Enum

Backend for grouped GEMM operations during inference.

Each member's string value matches an accepted value of the inference_grouped_gemm_backend config field. This lets TransformerConfig.__post_init__ convert the configured string to a member by calling InferenceGroupedGemmBackend(str).
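The sketch below shows the string-to-enum conversion described above. It defines a local copy of the enum with the three documented values. It also defines a hypothetical ToyInferenceConfig dataclass that stands in for TransformerConfig. This is an illustration of the pattern under those assumptions, not the Megatron-Core implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class InferenceGroupedGemmBackend(Enum):
    """Local stand-in mirroring the documented members of
    core.inference.moe.InferenceGroupedGemmBackend (assumption: same values)."""
    FLASHINFER = "flashinfer"
    TORCH = "torch"
    VLLM = "vllm"


@dataclass
class ToyInferenceConfig:
    """Hypothetical config class; NOT the real TransformerConfig."""
    inference_grouped_gemm_backend: Union[str, InferenceGroupedGemmBackend]

    def __post_init__(self):
        # Enum(value) looks a member up by its value. Passing a member that
        # is already an InferenceGroupedGemmBackend returns it unchanged,
        # so the conversion is safe to apply to either input type.
        self.inference_grouped_gemm_backend = InferenceGroupedGemmBackend(
            self.inference_grouped_gemm_backend
        )


cfg = ToyInferenceConfig(inference_grouped_gemm_backend="flashinfer")
print(cfg.inference_grouped_gemm_backend)  # InferenceGroupedGemmBackend.FLASHINFER
```

Because the lookup is by value, the config field can hold a plain string such as "vllm" in a YAML or CLI source. After construction, downstream code can compare against enum members instead of raw strings.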


FLASHINFER

'flashinfer'

TORCH

'torch'

VLLM

'vllm'
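The sketch below shows the accepted strings listed above and what happens with an unrecognized one. It again uses a local copy of the enum. The parse_backend helper is hypothetical and exists only for illustration. It wraps the standard Enum lookup so that the error message lists the valid choices.

```python
from enum import Enum


class InferenceGroupedGemmBackend(Enum):
    # Local stand-in mirroring the three documented members (assumption).
    FLASHINFER = "flashinfer"
    TORCH = "torch"
    VLLM = "vllm"


# Every accepted config string, in member definition order.
VALID_BACKENDS = [member.value for member in InferenceGroupedGemmBackend]


def parse_backend(name: str) -> InferenceGroupedGemmBackend:
    """Hypothetical helper: convert a config string, with a clearer error."""
    try:
        # Lookup by value is exact and case-sensitive.
        return InferenceGroupedGemmBackend(name)
    except ValueError:
        raise ValueError(
            f"unknown grouped GEMM backend {name!r}; expected one of {VALID_BACKENDS}"
        ) from None


print(VALID_BACKENDS)              # ['flashinfer', 'torch', 'vllm']
print(parse_backend("vllm").name)  # VLLM
```

Calling InferenceGroupedGemmBackend(...) matches the lowercase values, so parse_backend("TORCH") raises ValueError. To look a member up by its uppercase name instead, use InferenceGroupedGemmBackend["TORCH"].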