core.inference.moe#
Submodules#
Package Contents#
Classes#
InferenceGroupedGemmBackend | Backend for grouped GEMM operations during inference.
API#
- class core.inference.moe.InferenceGroupedGemmBackend(*args, **kwds)#
Bases: enum.Enum
Backend for grouped GEMM operations during inference.
Each member's string value matches the inference_grouped_gemm_backend config field, so TransformerConfig.__post_init__ can convert the configured string to a member via InferenceGroupedGemmBackend(str).
- FLASHINFER#
'flashinfer'
- TORCH#
'torch'
- VLLM#
'vllm'
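
The string-to-member conversion described above relies on standard enum.Enum lookup by value. The following is a minimal sketch of that behavior, not the library's own code: the enum is redefined locally as a stand-in that mirrors the three documented members (so the snippet runs without the package installed), and parse_backend is a hypothetical helper introduced only for illustration.

```python
from enum import Enum


class InferenceGroupedGemmBackend(Enum):
    """Local stand-in mirroring the documented members (assumption: not the real import)."""

    FLASHINFER = "flashinfer"
    TORCH = "torch"
    VLLM = "vllm"


def parse_backend(value):
    """Hypothetical helper: accept either a member or its string value."""
    if isinstance(value, InferenceGroupedGemmBackend):
        return value
    try:
        # Enum lookup by value: InferenceGroupedGemmBackend("torch") returns the TORCH member.
        return InferenceGroupedGemmBackend(value)
    except ValueError:
        valid = ", ".join(member.value for member in InferenceGroupedGemmBackend)
        raise ValueError(
            f"unknown grouped GEMM backend {value!r}; expected one of: {valid}"
        ) from None
```

With this sketch, parse_backend("flashinfer") returns the FLASHINFER member, while an unrecognized string such as "cutlass" raises a ValueError that lists the accepted values. Lookup is case-sensitive, so "TORCH" is rejected as well.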