core.extensions.transformer_engine_spec_provider#

Module Contents#

Classes#

TESpecProvider

A protocol for providing the submodules used in Spec building.

API#

class core.extensions.transformer_engine_spec_provider.TESpecProvider#

Bases: megatron.core.models.backends.BackendSpecProvider

A protocol for providing the submodules used in Spec building.
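To illustrate the provider pattern this class implements, here is a hypothetical, self-contained sketch. The class and function names (`SpecProviderSketch`, `build_norm_linear`, the stub module classes) are invented for illustration and are not part of Megatron-Core; only the method names (`linear`, `fuse_layernorm_and_linear`, `column_parallel_layer_norm_linear`) come from the API above.

```python
from typing import Optional, Protocol


class LinearStub:
    """Stand-in for a backend linear module class."""


class FusedLayerNormLinearStub:
    """Stand-in for a fused layer-norm + linear module class."""


class SpecProviderSketch(Protocol):
    """Toy analogue of BackendSpecProvider: a protocol that hands out
    module classes for spec building."""

    def linear(self) -> type: ...
    def fuse_layernorm_and_linear(self) -> bool: ...
    def column_parallel_layer_norm_linear(self) -> Optional[type]: ...


class TESpecProviderSketch:
    """Toy provider mirroring the TE behaviour described above:
    layer norm and linear are fused into a single module."""

    def linear(self) -> type:
        return LinearStub

    def fuse_layernorm_and_linear(self) -> bool:
        return True

    def column_parallel_layer_norm_linear(self) -> Optional[type]:
        return FusedLayerNormLinearStub


def build_norm_linear(provider: SpecProviderSketch) -> type:
    # Spec building consults the provider: prefer the fused module
    # when the backend offers one, otherwise fall back to plain linear.
    if provider.fuse_layernorm_and_linear():
        fused = provider.column_parallel_layer_norm_linear()
        if fused is not None:
            return fused
    return provider.linear()
```

Because the spec builder only sees the protocol, a backend is swapped by passing a different provider instance; no spec-building code changes.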

linear() type#

Which linear module the TE backend uses

column_parallel_linear() type#

Which column parallel linear module the TE backend uses

row_parallel_linear() type#

Which row parallel linear module the TE backend uses

fuse_layernorm_and_linear() bool#

Whether the backend fuses layer norm and linear into a single module; the TE backend does

column_parallel_layer_norm_linear() Optional[type]#

Which module to use for sequential layer norm and linear

layer_norm(
rms_norm: bool = False,
for_qk: bool = False,
) megatron.core.transformer.torch_norm.LayerNormBuilder#

Which module to use for layer norm
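The `rms_norm` flag in the signature above selects RMSNorm over standard LayerNorm. A minimal sketch of that selection, using invented stub classes (`LayerNormStub`, `RMSNormStub`); the real method returns a `LayerNormBuilder`, and the exact semantics of `for_qk` are not documented here, so this toy ignores it:

```python
class LayerNormStub:
    """Stand-in for a standard layer-norm module class."""


class RMSNormStub:
    """Stand-in for an RMSNorm module class."""


def layer_norm_sketch(rms_norm: bool = False, for_qk: bool = False) -> type:
    # rms_norm picks RMSNorm over standard LayerNorm; for_qk is assumed
    # to mark the query/key-norm use case and is ignored in this sketch.
    return RMSNormStub if rms_norm else LayerNormStub
```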

core_attention() type#

Which module to use for attention

grouped_mlp_modules(
moe_use_grouped_gemm: bool,
moe_use_legacy_grouped_gemm: bool,
) tuple[type[megatron.core.transformer.moe.experts.TEGroupedMLP], megatron.core.transformer.moe.experts.TEGroupedMLPSubmodules] | tuple[type[megatron.core.transformer.moe.experts.SequentialMLP], megatron.core.transformer.mlp.MLPSubmodules] | tuple[type[megatron.core.transformer.moe.experts.GroupedMLP], None]#

Which module and submodules to use for the grouped MLP
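The return-type union above names three cases: `TEGroupedMLP` with `TEGroupedMLPSubmodules`, `SequentialMLP` with `MLPSubmodules`, and legacy `GroupedMLP` with no submodules. A hedged sketch of a plausible flag dispatch over toy stand-in classes; the selection order is inferred from the flag names and the return-type union, not confirmed from the implementation:

```python
# Toy stand-ins for the classes named in the return-type union above.
class TEGroupedMLP: ...
class GroupedMLP: ...
class SequentialMLP: ...
class TEGroupedMLPSubmodules: ...
class MLPSubmodules: ...


def grouped_mlp_modules_sketch(moe_use_grouped_gemm: bool,
                               moe_use_legacy_grouped_gemm: bool):
    """Return a (module class, submodules) pair for the expert MLP."""
    # Assumed order: modern grouped GEMM -> TEGroupedMLP; the legacy
    # grouped-GEMM path -> GroupedMLP, which carries no submodules;
    # otherwise fall back to SequentialMLP.
    if moe_use_grouped_gemm and not moe_use_legacy_grouped_gemm:
        return TEGroupedMLP, TEGroupedMLPSubmodules()
    if moe_use_grouped_gemm:
        return GroupedMLP, None
    return SequentialMLP, MLPSubmodules()
```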

activation_func() megatron.core.transformer.mlp.TEActivationFunctionBuilder | None#

Which module to use for the activation function