core.post_training.modelopt.mamba.model_specs

Module Contents

Functions

get_mamba_stack_modelopt_spec
    Mix the native spec with TENorm.

API

core.post_training.modelopt.mamba.model_specs.get_mamba_stack_modelopt_spec(
    local_core_attention: bool = False,
    remap_te_layernorm: bool = False,
) -> megatron.core.transformer.spec_utils.ModuleSpec

Mix the native spec with TENorm.

This is essentially the native local spec, except that the layernorm implementation uses TENorm from Transformer Engine.
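
A minimal usage sketch follows. The import path and signature are taken from this page; the flag semantics in the comments and the model wiring are assumptions inferred from the parameter names and common Megatron-LM conventions, not guaranteed by this module.

```python
from megatron.core.post_training.modelopt.mamba.model_specs import (
    get_mamba_stack_modelopt_spec,
)

# Build the ModelOpt-compatible Mamba stack spec. With the defaults,
# core attention comes from Transformer Engine and no layernorm
# state-dict key remapping is applied.
stack_spec = get_mamba_stack_modelopt_spec(
    local_core_attention=False,  # assumption: True selects the local (non-TE) core attention
    remap_te_layernorm=True,     # assumption: remaps TE layernorm keys in the sharded state dict
)

# The returned ModuleSpec is typically handed to a model constructor,
# e.g. MambaModel(..., mamba_stack_spec=stack_spec) -- hypothetical wiring;
# the exact constructor signature varies across Megatron versions.
```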