Step-3.5-Flash
Step-3.5-Flash is a Mixture-of-Experts (MoE) language model from StepFun. Megatron Bridge supports checkpoint conversion, inference, and continued pretraining through a dedicated Step-3.5 bridge, provider, and recipe.
Supported Variants
Step-3.5-Flash: https://huggingface.co/stepfun-ai/Step-3.5-Flash
Architecture Notes
Hybrid attention pattern with full attention interleaved with sliding attention.
Fused per-head attention gate (g_proj) is merged into Megatron linear_qkv weights.
MoE decoder layers are surrounded by dense layers at the beginning/end and MTP layers.
The provider carries layer_types, full-attention settings, and sliding-attention settings from the Hugging Face config.
Current examples require Megatron-LM dev branch changes until the required MCore support reaches the pinned submodule.
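To make the hybrid attention pattern concrete, the sketch below builds a per-layer attention mask from a layer_types list. It is an illustration only, not the Megatron Bridge implementation: the string values "full_attention" and "sliding_attention", the window size, and the helper names causal_mask and build_layer_masks are all assumptions chosen for the example and are not taken from the Step-3.5-Flash config.

```python
from typing import List, Optional

Mask = List[List[bool]]


def causal_mask(seq_len: int, window: Optional[int] = None) -> Mask:
    """Return mask[i][j] == True where query position i may attend to key j.

    With window=None this is a plain causal (full attention) mask.
    With an integer window, each query only sees the last `window` keys,
    including itself (sliding attention).
    """
    mask: Mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future tokens
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


def build_layer_masks(layer_types: List[str], seq_len: int, window: int) -> List[Mask]:
    """Pick a mask per layer from a layer_types list.

    The type names below are assumed for illustration (hypothetical values).
    """
    masks = []
    for layer_type in layer_types:
        if layer_type == "full_attention":
            masks.append(causal_mask(seq_len))
        elif layer_type == "sliding_attention":
            masks.append(causal_mask(seq_len, window))
        else:
            raise ValueError(f"unknown layer type: {layer_type}")
    return masks


# Hypothetical interleaving: one full-attention layer, then one sliding layer.
masks = build_layer_masks(["full_attention", "sliding_attention"], seq_len=4, window=2)
```

In this toy setup the last query position of the full-attention layer sees all four keys, while the same position in the sliding layer sees only the two most recent ones. The actual interleaving and window settings come from the Hugging Face config carried by the provider.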
Examples
For conversion, inference, pretraining, Slurm launch scripts, and MCore branch notes, see the Step-3.5-Flash examples README.