Step-3.5-Flash#

Step-3.5-Flash is a MoE language model from StepFun. Megatron Bridge supports checkpoint conversion, inference, and continued pretraining through a dedicated Step-3.5 bridge, provider, and recipe.

Supported Variants#

Step-3.5-Flash: https://huggingface.co/stepfun-ai/Step-3.5-Flash

Architecture Notes#

Hybrid attention pattern with full attention interleaved with sliding attention.
Fused per-head attention gate (g_proj) is merged into Megatron linear_qkv weights.
MoE decoder layers are surrounded by dense layers at the beginning/end and MTP layers.
The provider carries layer_types, full-attention settings, and sliding-attention settings from the Hugging Face config.
Current examples require Megatron-LM dev branch changes until the required MCore support reaches the pinned submodule.

Examples#

For conversion, inference, pretraining, Slurm launch scripts, and MCore branch notes, see the Step-3.5-Flash examples README.

Step-3.5-Flash#

Supported Variants#

Architecture Notes#

Examples#

Related Implementation#