bridge.models.stepfun.step35_bridge
Module Contents
Classes
StackedExpertAutoMapping | Maps Megatron per-expert weight{i} ↔ HF stacked expert tensor[i].
StackedExpertGatedMLPMapping | GatedMLPMapping for per-expert Megatron weights backed by HF stacked tensors.
_MTPDenseLayerSpecsList | List of per-decoder-layer specs that returns a dense spec on negative-index access.
Step35Bridge | Megatron Bridge for Step3.5 Causal LM.
Functions
_build_step35_layer_spec | Per-layer spec for Step3.5: dense for layers 0-2 and 45-47, MoE for 3-44.
Data
API
- bridge.models.stepfun.step35_bridge.logger
‘getLogger(…)’
- class bridge.models.stepfun.step35_bridge.StackedExpertAutoMapping
Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping
Maps Megatron per-expert weight{i} ↔ HF stacked expert tensor[i].
Step3.5 HF stores all experts in a single stacked tensor, e.g. model.layers.*.moe.down_proj.weight with shape [num_experts, H, I]. Megatron creates individual per-expert tensors named weight0, weight1, …
The megatron_param uses a trailing weight* wildcard to match these names; hf_param has one fewer wildcard (no expert index in the path). During wildcard resolution, _resolve_names resets capture_index to 0 for the HF side, so hf_param consumes only the layer-index capture, leaving the expert-index capture available to slice the stacked tensor in hf_to_megatron.
- is_grouped_export
True
- _expert_idx() → int
- hf_to_megatron(hf_weights: torch.Tensor, megatron_module)
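The name-resolution step described above can be illustrated with a small, self-contained sketch. It is not the bridge's implementation: the helper names (wildcard_to_regex, resolve_stacked_expert, slice_expert) are hypothetical, the Megatron name pattern decoder.layers.*.mlp.experts.linear_fc2.weight* is an assumed example, and nested Python lists stand in for torch tensors. Each * captures one integer; the HF pattern consumes captures starting from index 0 (mirroring the capture_index reset), and the leftover trailing capture is the expert index used to slice the stacked tensor.

```python
import re
from typing import List, Sequence, Tuple


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a dotted name with '*' wildcards into a regex; each '*' captures an integer."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"(\d+)".join(literal_parts))


def resolve_stacked_expert(
    megatron_pattern: str, hf_pattern: str, megatron_name: str
) -> Tuple[str, int]:
    """Return (hf_name, expert_idx) for one per-expert Megatron parameter name.

    Hypothetical helper: the HF pattern has one fewer '*' than the Megatron
    pattern, so it consumes captures starting from index 0 (the layer index),
    and the trailing capture is left over as the expert index.
    """
    match = wildcard_to_regex(megatron_pattern).fullmatch(megatron_name)
    if match is None:
        raise ValueError(f"{megatron_name!r} does not match {megatron_pattern!r}")
    captures = match.groups()
    num_hf_wildcards = hf_pattern.count("*")
    if num_hf_wildcards != len(captures) - 1:
        raise ValueError("HF pattern must have exactly one fewer wildcard")
    hf_name = hf_pattern
    for capture in captures[:num_hf_wildcards]:
        hf_name = hf_name.replace("*", capture, 1)
    return hf_name, int(captures[-1])


def slice_expert(stacked: Sequence, expert_idx: int):
    """Select expert `expert_idx` from a [num_experts, ...] stacked weight."""
    return stacked[expert_idx]


# Toy stacked down_proj weight with shape [num_experts=8, H=2, I=3];
# every entry belonging to expert e holds the value e.
NUM_EXPERTS, H, I = 8, 2, 3
stacked_down_proj: List[List[List[int]]] = [
    [[e] * I for _ in range(H)] for e in range(NUM_EXPERTS)
]

hf_name, expert_idx = resolve_stacked_expert(
    "decoder.layers.*.mlp.experts.linear_fc2.weight*",
    "model.layers.*.moe.down_proj.weight",
    "decoder.layers.3.mlp.experts.linear_fc2.weight7",
)
per_expert = slice_expert(stacked_down_proj, expert_idx)
print(hf_name, expert_idx)  # model.layers.3.moe.down_proj.weight 7
print(per_expert)           # [[7, 7, 7], [7, 7, 7]]
```

Because the HF pattern is filled from the front, the same HF tensor name is produced for every expert of a layer; only the slice index differs.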
- class bridge.models.stepfun.step35_bridge.StackedExpertGatedMLPMapping
Bases: megatron.bridge.models.conversion.param_mapping.GatedMLPMapping
GatedMLPMapping for per-expert Megatron weights backed by HF stacked tensors.
HF stores all experts’ gate/up projections as stacked tensors with shape [num_experts, I, H]. Megatron creates individual per-expert linear_fc1.weight{i} tensors (shape [2*I, H], gate and up fused). megatron_param uses a trailing weight* wildcard; gate and up each have one fewer wildcard (no expert index in the HF path). During wildcard resolution, _resolve_names resets capture_index for every dict key, so gate and up each consume only the layer-index capture.
- is_grouped_export
True
- _expert_idx() → int
- hf_to_megatron(hf_weights: Dict[str, torch.Tensor], megatron_module)
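The per-expert fusion this mapping performs can be sketched as follows. This is an illustration, not the library code: fuse_expert_fc1 and split_expert_fc1 are hypothetical names, nested lists stand in for tensors, tensor-parallel sharding is ignored, and the sketch assumes the usual Megatron SwiGLU layout in which the gate rows precede the up rows in linear_fc1.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fuse_expert_fc1(
    gate_stacked: Sequence[Matrix], up_stacked: Sequence[Matrix], expert_idx: int
) -> Matrix:
    """Build one Megatron-style linear_fc1.weight{i} ([2*I, H]) from stacked HF gate/up ([E, I, H]).

    Assumption: gate rows come first, then up rows (no TP interleaving).
    """
    gate = gate_stacked[expert_idx]  # [I, H]
    up = up_stacked[expert_idx]      # [I, H]
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same number of rows (I)")
    # Concatenate along dim 0 (rows), copying so the result does not alias the inputs.
    return [list(row) for row in gate] + [list(row) for row in up]


def split_expert_fc1(fc1: Matrix) -> Tuple[Matrix, Matrix]:
    """Inverse of fuse_expert_fc1: split a [2*I, H] weight back into gate and up."""
    if len(fc1) % 2 != 0:
        raise ValueError("fused fc1 must have an even number of rows")
    half = len(fc1) // 2
    return fc1[:half], fc1[half:]


# Toy shapes: E=2 experts, I=2, H=3. Gate rows hold 100*e + r, up rows 100*e + 10 + r.
E, I, H = 2, 2, 3
gate_stacked = [[[100 * e + r] * H for r in range(I)] for e in range(E)]
up_stacked = [[[100 * e + 10 + r] * H for r in range(I)] for e in range(E)]

fc1_expert1 = fuse_expert_fc1(gate_stacked, up_stacked, expert_idx=1)
print(len(fc1_expert1), len(fc1_expert1[0]))  # 4 3
print([row[0] for row in fc1_expert1])        # [100, 101, 110, 111]
```

The export direction (Megatron to HF) would run split_expert_fc1 on each expert and write the halves back into slot i of the stacked gate and up tensors.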
- class bridge.models.stepfun.step35_bridge._MTPDenseLayerSpecsList(data, dense_mtp_spec)
Bases: list
List of per-decoder-layer specs that returns a dense spec on negative-index access.
get_gpt_mtp_block_spec_for_backend reads spec.layer_specs[-1] to decide which layer type the MTP transformer sub-layers should use. For Step3.5 the last decoder layer (layer 44) is MoE, but MTP layers 45-47 are NOT in moe_layers_enum and must be dense.
Overriding __getitem__ for negative indices intercepts only that single look-up while leaving normal forward iteration (used by TransformerBlock to instantiate the 45 main decoder layers) completely unaffected: CPython’s list iterator reads the internal C array directly, bypassing __getitem__.
Initialization
Initialize self. See help(type(self)) for accurate signature.
- __getitem__(idx)
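The indexing-versus-iteration behavior that _MTPDenseLayerSpecsList relies on can be checked in plain Python. The class below is a hypothetical stand-in (DenseOnNegativeIndexList), with strings in place of ModuleSpec objects. It mirrors the documented behavior but is not the library's code. Only an explicit negative-integer subscript reaches the override; for loops and list() use the inherited list iterator and therefore see the stored MoE spec.

```python
from typing import Any, Iterable


class DenseOnNegativeIndexList(list):
    """Hypothetical stand-in for _MTPDenseLayerSpecsList.

    Negative integer indices return `dense_spec`; everything else (non-negative
    indices, slices, iteration) behaves like a normal list.
    """

    def __init__(self, data: Iterable[Any], dense_spec: Any) -> None:
        super().__init__(data)
        self.dense_spec = dense_spec

    def __getitem__(self, idx):
        if isinstance(idx, int) and idx < 0:
            return self.dense_spec
        return super().__getitem__(idx)


# 45 main decoder layers: 0-2 dense, 3-44 MoE (strings stand in for ModuleSpecs).
main_specs = ["dense"] * 3 + ["moe"] * 42
specs = DenseOnNegativeIndexList(main_specs, dense_spec="dense (for MTP)")

print(specs[-1])               # dense (for MTP): what a reader of layer_specs[-1] sees
print(specs[44])               # moe: positive indices are unchanged
print([s for s in specs][-1])  # moe: iteration bypasses __getitem__
print(len(specs))              # 45
```

Slices also fall through to list.__getitem__, so code that slices layer_specs still gets the real per-layer specs.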
- bridge.models.stepfun.step35_bridge._build_step35_layer_spec(cfg, **kw)
Per-layer spec for Step3.5: dense for layers 0-2 and 45-47, MoE for 3-44.
Also rewrites every main-decoder layer’s ModuleSpec to use Step35DecoderLayer instead of the default TransformerLayer. The custom layer reads cfg.layer_types at init time to determine whether the layer is a sliding-attention layer.
Returns a TransformerBlockSubmodules whose layer_specs list is wrapped in _MTPDenseLayerSpecsList, so that get_gpt_mtp_block_spec_for_backend receives a dense ModuleSpec (via layer_specs[-1]) for the MTP transformer sub-layers.
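The layer layout stated above works out as follows: 3 dense layers (0-2) plus 42 MoE layers (3-44) make up the 45 main decoder layers, and the 3 MTP layers (45-47) are dense, for 48 layers in total. A sketch of that bookkeeping follows. The names layer_kind, moe_pattern and mtp_kinds are hypothetical, and representing the layout as a per-layer 0/1 list is an illustrative choice, not necessarily how _build_step35_layer_spec stores it.

```python
from typing import List

NUM_MAIN_LAYERS = 45   # decoder layers 0-44
NUM_MTP_LAYERS = 3     # MTP layers 45-47
FIRST_MOE_LAYER, LAST_MOE_LAYER = 3, 44


def layer_kind(layer_idx: int) -> str:
    """Return "moe" or "dense" for a global layer index in [0, 48)."""
    total = NUM_MAIN_LAYERS + NUM_MTP_LAYERS
    if not 0 <= layer_idx < total:
        raise IndexError(f"layer index {layer_idx} outside [0, {total})")
    return "moe" if FIRST_MOE_LAYER <= layer_idx <= LAST_MOE_LAYER else "dense"


# Per-layer 0/1 pattern for the main decoder only (1 = MoE).
moe_pattern: List[int] = [int(layer_kind(i) == "moe") for i in range(NUM_MAIN_LAYERS)]
mtp_kinds = [layer_kind(i) for i in range(NUM_MAIN_LAYERS, NUM_MAIN_LAYERS + NUM_MTP_LAYERS)]

print(sum(moe_pattern), len(moe_pattern))  # 42 45
print(moe_pattern[-1], mtp_kinds)          # 1 ['dense', 'dense', 'dense']
```

The last entry of the main-decoder pattern is 1 (MoE) while every MTP layer is dense, which is exactly the mismatch that a plain layer_specs[-1] lookup would get wrong.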
- class bridge.models.stepfun.step35_bridge.Step35Bridge
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
Megatron Bridge for Step3.5 Causal LM.
This bridge handles the conversion between the HuggingFace Step3p5ForCausalLM format (the HF architecture name, preserved verbatim to match the upstream config.json) and the Megatron-Core GPTModel format. Step3.5 models use a mixture-of-experts (MoE) architecture with QK layernorm.
Example
    from megatron.bridge import AutoBridge
    bridge = AutoBridge.from_hf_pretrained("stepfun-ai/Step-3.5-Flash")
    provider = bridge.to_megatron_provider()
- CONFIG_MAPPING
None
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM)
Convert the HuggingFace Step3.5 config to a GPTModelProvider.
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry