bridge.models.stepfun.step37_bridge#

Step3.7 multimodal bridge.

Registers a MegatronModelBridge for the upstream Step3p7ForConditionalGeneration HF architecture (model_type step37). The bridge:

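  • Builds the text-decoder part of provider_bridge from the Step-3.5 fields that the Step3.7 HF config exposes under hf_config.text_config. It applies Step-3.5 text-decoder settings directly instead of delegating to Step35Bridge.provider_bridge through a synthetic Step-3.5 HF wrapper (see provider_bridge below).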

  • Adds vision-tower configuration (vision_config, image_token_id, projector_bias, understand_projector_stride) on top of the resulting provider.

  • Defines an HF↔Megatron parameter mapping registry that prefixes every Step-3.5 text mapping with language_model. (since Step37Model wraps the text GPTModel). It also adds direct vision_model.* AutoMappings for the PE-G/14 trunk and downsamplers, plus a top-level vit_large_projector.weight mapping.

Module Contents#

Classes#

Step37Bridge

Megatron Bridge for Step3.7 (Step-3.5 text + Perception-Encoder G/14 vision).

Functions#

_lm

Prefix a Step-3.5 megatron_param with the language_model. namespace.

Data#

API#

bridge.models.stepfun.step37_bridge.logger#

‘getLogger(…)’

bridge.models.stepfun.step37_bridge._LM_PREFIX#

‘language_model.’

bridge.models.stepfun.step37_bridge._lm(megatron_param: str) → str#

Prefix a Step-3.5 megatron_param with the language_model. namespace.

class bridge.models.stepfun.step37_bridge.Step37Bridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Step3.7 (Step-3.5 text + Perception-Encoder G/14 vision).

Example

    >>> from megatron.bridge import AutoBridge
    >>> bridge = AutoBridge.from_hf_pretrained(
    ...     "/path/to/step3p7_flash_bf16", trust_remote_code=True
    ... )
    >>> provider = bridge.to_megatron_provider()

CONFIG_MAPPING#

None

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM,
) → megatron.bridge.models.stepfun.step37_provider.Step37ModelProvider#

Convert a HuggingFace Step3.7 config into a Step37ModelProvider.

Mirrors the qwen3-vl bridge pattern:

  • Pull the nested text_config directly out of the top-level Step37Config and pass it to the framework helper self.hf_config_to_provider_kwargs(text_config). This helper populates the common architecture fields (num_layers, hidden_size, num_attention_heads, ffn_hidden_size, vocab_size, rotary_base, etc.) via CONFIG_MAPPING. It reads each field with hasattr + getattr(..., None), so fields absent from the Step3.7 text config are skipped cleanly. This covers, for example, anything Step-3.5 carried at the top level of its config.json.

  • Construct Step37ModelProvider directly from the filtered kwargs instead of delegating to Step35Bridge.provider_bridge via a wrapper. That path was fragile because Step35Bridge performs several bare hf_config.X reads that crash on missing fields such as zero_centered or use_qk_norm.

  • Apply the Step-3.5 text-decoder overrides. Any field that may be absent from the released Step3.7 text_config is read with an explicit getattr(text_config, name, default).

  • Finally, attach the Step3.7 vision and multimodal fields from the top-level hf_config.
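The four steps above can be sketched end to end with plain Python objects. This is an illustrative model of the control flow, not the bridge's code. ToyStep37Provider, config_to_kwargs, toy_provider_bridge and the CONFIG_MAPPING pairs below are all hypothetical stand-ins. The real CONFIG_MAPPING format, provider fields and override defaults may differ, and the defaults used for zero_centered and use_qk_norm here are placeholders.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional

# Hypothetical (hf_field, provider_field) pairs; the real CONFIG_MAPPING may differ.
CONFIG_MAPPING = [
    ("num_hidden_layers", "num_layers"),
    ("hidden_size", "hidden_size"),
    ("num_attention_heads", "num_attention_heads"),
    ("intermediate_size", "ffn_hidden_size"),
    ("vocab_size", "vocab_size"),
    ("rope_theta", "rotary_base"),
]


@dataclass
class ToyStep37Provider:
    """Hypothetical stand-in for Step37ModelProvider."""
    num_layers: int = 0
    hidden_size: int = 0
    num_attention_heads: int = 0
    ffn_hidden_size: int = 0
    vocab_size: int = 0
    rotary_base: float = 10000.0
    zero_centered: bool = False
    use_qk_norm: bool = False
    vision_config: Any = None
    image_token_id: Optional[int] = None


def config_to_kwargs(cfg: Any) -> dict:
    # Step 1: hasattr + getattr(..., None) lookup, so absent fields are simply skipped.
    kwargs = {}
    for hf_name, provider_name in CONFIG_MAPPING:
        if hasattr(cfg, hf_name):
            value = getattr(cfg, hf_name, None)
            if value is not None:
                kwargs[provider_name] = value
    return kwargs


def toy_provider_bridge(hf_config: Any) -> ToyStep37Provider:
    text_config = hf_config.text_config
    # Step 2: build the provider directly from the filtered kwargs (no Step-3.5 wrapper).
    provider = ToyStep37Provider(**config_to_kwargs(text_config))
    # Step 3: Step-3.5 overrides read with explicit defaults (placeholder defaults).
    provider.zero_centered = getattr(text_config, "zero_centered", False)
    provider.use_qk_norm = getattr(text_config, "use_qk_norm", False)
    # Step 4: vision / multimodal fields come from the top-level config.
    provider.vision_config = getattr(hf_config, "vision_config", None)
    provider.image_token_id = getattr(hf_config, "image_token_id", None)
    return provider


# Toy config: intermediate_size, rope_theta and the Step-3.5 override fields are absent.
hf_config = SimpleNamespace(
    text_config=SimpleNamespace(
        num_hidden_layers=4, hidden_size=256, num_attention_heads=8, vocab_size=1000
    ),
    vision_config={"patch_size": 14},  # hypothetical vision sub-config
    image_token_id=7,
)
p = toy_provider_bridge(hf_config)
print(p.num_layers, p.ffn_hidden_size, p.use_qk_norm)  # 4 0 False
```

The point of the sketch is the failure mode described above. A bare text_config.use_qk_norm would raise AttributeError on this toy config. The getattr-with-default reads fall back silently instead.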

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return the full text + vision parameter mapping registry.

Text mappings replicate Step35Bridge.mapping_registry with a language_model. prefix on every Megatron-side path, since Step37Model wraps the Step-3.5 GPTModel under self.language_model. Vision mappings are direct AutoMappings, because the Megatron module structure mirrors the HF safetensors layout.

bridge.models.stepfun.step37_bridge.__all__#

[‘Step37Bridge’]