bridge.models.stepfun.step37_provider#
Step3.7 model provider.
Mirrors qwen_vl/qwen3_vl_provider.py: extends the text-decoder provider
with multimodal fields (vision config, image token id, projector knobs) and
returns a Step37Model instance instead of a bare GPTModel.
Module Contents#
Classes#
Step37ModelProvider | Model provider for Step3.7.
Data#
API#
- class bridge.models.stepfun.step37_provider.Step37ModelProvider#
Bases: megatron.bridge.models.stepfun.step35_provider.Step35ModelProvider
Model provider for Step3.7.
Inherits every Step-3.5 text-decoder field from Step35ModelProvider (per-layer layer_types / rotary_percents / swiglu_limits, head_wise_attn_gate, MoE settings, MTP layers, sliding-attention overrides) and adds the multimodal fields needed to build a Step37Model.
- position_embedding_type: str#
'rope'
- vision_config: Optional[Any]#
None
- image_token_id: int#
128001
- understand_projector_stride: int#
2
- projector_bias: bool#
False
- language_max_sequence_length: int#
262144
- freeze_language_model: bool#
False
- freeze_vision_model: bool#
False
- freeze_vision_projection: bool#
False
- add_encoder: bool#
True
- add_decoder: bool#
True
- provide(
- pre_process: Optional[bool] = None,
- post_process: Optional[bool] = None,
- vp_stage: Optional[int] = None,
)
Build a Step37Model for the current PP/VP stage.
- provide_language_model(
- pre_process: Optional[bool] = None,
- post_process: Optional[bool] = None,
- vp_stage: Optional[int] = None,
)
Provide just the text decoder (no vision tower).
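The multimodal fields and defaults listed above can be summarized in a small stand-in. This is a minimal sketch, assuming the provider behaves like a dataclass-style config (as the typed fields with defaults suggest). `Step37ProviderSketch` is a hypothetical name used for illustration only; it is not the real `Step37ModelProvider`, and it omits the inherited Step-3.5 decoder fields and both `provide` methods.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass
class Step37ProviderSketch:
    """Hypothetical stand-in that mirrors only the documented multimodal fields."""

    position_embedding_type: str = "rope"
    vision_config: Optional[Any] = None
    image_token_id: int = 128001
    understand_projector_stride: int = 2
    projector_bias: bool = False
    language_max_sequence_length: int = 262144
    freeze_language_model: bool = False
    freeze_vision_model: bool = False
    freeze_vision_projection: bool = False
    add_encoder: bool = True
    add_decoder: bool = True


# All documented defaults in one object.
defaults = Step37ProviderSketch()

# Illustrative override: freeze the decoder and the vision tower so that
# only the projector stays trainable. The defaults object is left untouched.
projector_only = replace(
    defaults,
    freeze_language_model=True,
    freeze_vision_model=True,
)
```

With the real class, the same keyword overrides would presumably be passed at construction time, but that usage is an assumption and is not confirmed by this page.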
- bridge.models.stepfun.step37_provider.__all__#
['Step37ModelProvider']
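Both `provide` and `provide_language_model` accept `pre_process` and `post_process` as optional flags that default to `None`. This page does not say how `None` is resolved. The sketch below illustrates one common convention in pipeline-parallel code, where the first stage handles input embedding and the last stage handles the output head. The helper `resolve_stage_flags` and its `pp_rank` / `pp_size` parameters are hypothetical and not part of the documented API, and `vp_stage` handling is deliberately left out.

```python
from typing import Optional, Tuple


def resolve_stage_flags(
    pp_rank: int,
    pp_size: int,
    pre_process: Optional[bool] = None,
    post_process: Optional[bool] = None,
) -> Tuple[bool, bool]:
    """Hypothetical helper: fill in unset flags from the pipeline position.

    Assumption (not confirmed by this page): None means "derive from the
    current stage", with pre_process on the first stage and post_process
    on the last stage. Explicit values are passed through unchanged.
    """
    if pre_process is None:
        pre_process = pp_rank == 0
    if post_process is None:
        post_process = pp_rank == pp_size - 1
    return pre_process, post_process


first = resolve_stage_flags(pp_rank=0, pp_size=4)   # (True, False)
middle = resolve_stage_flags(pp_rank=1, pp_size=4)  # (False, False)
last = resolve_stage_flags(pp_rank=3, pp_size=4)    # (False, True)
forced = resolve_stage_flags(pp_rank=1, pp_size=4, pre_process=True)  # (True, False)
```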