bridge.models.stepfun.step37_provider

Step3.7 model provider.

Mirrors qwen_vl/qwen3_vl_provider.py: extends the text-decoder provider with multimodal fields (vision config, image token id, projector knobs) and returns a Step37Model instance instead of a bare GPTModel.

Module Contents

Classes

Step37ModelProvider

Model provider for Step3.7.

Data

API

class bridge.models.stepfun.step37_provider.Step37ModelProvider

Bases: megatron.bridge.models.stepfun.step35_provider.Step35ModelProvider

Model provider for Step3.7.

Inherits every Step-3.5 text-decoder field from Step35ModelProvider (per-layer layer_types / rotary_percents / swiglu_limits, head_wise_attn_gate, MoE settings, MTP layers, sliding-attention overrides) and adds the multimodal fields needed to build Step37Model.

position_embedding_type: str

'rope'

vision_config: Optional[Any]

None

image_token_id: int

128001

understand_projector_stride: int

2

projector_bias: bool

False

language_max_sequence_length: int

262144

freeze_language_model: bool

False

freeze_vision_model: bool

False

freeze_vision_projection: bool

False

add_encoder: bool

True

add_decoder: bool

True
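As a quick reference, the documented defaults can be collected in a small stand-in dataclass. This is only a sketch: `Step37FieldsSketch` is a hypothetical name, it is not the real `Step37ModelProvider`, and it leaves out every field inherited from `Step35ModelProvider`. The projector-only override at the end is an assumed use of the freeze flags, not a documented recipe.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass
class Step37FieldsSketch:
    """Hypothetical stand-in mirroring the documented multimodal fields.

    Not the real Step37ModelProvider: inherited Step-3.5 fields are omitted.
    """

    position_embedding_type: str = "rope"
    vision_config: Optional[Any] = None
    image_token_id: int = 128001
    understand_projector_stride: int = 2
    projector_bias: bool = False
    language_max_sequence_length: int = 262144
    freeze_language_model: bool = False
    freeze_vision_model: bool = False
    freeze_vision_projection: bool = False
    add_encoder: bool = True
    add_decoder: bool = True


defaults = Step37FieldsSketch()

# Assumed use case: freeze both towers so that only the projector stays trainable.
projector_only = replace(
    defaults,
    freeze_language_model=True,
    freeze_vision_model=True,
)
```

With the real provider, the same overrides would presumably be passed as keyword arguments or set as attributes, alongside the inherited text-decoder settings.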

provide(
pre_process: Optional[bool] = None,
post_process: Optional[bool] = None,
vp_stage: Optional[int] = None,
) → megatron.bridge.models.stepfun.modelling_step37.model.Step37Model

Build a Step37Model for the current pipeline-parallel (PP) and virtual-pipeline (VP) stage.

provide_language_model(
pre_process: Optional[bool] = None,
post_process: Optional[bool] = None,
vp_stage: Optional[int] = None,
) → megatron.core.models.gpt.gpt_model.GPTModel

Provide just the text decoder (no vision tower).
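The relationship between the two build methods can be illustrated with stand-in classes. Everything below is hypothetical (`TextDecoderSketch`, `MultimodalSketch`, `ProviderSketch`), and two points are assumptions made for the sketch rather than documented behavior: that the multimodal model wraps the same text decoder, and that a `None` stage flag resolves to `True`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TextDecoderSketch:
    """Hypothetical stand-in for the GPTModel text decoder."""

    pre_process: bool
    post_process: bool
    vp_stage: Optional[int]


@dataclass
class MultimodalSketch:
    """Hypothetical stand-in for Step37Model."""

    language_model: TextDecoderSketch
    vision_config: Optional[Any]
    image_token_id: int


@dataclass
class ProviderSketch:
    """Hypothetical provider exposing the two documented build methods."""

    vision_config: Optional[Any] = None
    image_token_id: int = 128001

    def provide_language_model(
        self,
        pre_process: Optional[bool] = None,
        post_process: Optional[bool] = None,
        vp_stage: Optional[int] = None,
    ) -> TextDecoderSketch:
        # Sketch-only assumption: None resolves to True, as on a single stage.
        return TextDecoderSketch(
            pre_process=True if pre_process is None else pre_process,
            post_process=True if post_process is None else post_process,
            vp_stage=vp_stage,
        )

    def provide(
        self,
        pre_process: Optional[bool] = None,
        post_process: Optional[bool] = None,
        vp_stage: Optional[int] = None,
    ) -> MultimodalSketch:
        # Assumed structure: the multimodal model wraps the same text decoder.
        decoder = self.provide_language_model(pre_process, post_process, vp_stage)
        return MultimodalSketch(decoder, self.vision_config, self.image_token_id)


provider = ProviderSketch(vision_config={"hidden_size": 1024})
full_model = provider.provide()                 # decoder plus vision settings
text_only = provider.provide_language_model()   # decoder alone, no vision tower
```

In this sketch both calls accept the same three stage arguments, matching the documented signatures, so a caller can switch between the full model and the text decoder without changing how the stage is described.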

bridge.models.stepfun.step37_provider.__all__

['Step37ModelProvider']