bridge.models.stepfun.modelling_step37.transformer_config#

Step3.7 transformer config and vision-config helper.

Mirrors qwen_vl/modelling_qwen3_vl/transformer_config.py: the text-side config is the standard Megatron TransformerConfig already used by Step-3.5, extended with vision-tower fields. The HF StepRoboticsVisionEncoderConfig is passed straight through to the Megatron vision module. No separate Megatron-side TransformerConfig is constructed for the vision tower, because the PE-G/14 trunk does not use any Megatron tensor-parallel primitives.

Module Contents#

Classes#

Step37TransformerConfig

Step3.7 transformer config.

Functions#

get_vision_model_config

Return the HF vision config unchanged.

API#

class bridge.models.stepfun.modelling_step37.transformer_config.Step37TransformerConfig#

Bases: megatron.core.transformer.transformer_config.TransformerConfig

Step3.7 transformer config.

Extends the Step-3.5 text-decoder TransformerConfig with the multimodal fields that Step37Model reads at construction time. All Step-3.5 per-layer fields (layer_types, rotary_percents, rotary_base_per_layer, swiglu_limits, swiglu_limits_shared, attention_other_setting, sliding_attention_setting, head_wise_attn_gate) are inherited from the Step-3.5 model provider; this class adds only the multimodal fields.

vision_config: Any#

None

image_token_id: int#

128001

understand_projector_stride: int#

2

projector_bias: bool#

False

language_max_sequence_length: int#

262144
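
The field layout above can be illustrated with a minimal sketch. To stay runnable without Megatron installed, it subclasses a hypothetical stand-in base (`BaseConfigStandIn`) rather than the real `megatron.core.transformer.transformer_config.TransformerConfig`. The class name `Step37ConfigSketch`, the base-class fields, and the example `vision_config` value are assumptions for illustration only; the five added fields and their defaults are taken from this page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class BaseConfigStandIn:
    """Hypothetical stand-in for Megatron's TransformerConfig (NOT the real class)."""

    num_layers: int = 2    # illustrative value only
    hidden_size: int = 64  # illustrative value only


@dataclass
class Step37ConfigSketch(BaseConfigStandIn):
    """Sketch of the multimodal fields documented for Step37TransformerConfig."""

    vision_config: Any = None                   # HF vision config, passed through as-is
    image_token_id: int = 128001                # default documented above
    understand_projector_stride: int = 2        # default documented above
    projector_bias: bool = False                # default documented above
    language_max_sequence_length: int = 262144  # default documented above


# The dict below is a made-up placeholder, not a real StepRoboticsVisionEncoderConfig.
cfg = Step37ConfigSketch(vision_config={"hidden_size": 1024})
```

Because the added fields all carry defaults, a config built this way only needs `vision_config` supplied explicitly; the inherited text-side fields remain whatever the base config provides.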

bridge.models.stepfun.modelling_step37.transformer_config.get_vision_model_config(vision_cfg: Any) → Any#

Return the HF vision config unchanged.

Step37VisionModel consumes the HF StepRoboticsVisionEncoderConfig directly (it never uses Megatron tensor-parallel primitives), so this function is just a structural mirror of qwen_vl/modelling_qwen3_vl/transformer_config.get_vision_model_config for parity with the Qwen3-VL package shape. It is intentionally a no-op.
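
The pass-through behaviour described above can be sketched as follows. This is an illustrative reimplementation under a hypothetical name (`get_vision_model_config_sketch`), not the library's source, and the `SimpleNamespace` object with its `hidden_size` and `patch_size` attributes is a made-up placeholder for the HF vision config.

```python
from types import SimpleNamespace
from typing import Any


def get_vision_model_config_sketch(vision_cfg: Any) -> Any:
    """Return the vision config unchanged (identity function)."""
    return vision_cfg


# Placeholder object standing in for an HF vision config; attribute names are assumptions.
hf_cfg = SimpleNamespace(hidden_size=1024, patch_size=14)
result = get_vision_model_config_sketch(hf_cfg)
```

The returned value is the same object that was passed in, not a copy, so any later mutation of the HF config is visible to whatever consumes the result.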