bridge.models.stepfun.configuration_step37

Step3.7 HF PretrainedConfig surrogate.

Mirrors configuration_step35.py. The real Step37Config ships with the upstream checkpoint at stepfun-ai/step3p7_flash_bf16/configuration_step3p7.py and is loaded via trust_remote_code=True at inference time. This file exists so that the Megatron-Bridge package is self-describing: Step37Config, Step37TextConfig, and Step37VisionConfig here expose the same fields that Step37Bridge.provider_bridge reads, without requiring the remote-code shim to be on sys.path.

Once the upstream config ships in HF transformers, Step37Bridge can be retargeted to the upstream class. Until then, the Auto* classes select the right config through the auto_map entry in the checkpoint’s config.json.
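A minimal sketch of what the auto_map lookup amounts to, assuming a config.json shaped like the one below. The class name under auto_map is a hypothetical placeholder (only the file name configuration_step3p7.py comes from the text above), and resolve_auto_map is an illustrative helper, not the transformers implementation: it only splits an entry of the form module.ClassName, with an optional repo_id-- prefix, into the file and class that remote-code loading would import.

```python
import json

# Hypothetical config.json contents. The auto_map class name is a placeholder,
# not copied from the real stepfun-ai checkpoint.
CONFIG_JSON = """
{
  "model_type": "step3p7",
  "architectures": ["Step3p7ForConditionalGeneration"],
  "auto_map": {
    "AutoConfig": "configuration_step3p7.Step3p7Config"
  }
}
"""


def resolve_auto_map(config: dict, auto_class: str) -> tuple[str, str]:
    """Split an auto_map entry like 'module.ClassName' into (file, class).

    Simplified illustration: an optional 'repo_id--' prefix is dropped, and
    the module is assumed to sit at the top level of the checkpoint directory.
    """
    reference = config["auto_map"][auto_class]
    if "--" in reference:
        # Entries may point at another repo as 'repo_id--module.ClassName'.
        _, reference = reference.split("--", 1)
    module_name, class_name = reference.rsplit(".", 1)
    return f"{module_name}.py", class_name


config = json.loads(CONFIG_JSON)
print(resolve_auto_map(config, "AutoConfig"))
# ('configuration_step3p7.py', 'Step3p7Config')
```

In real use, AutoConfig.from_pretrained(..., trust_remote_code=True) performs this resolution and the import itself; the sketch only makes the string format visible.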

Module Contents

Classes

Step37VisionConfig: HF-style config for the PE-G/14 vision tower used by Step3.7.

Step37TextConfig: HF-style text-decoder config for Step3.7.

Step37Config: Top-level HF-style config for Step3.7 (the multimodal wrapper).

Data

__all__

API

class bridge.models.stepfun.configuration_step37.Step37VisionConfig(
width: int = 1536,
layers: int = 47,
heads: int = 16,
num_channels: int = 3,
image_size: int = 728,
mlp_ratio: float = 8960 / 1536,
patch_size: int = 14,
hidden_act: str = 'quick_gelu',
layer_norm_eps: float = 1e-05,
use_cls_token: bool = False,
use_ln_pre: bool = True,
use_ln_post: bool = False,
use_abs_posemb: bool = True,
use_rope2d: bool = True,
ls_init_value: float = 0.1,
**kwargs,
)

Bases: transformers.configuration_utils.PretrainedConfig

HF-style config for the PE-G/14 vision tower used by Step3.7.


model_type

'perception_encoder'
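The defaults in the Step37VisionConfig signature imply a few derived sizes. The sketch below computes them with a hypothetical VisionGeometry dataclass (not part of this module) under stated assumptions: the tower tiles the image into non-overlapping patch_size by patch_size patches, each patch becomes one token, a CLS token (if enabled) adds one more, and the MLP hidden size is width * mlp_ratio, which the 8960 / 1536 default suggests is meant to come out to 8960.

```python
from dataclasses import dataclass


@dataclass
class VisionGeometry:
    """Hypothetical helper mirroring a subset of Step37VisionConfig defaults."""

    width: int = 1536
    image_size: int = 728
    patch_size: int = 14
    mlp_ratio: float = 8960 / 1536
    use_cls_token: bool = False

    @property
    def grid_side(self) -> int:
        # Patches per side; the defaults divide evenly (728 / 14 = 52).
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be a multiple of patch_size")
        return self.image_size // self.patch_size

    @property
    def num_tokens(self) -> int:
        # One token per patch, plus one if a CLS token is prepended (assumed).
        return self.grid_side ** 2 + (1 if self.use_cls_token else 0)

    @property
    def mlp_hidden(self) -> int:
        # round() absorbs float error from the fractional ratio.
        return round(self.width * self.mlp_ratio)


g = VisionGeometry()
print(g.grid_side, g.num_tokens, g.mlp_hidden)  # 52 2704 8960
```

Expressing the MLP width as a ratio of width, rather than as an absolute size, is why the default appears as the expression 8960 / 1536 in the signature.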

class bridge.models.stepfun.configuration_step37.Step37TextConfig

Bases: megatron.bridge.models.stepfun.configuration_step35.Step35Config

HF-style text-decoder config for Step3.7.

Identical schema to Step35Config: Step3.7’s text backbone is Step-3.5. Keeping a distinct subclass means the Step3.7 text schema can later diverge from Step35Config without modifying the parent class.

model_type

'step3p5'
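The subclassing choice can be sketched with stdlib dataclasses, assuming nothing about Step35Config's real fields. BaseTextConfig stands in for Step35Config, TextConfigToday for Step37TextConfig as documented (same schema, model_type 'step3p5'; in this sketch the subclass simply inherits it), and TextConfigLater shows how a field could later be added to the subclass alone. All class names, fields, and defaults are hypothetical placeholders; model_type is a ClassVar to mirror the class-attribute style of HF configs.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class BaseTextConfig:
    """Hypothetical stand-in for Step35Config; fields and defaults are placeholders."""

    model_type: ClassVar[str] = "step3p5"
    hidden_size: int = 1024
    num_hidden_layers: int = 2


@dataclass
class TextConfigToday(BaseTextConfig):
    """Stand-in for Step37TextConfig as documented: same schema, nothing overridden."""


@dataclass
class TextConfigLater(BaseTextConfig):
    """A possible later divergence: one new field, base class left untouched."""

    extra_option: bool = False  # hypothetical field


def field_names(cls) -> list[str]:
    # ClassVar attributes such as model_type are not dataclass fields.
    return [f.name for f in fields(cls)]


print(TextConfigToday.model_type, field_names(TextConfigToday))
# step3p5 ['hidden_size', 'num_hidden_layers']
print(field_names(TextConfigLater))
# ['hidden_size', 'num_hidden_layers', 'extra_option']
```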

class bridge.models.stepfun.configuration_step37.Step37Config(
vision_config: Optional[Union[dict, bridge.models.stepfun.configuration_step37.Step37VisionConfig]] = None,
text_config: Optional[Union[dict, bridge.models.stepfun.configuration_step37.Step37TextConfig]] = None,
understand_projector_stride: int = 2,
projector_bias: bool = False,
image_token_id: int = 128001,
**kwargs: Any,
)

Bases: transformers.configuration_utils.PretrainedConfig

Top-level HF-style config for Step3.7 (the multimodal wrapper).


model_type

'step3p7'

architectures

['Step3p7ForConditionalGeneration']
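Step37Config accepts vision_config and text_config as None, a plain dict, or an already-built sub-config object. A common Hugging Face pattern for such a signature is sketched below with stdlib stand-ins: None falls back to defaults, a dict is expanded into the sub-config class, and an instance passes through unchanged. This is an assumption about the behavior, not a copy of the module's __init__; VisionCfg, TextCfg, and MultimodalCfg are hypothetical and carry only a few of the documented fields (TextCfg's field is a placeholder).

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class VisionCfg:
    """Stand-in for Step37VisionConfig (subset of the documented defaults)."""

    width: int = 1536
    layers: int = 47
    patch_size: int = 14


@dataclass
class TextCfg:
    """Stand-in for Step37TextConfig; the field is a placeholder."""

    hidden_size: int = 1024


def _coerce(value: Optional[Union[dict, Any]], cls: type) -> Any:
    # None -> defaults, dict -> cls(**dict), instance -> returned as-is.
    if value is None:
        return cls()
    if isinstance(value, dict):
        return cls(**value)
    if isinstance(value, cls):
        return value
    raise TypeError(f"expected None, dict, or {cls.__name__}, got {type(value).__name__}")


@dataclass
class MultimodalCfg:
    """Stand-in for Step37Config with the wrapper-level defaults from its signature."""

    vision_config: Any = None
    text_config: Any = None
    understand_projector_stride: int = 2
    projector_bias: bool = False
    image_token_id: int = 128001

    def __post_init__(self) -> None:
        self.vision_config = _coerce(self.vision_config, VisionCfg)
        self.text_config = _coerce(self.text_config, TextCfg)


cfg = MultimodalCfg(vision_config={"layers": 2}, text_config=TextCfg(hidden_size=64))
print(cfg.vision_config.layers, cfg.vision_config.width, cfg.text_config.hidden_size)
# 2 1536 64
```

Accepting dicts is what lets a nested config.json (where sub-configs are stored as JSON objects) round-trip into typed sub-config objects.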

bridge.models.stepfun.configuration_step37.__all__

['Step37Config', 'Step37TextConfig', 'Step37VisionConfig']