bridge.recipes.stepfun.step37

Step3.7 (stepfun-ai/Step-3.7-Flash) recipe.

Only the Flickr8k SFT path is supported. Step37Model.forward takes list[ImageForInsert] directly, and the data path is the self-contained Step37Flickr8kSFTDataProvider; neither HF datasets nor an HF processor is involved.

hf_path defaults to stepfun-ai/Step-3.7-Flash. Override it with the hf_path recipe argument (e.g. run_recipe.py --hf_path /path/to/ckpt) to load from a local checkpoint instead.

Module Contents

Functions

step37_sft_flickr8k_config

Step3.7 SFT recipe — the only supported Step3.7 path.

step37_sft_flickr8k_smoke_config

Smoke variant of :func:step37_sft_flickr8k_config that feeds the same packed sample to every DP rank on every step. Deterministic and tiny: it repeats pack[fixed_pack_idx] indefinitely so the loss curve visibly drops as the model overfits a single batch.

Data

API

bridge.recipes.stepfun.step37.step37_sft_flickr8k_config(
hf_path: str = 'stepfun-ai/Step-3.7-Flash',
*,
sample_count: Optional[int] = 8,
max_packing_seqlen: int = 2048,
seqlen_divisible_by: int = 64,
oversize_policy: str = 'drop',
dataset_sampling: str = 'random',
cache_dir: str = '.cache/step37_flickr8k',
prompt: str = 'Describe this image in one sentence.',
) → megatron.bridge.training.config.ConfigContainer

Step3.7 SFT recipe — the only supported Step3.7 path.

Uses the Flickr8k packed pipeline:

  • cfg.dataset is :class:Step37Flickr8kSFTDataProvider (sync packing, no async wrapper, no HFDatasetConversationProvider).

  • --step_func step37_flickr8k_step consumes the packed dict and passes list[ImageForInsert] straight to Step37Model.forward.

  • micro_batch_size is pinned at 1 — each pack already aggregates multiple sub-seqs via cu_seqlens.

  • Tokenizer loaded with trust_remote_code=False; no HF custom Python code runs in the data path.
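To illustrate why micro_batch_size can stay at 1, here is a minimal sketch of how cu_seqlens marks sub-sequence boundaries inside one pack. The helper names build_cu_seqlens and split_pack are hypothetical and are not part of the recipe; the actual layout of the packed dict is not shown in this page.

```python
from itertools import accumulate


def build_cu_seqlens(sub_seq_lengths):
    """Return cumulative boundaries [0, l0, l0 + l1, ...] for one pack."""
    return [0, *accumulate(sub_seq_lengths)]


def split_pack(tokens, cu_seqlens):
    """Recover the individual sub-sequences from a flat packed token list."""
    return [tokens[start:end] for start, end in zip(cu_seqlens, cu_seqlens[1:])]


# Hypothetical pack holding three sub-sequences of 5, 3, and 4 tokens.
lengths = [5, 3, 4]
cu_seqlens = build_cu_seqlens(lengths)
packed_tokens = list(range(sum(lengths)))  # stand-in token ids
sub_seqs = split_pack(packed_tokens, cu_seqlens)
```

A single pack therefore already behaves like a small batch: the boundaries let attention treat each sub-sequence separately even though the batch dimension is 1.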

Kwargs:

  • hf_path: HF model id or local path to the Step3.7 checkpoint (default stepfun-ai/Step-3.7-Flash).

  • sample_count: limit the train split to the first N samples. Default is 8 (smoke). Pass None to use the full Flickr8k train CSV (~6000 rows, ~1 GB of JPEGs, 10+ min cold download). From the CLI, set dataset.sample_count=null, and only when you intend a real run.

  • max_packing_seqlen: max NTP-length tokens per pack (default 2048).

  • seqlen_divisible_by: pad total NTP length up to this multiple (default 64).

  • oversize_policy: "drop" or "extend"; what to do with a single sample whose NTP length already exceeds max_packing_seqlen.

  • dataset_sampling: "sequential" or "random"; the in-domain sample order.

  • cache_dir: local cache for the Flickr8k download.

  • prompt: user prompt prefixed to every assistant-caption pair.
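The interaction of max_packing_seqlen, seqlen_divisible_by, and oversize_policy can be sketched with a simple greedy packer over sample lengths. This is an illustration under stated assumptions, not the provider's actual algorithm: the greedy strategy, the function names, and the reading of "extend" as "give the oversize sample a pack of its own that exceeds the limit" are all assumptions.

```python
def round_up(n, multiple):
    """Round n up to the nearest multiple (the padding target for a pack)."""
    return ((n + multiple - 1) // multiple) * multiple


def pack_lengths(lengths, max_packing_seqlen=2048, seqlen_divisible_by=64,
                 oversize_policy="drop"):
    """Greedily group sample lengths into packs; return (pack, padded_len) pairs."""
    if oversize_policy not in ("drop", "extend"):
        raise ValueError(f"unknown oversize_policy: {oversize_policy!r}")
    packs, current = [], []
    for n in lengths:
        if n > max_packing_seqlen:
            if oversize_policy == "drop":
                continue  # assumed meaning: skip the sample entirely
            # Assumed meaning of "extend": the sample gets its own oversize pack.
            if current:
                packs.append(current)
                current = []
            packs.append([n])
            continue
        if current and sum(current) + n > max_packing_seqlen:
            packs.append(current)  # close the pack before it overflows
            current = []
        current.append(n)
    if current:
        packs.append(current)
    return [(p, round_up(sum(p), seqlen_divisible_by)) for p in packs]


sample_lengths = [1000, 900, 300, 2500, 64]  # made-up NTP lengths
dropped = pack_lengths(sample_lengths, oversize_policy="drop")
extended = pack_lengths(sample_lengths, oversize_policy="extend")
```

With the defaults, a pack totalling 1900 tokens is padded to 1920, the next multiple of 64.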

bridge.recipes.stepfun.step37.step37_sft_flickr8k_smoke_config(
hf_path: str = 'stepfun-ai/Step-3.7-Flash',
*,
sample_count: int = 8,
max_packing_seqlen: int = 2048,
fixed_pack_idx: int = 0,
train_iters: int = 100,
max_lr: float = 0.005,
cache_dir: str = '.cache/step37_flickr8k_smoke',
) → megatron.bridge.training.config.ConfigContainer

Smoke variant of :func:step37_sft_flickr8k_config that feeds the same packed sample to every DP rank on every step. Deterministic and tiny: it repeats pack[fixed_pack_idx] indefinitely so the loss curve visibly drops as the model overfits a single batch.

Differences vs. the regular SFT config:

  • dataset.fixed_pack_idx pins __getitem__ → identical input across every DP rank and every iteration.

  • dataset.dataset_sampling = "sequential" for reproducibility.

  • max_lr bumped 5e-6 → 5e-3 so the overfit happens within train_iters steps.

  • Language model unfrozen (the regular config freezes it); vision tower stays frozen (overfitting on the projector + LM is enough and avoids the PE-G/14 backward cost).

  • log_interval=1, eval disabled, no mid-run checkpoint save.
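The effect of pinning __getitem__ can be sketched with a toy dataset. FixedPackDataset is a hypothetical stand-in, not the real Step37Flickr8kSFTDataProvider; it only shows how ignoring the requested index yields identical input for every rank and iteration.

```python
class FixedPackDataset:
    """Toy dataset: returns packs[fixed_pack_idx] for any index when pinned."""

    def __init__(self, packs, fixed_pack_idx=None):
        self.packs = packs
        self.fixed_pack_idx = fixed_pack_idx

    def __len__(self):
        return len(self.packs)

    def __getitem__(self, idx):
        # When pinned, the sampler's index is ignored on purpose.
        if self.fixed_pack_idx is not None:
            idx = self.fixed_pack_idx
        return self.packs[idx]


packs = ["pack-0", "pack-1", "pack-2"]  # placeholder pack objects
smoke = FixedPackDataset(packs, fixed_pack_idx=0)
regular = FixedPackDataset(packs)

smoke_items = [smoke[i] for i in range(3)]
regular_items = [regular[i] for i in range(3)]
```

Because every index resolves to the same pack, the sampler's order no longer affects what the model sees, which is what makes the run deterministic.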

Kwargs:

  • hf_path: HF model id or local path to the Step3.7 checkpoint (default stepfun-ai/Step-3.7-Flash).

  • sample_count: tiny train slice (default 8); raise only if pack 0 is unrepresentative.

  • max_packing_seqlen: max NTP-length tokens per pack.

  • fixed_pack_idx: which pack to repeat (default 0).

  • train_iters: number of smoke iterations (default 100).

  • max_lr: peak LR for the cosine schedule (default 5e-3).

  • cache_dir: separate cache so the smoke download doesn't shadow the full-Flickr8k cache.
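For a rough sense of how the learning rate evolves over the smoke run, the following sketch evaluates a plain cosine decay from max_lr over train_iters steps. It is not the scheduler the recipe configures: the absence of warmup and the min_lr of 0 are assumptions made for illustration.

```python
import math


def cosine_lr(step, train_iters=100, max_lr=5e-3, min_lr=0.0):
    """Plain cosine decay from max_lr to min_lr (no warmup; an assumption)."""
    progress = min(max(step, 0), train_iters) / train_iters
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


lr_start = cosine_lr(0)    # peak LR at the first step
lr_mid = cosine_lr(50)     # halfway through the schedule
lr_end = cosine_lr(100)    # end of the smoke run
```

Under these assumptions the rate starts at 5e-3, is half of that at step 50, and reaches the floor at step 100.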

bridge.recipes.stepfun.step37.__all__

['step37_sft_flickr8k_config', 'step37_sft_flickr8k_smoke_config']