bridge.recipes.stepfun.step37#
Step3.7 (stepfun-ai/Step-3.7-Flash) recipe.
Only the Flickr8k SFT path is supported. Step37Model.forward takes
list[ImageForInsert] directly, and the data path is the
self-contained Step37Flickr8kSFTDataProvider (HF datasets / processor
not involved).
hf_path defaults to stepfun-ai/Step-3.7-Flash. Override it with the
hf_path recipe argument (e.g. run_recipe.py --hf_path /path/to/ckpt)
to load from a local checkpoint instead.
Module Contents#
Functions#
| step37_sft_flickr8k_config | Step3.7 SFT recipe, the only supported Step3.7 path. |
| step37_sft_flickr8k_smoke_config | Smoke variant of step37_sft_flickr8k_config. |
Data#
API#
- bridge.recipes.stepfun.step37.step37_sft_flickr8k_config(
- hf_path: str = 'stepfun-ai/Step-3.7-Flash',
- *,
- sample_count: Optional[int] = 8,
- max_packing_seqlen: int = 2048,
- seqlen_divisible_by: int = 64,
- oversize_policy: str = 'drop',
- dataset_sampling: str = 'random',
- cache_dir: str = '.cache/step37_flickr8k',
- prompt: str = 'Describe this image in one sentence.',
- )
Step3.7 SFT recipe — the only supported Step3.7 path.
Uses the Flickr8k packed pipeline:
- cfg.dataset is a Step37Flickr8kSFTDataProvider (sync packing, no async wrapper, no HFDatasetConversationProvider).
- --step_func step37_flickr8k_step consumes the packed dict and passes list[ImageForInsert] straight to Step37Model.forward.
- micro_batch_size is pinned at 1; each pack already aggregates multiple sub-seqs via cu_seqlens.
- The tokenizer is loaded with trust_remote_code=False; no HF custom Python code runs in the data path.
Kwargs:
- hf_path: HF model id or local path to the Step3.7 checkpoint (default stepfun-ai/Step-3.7-Flash).
- sample_count: limit the train split to the first N samples. Default is 8 (smoke). Pass None to use the full Flickr8k train CSV (~6000 rows, ~1 GB jpgs, 10+ min cold download); on the CLI, set dataset.sample_count=null only when you intend a real run.
- max_packing_seqlen: max NTP-length tokens per pack (default 2048).
- seqlen_divisible_by: pad total NTP length up to this multiple (default 64).
- oversize_policy: "drop" or "extend"; what to do with a single sample whose NTP length already exceeds max_packing_seqlen.
- dataset_sampling: "sequential" or "random"; in-domain order.
- cache_dir: local cache for the Flickr8k download.
- prompt: user prompt prefixed to every assistant-caption pair.
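To make the interaction between max_packing_seqlen, seqlen_divisible_by, oversize_policy, and cu_seqlens concrete, here is a minimal sketch of greedy packing over sample lengths. It is not the recipe's actual packer: the helper name pack_lengths is hypothetical, the greedy grouping is an assumption, and "extend" is interpreted here as "give the oversized sample a pack of its own", which the source does not confirm.

```python
from typing import List, Tuple


def pack_lengths(
    lengths: List[int],
    max_packing_seqlen: int = 2048,
    seqlen_divisible_by: int = 64,
    oversize_policy: str = "drop",
) -> List[Tuple[List[int], int]]:
    """Greedily group sample lengths into packs (hypothetical helper).

    Returns one (cu_seqlens, padded_len) tuple per pack, where cu_seqlens
    holds the cumulative sub-sequence boundaries inside the pack.
    """
    if oversize_policy not in ("drop", "extend"):
        raise ValueError(f"unknown oversize_policy: {oversize_policy!r}")

    packs: List[List[int]] = []
    current: List[int] = []
    for n in lengths:
        if n > max_packing_seqlen:
            if oversize_policy == "drop":
                continue  # skip the oversized sample entirely
            # Assumed meaning of "extend": the sample gets its own pack,
            # which is allowed to exceed max_packing_seqlen.
            packs.append([n])
            continue
        if current and sum(current) + n > max_packing_seqlen:
            packs.append(current)  # current pack is full, start a new one
            current = []
        current.append(n)
    if current:
        packs.append(current)

    result: List[Tuple[List[int], int]] = []
    for pack in packs:
        cu_seqlens = [0]
        for n in pack:
            cu_seqlens.append(cu_seqlens[-1] + n)
        total = cu_seqlens[-1]
        # Round the total length up to the next multiple of seqlen_divisible_by.
        padded_len = -(-total // seqlen_divisible_by) * seqlen_divisible_by
        result.append((cu_seqlens, padded_len))
    return result


dropped = pack_lengths([1000, 900, 300, 3000, 100], oversize_policy="drop")
extended = pack_lengths([1000, 900, 300, 3000, 100], oversize_policy="extend")
```

With "drop", the 3000-token sample disappears and two packs remain; with "extend", it survives as a third, longer pack. Because each pack already carries several sub-sequences delimited by cu_seqlens, a micro batch of one pack is enough, which matches the pinned micro_batch_size of 1.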
- bridge.recipes.stepfun.step37.step37_sft_flickr8k_smoke_config(
- hf_path: str = 'stepfun-ai/Step-3.7-Flash',
- *,
- sample_count: int = 8,
- max_packing_seqlen: int = 2048,
- fixed_pack_idx: int = 0,
- train_iters: int = 100,
- max_lr: float = 0.005,
- cache_dir: str = '.cache/step37_flickr8k_smoke',
- )
Smoke variant of step37_sft_flickr8k_config: the same packed sample on every DP rank, every step. Deterministic and tiny: it repeats pack[fixed_pack_idx] indefinitely so the loss curve visibly drops as the model overfits a single batch.
Differences vs. the regular SFT config:
- dataset.fixed_pack_idx pins __getitem__, so the input is identical across every DP rank and every iteration.
- dataset.dataset_sampling = "sequential" for reproducibility.
- max_lr is bumped 5e-6 → 5e-3 so the overfit happens within train_iters steps.
- The language model is unfrozen (the regular config freezes it); the vision tower stays frozen (overfitting on the projector + LM is enough and avoids the PE-G/14 backward cost).
- log_interval=1, eval disabled, no mid-run checkpoint save.
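The effect of pinning __getitem__ can be illustrated with a small wrapper. This is only a sketch of the idea: the class name FixedPackView is hypothetical, and the real Step37Flickr8kSFTDataProvider may implement the pin differently.

```python
from typing import Any, Optional, Sequence


class FixedPackView:
    """Hypothetical wrapper: serve packs[fixed_pack_idx] for every index."""

    def __init__(self, packs: Sequence[Any], fixed_pack_idx: Optional[int] = None):
        if fixed_pack_idx is not None and not 0 <= fixed_pack_idx < len(packs):
            raise IndexError(f"fixed_pack_idx {fixed_pack_idx} out of range")
        self.packs = packs
        self.fixed_pack_idx = fixed_pack_idx

    def __len__(self) -> int:
        return len(self.packs)

    def __getitem__(self, idx: int) -> Any:
        # When pinned, ignore the requested index, so every rank and every
        # iteration sees the same pack.
        if self.fixed_pack_idx is not None:
            return self.packs[self.fixed_pack_idx]
        return self.packs[idx]


pinned = FixedPackView(["pack0", "pack1", "pack2"], fixed_pack_idx=1)
unpinned = FixedPackView(["pack0", "pack1", "pack2"])
```

Since the sampler's index is ignored when the pin is set, the sampling order no longer affects what the model sees, which is what makes the loss curve reproducible.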
Kwargs:
- hf_path: HF model id or local path to the Step3.7 checkpoint (default stepfun-ai/Step-3.7-Flash).
- sample_count: tiny train slice (default 8); raise only if pack 0 is unrepresentative.
- max_packing_seqlen: max NTP-length tokens per pack.
- fixed_pack_idx: which pack to repeat (default 0).
- train_iters: number of smoke iterations (default 100).
- max_lr: peak LR for the cosine schedule (default 5e-3).
- cache_dir: separate cache so the smoke download doesn't shadow the full-Flickr8k cache.
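For intuition on how max_lr and train_iters shape the smoke run, the following sketch evaluates a plain cosine decay. The function name cosine_lr is hypothetical, and the absence of warmup and the min_lr of 0 are assumptions; the recipe's actual scheduler settings are not stated here.

```python
import math


def cosine_lr(
    step: int,
    train_iters: int = 100,
    max_lr: float = 5e-3,
    min_lr: float = 0.0,  # assumed floor; not taken from the recipe
) -> float:
    """Cosine decay from max_lr at step 0 to min_lr at step train_iters."""
    # Clamp the step so out-of-range values stay on the schedule's endpoints.
    progress = min(max(step, 0), train_iters) / train_iters
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [cosine_lr(step) for step in range(0, 101, 25)]
```

Under these assumptions the rate starts at the peak, passes through half the peak at the midpoint, and reaches the floor at the final iteration.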
- bridge.recipes.stepfun.step37.__all__#
['step37_sft_flickr8k_config', 'step37_sft_flickr8k_smoke_config']