bridge.recipes.gpt.vanilla_gpt#
Vanilla GPT recipe — a minimal baseline that mirrors Megatron-LM pretrain_gpt.py defaults.
Use this recipe for Megatron-LM <-> Bridge correlation testing. All architectural and training knobs are left at their Megatron-Core / pretrain_gpt.py defaults, so the only source of difference between the two frameworks is what you explicitly override on the CLI.
Example::

    uv run python scripts/training/run_recipe.py \
        --recipe vanilla_gpt_pretrain_config \
        model.num_layers=2 model.hidden_size=256 model.num_attention_heads=4 \
        model.activation_func=silu model.gated_linear_unit=true \
        train.train_iters=10 train.global_batch_size=8 train.micro_batch_size=2
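The dotted ``key=value`` overrides in the example above walk nested config sections by attribute name. A minimal sketch of that mechanism, using a hypothetical ``apply_overrides`` helper and illustrative dataclasses (not the actual Bridge ``ConfigContainer`` implementation):

```python
from dataclasses import dataclass, field


# Illustrative stand-ins for the nested config sections; field names and
# defaults here are assumptions, not the real Bridge dataclasses.
@dataclass
class ModelSection:
    num_layers: int = 24
    hidden_size: int = 1024


@dataclass
class Config:
    model: ModelSection = field(default_factory=ModelSection)


def apply_overrides(cfg, overrides):
    """Apply "a.b=value" strings by walking attributes on cfg.

    Values are coerced to the existing field's type (ints here; a real
    CLI also handles bools, floats, and strings).
    """
    for item in overrides:
        dotted, raw = item.split("=", 1)
        *path, leaf = dotted.split(".")
        obj = cfg
        for name in path:
            obj = getattr(obj, name)
        current = getattr(obj, leaf)
        setattr(obj, leaf, type(current)(raw))
    return cfg
```

For instance, ``apply_overrides(Config(), ["model.num_layers=2"])`` returns a config whose ``model.num_layers`` is ``2`` while every untouched field keeps its default, which is the behavior the recipe relies on for correlation testing.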
Module Contents#
Functions#
``vanilla_gpt_pretrain_config()``: Minimal GPT pretrain config aligned with Megatron-LM pretrain_gpt.py defaults.
API#
- bridge.recipes.gpt.vanilla_gpt.vanilla_gpt_pretrain_config() -> megatron.bridge.training.config.ConfigContainer#
Minimal GPT pretrain config aligned with Megatron-LM pretrain_gpt.py defaults.
The model provider uses bare GPTModelProvider defaults (LayerNorm, GeLU, learned_absolute position embeddings, etc.), so there are no hidden model-specific assumptions. Override anything you need via the CLI, including ``model.activation_func=silu`` and ``model.gated_linear_unit=true`` for SwiGLU activation.

- Returns:
ConfigContainer with Megatron-LM-compatible defaults.
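The SwiGLU activation that ``model.activation_func=silu`` together with ``model.gated_linear_unit=true`` selects combines a SiLU-activated gate with a linear "up" branch. A scalar sketch of that combination (illustrative only, not the Bridge kernel; function names are assumptions):

```python
import math


def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))


def gated_linear_unit(gate, up, activation):
    # A gated linear unit: activation(gate) * up.
    # With activation=silu this is the SwiGLU combination; with
    # gated_linear_unit=false the MLP would just apply activation(gate).
    return activation(gate) * up
```

In the real MLP, ``gate`` and ``up`` are the two halves of the first linear projection's output; the elementwise product above is what feeds the second projection.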