bridge.recipes.gpt.vanilla_gpt

Vanilla GPT recipe — a minimal baseline that mirrors Megatron-LM pretrain_gpt.py defaults.

Use this recipe for Megatron-LM <-> Bridge correlation testing. All architectural and training knobs are left at their Megatron-Core / pretrain_gpt.py defaults, so the only source of difference between the two frameworks is whatever you explicitly override on the CLI.

Example::

    uv run python scripts/training/run_recipe.py \
        --recipe vanilla_gpt_pretrain_config \
        model.num_layers=2 model.hidden_size=256 model.num_attention_heads=4 \
        model.activation_func=silu model.gated_linear_unit=true \
        train.train_iters=10 train.global_batch_size=8 train.micro_batch_size=2

Module Contents

Functions

vanilla_gpt_pretrain_config

Minimal GPT pretrain config aligned with Megatron-LM pretrain_gpt.py defaults.

API

bridge.recipes.gpt.vanilla_gpt.vanilla_gpt_pretrain_config() → megatron.bridge.training.config.ConfigContainer

Minimal GPT pretrain config aligned with Megatron-LM pretrain_gpt.py defaults.

The model provider uses bare GPTModelProvider defaults (LayerNorm, GeLU, learned_absolute position embeddings, etc.), so there are no hidden model-specific assumptions. Override anything you need via the CLI, including model.activation_func=silu and model.gated_linear_unit=true for SwiGLU activation.
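The same overrides can also be applied programmatically. The sketch below is illustrative only: it assumes the module is importable under the documented path and that ConfigContainer attributes mirror the dotted CLI keys (model.num_layers and so on)::

    import torch.nn.functional as F

    from bridge.recipes.gpt.vanilla_gpt import vanilla_gpt_pretrain_config

    # Build the default ConfigContainer, then override fields in place.
    cfg = vanilla_gpt_pretrain_config()

    # SwiGLU, equivalent to model.activation_func=silu and
    # model.gated_linear_unit=true on the CLI.
    cfg.model.activation_func = F.silu
    cfg.model.gated_linear_unit = True

    # Small smoke-test sizes, mirroring the CLI example above.
    cfg.model.num_layers = 2
    cfg.model.hidden_size = 256
    cfg.model.num_attention_heads = 4

Because nothing else is touched, any remaining divergence from Megatron-LM comes only from the fields set here.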

Returns:

ConfigContainer with Megatron-LM-compatible defaults.