Recipe Patterns#
Recipes provide pre-configured ConfigContainer objects for training each model variant.
Reference implementations:
- VLM: `src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py`
- LLM: `src/megatron/bridge/recipes/gpt_oss/gpt_oss.py`
File Structure#
src/megatron/bridge/recipes/<family>/
├── __init__.py # Import and expose recipe functions
└── <model>.py # Recipe functions for all sizes
Recipe Function Pattern#
Each model size gets dedicated functions for SFT, PEFT, and optionally pretrain:
def <model>_<size>_sft_config() -> ConfigContainer:
    """SFT config for <Model> <Size>."""
    cfg = _sft_common()  # or _sft_common_vlm() for VLMs

    # Model
    cfg.model = AutoBridge.from_hf_pretrained("<org>/<default-model>").to_megatron_provider(load_weights=False)

    # Parallelism
    cfg.model.tensor_model_parallel_size = 4
    cfg.model.pipeline_model_parallel_size = 1
    cfg.model.sequence_parallel = True

    # Training
    cfg.training.max_steps = 100
    cfg.training.global_batch_size = 128
    cfg.training.micro_batch_size = 1

    # Optimizer
    cfg.optimizer.lr = 5e-6
    cfg.optimizer.weight_decay = 0.01

    # VLM-specific (if applicable)
    cfg.model.freeze_language_model = False
    cfg.model.freeze_vision_model = False
    cfg.model.freeze_vision_projection = False

    return cfg
def <model>_<size>_peft_config(peft_scheme: str | PEFT = "lora") -> ConfigContainer:
    """PEFT config for <Model> <Size>."""
    cfg = _peft_common()  # or _peft_common_vlm() for VLMs
    cfg.model = AutoBridge.from_hf_pretrained("<org>/<default-model>").to_megatron_provider(load_weights=False)

    # PEFT typically uses smaller parallelism
    cfg.model.tensor_model_parallel_size = 1
    cfg.model.pipeline_model_parallel_size = 1

    # PEFT uses a higher LR
    cfg.optimizer.lr = 2e-4

    # PEFT config
    peft_cfg = default_peft_config(peft_scheme)
    cfg.peft = peft_cfg

    return cfg
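Because a recipe is just a function that returns a `ConfigContainer`, callers can instantiate it and override fields before launching a run. A minimal sketch; `my_model_8b_sft_config` and its family path are hypothetical placeholders for any exported recipe:

from megatron.bridge.recipes.my_family import my_model_8b_sft_config  # hypothetical recipe

cfg = my_model_8b_sft_config()

# Override recipe defaults for a specific run
cfg.training.max_steps = 1000
cfg.training.global_batch_size = 256
cfg.optimizer.lr = 1e-5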
Common Base Functions#
| Function | Use Case |
|---|---|
| | LLM pretraining |
| `_sft_common()` | LLM supervised fine-tuning |
| `_peft_common()` | LLM parameter-efficient fine-tuning |
| `_sft_common_vlm()` | VLM SFT (adds vision dataset, null tokenizer) |
| `_peft_common_vlm()` | VLM PEFT |
VLM variants additionally set:
- `cfg.dataset` to `HFDatasetConversationProvider` (e.g., CORD-v2)
- `cfg.dataset.hf_processor_path` for the vision processor
- `NullTokenizer` (tokenization is handled by the processor)
- DDP without overlap (for vision model compatibility)
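A minimal sketch of what a VLM base function might set. Only `_sft_common()`, `HFDatasetConversationProvider`, `hf_processor_path`, and `NullTokenizer` come from this document; the `cfg.tokenizer` and `cfg.ddp.*` field names, imports, and constructor arguments are illustrative assumptions:

def _sft_common_vlm() -> ConfigContainer:
    """Shared defaults for VLM SFT recipes (illustrative sketch; imports and args omitted)."""
    cfg = _sft_common()  # start from the LLM SFT base

    # Vision conversation dataset (e.g., CORD-v2) and its processor
    cfg.dataset = HFDatasetConversationProvider(...)          # constructor args omitted
    cfg.dataset.hf_processor_path = "<org>/<default-model>"   # vision processor

    # Tokenization is handled by the processor
    cfg.tokenizer = NullTokenizer(...)                         # assumed field name; args omitted

    # DDP without overlap for vision model compatibility
    cfg.ddp.overlap_grad_reduce = False                        # assumed field name
    cfg.ddp.overlap_param_gather = False                       # assumed field name
    return cfg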
Parallelism Guidelines#
Constraint: max(TP*CP, EP) * PP = minimum GPUs, with 8 GPUs per node.
| Model Size | TP | PP | EP | CP | Notes |
|---|---|---|---|---|---|
| < 3B | 1 | 1 | 1 | 1 | Single GPU |
| 3-8B | 2 | 1 | 1 | 1 | |
| 8-13B | 4 | 1 | 1 | 1 | |
| 13-70B | 4 | 4 | 1 | 1 | |
| MoE (any) | 1-2 | 1-4 | 8-32 | 1 | EP dominates |
Rules:

- TP must be <= `num_key_value_heads`
- When EP > 1 and TP > 1, `sequence_parallel` must be True
- PEFT typically uses smaller parallelism (TP=1, PP=1)
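The sizing constraint and rules above can be expressed as a small helper. A minimal sketch; the function and argument names are illustrative and not part of the recipe API:

def min_gpus(tp: int, pp: int, ep: int = 1, cp: int = 1) -> int:
    """Minimum GPU count implied by max(TP*CP, EP) * PP."""
    return max(tp * cp, ep) * pp

def check_parallelism(tp, pp, ep, cp, num_key_value_heads, sequence_parallel):
    assert tp <= num_key_value_heads, "TP must not exceed num_key_value_heads"
    if ep > 1 and tp > 1:
        assert sequence_parallel, "sequence_parallel must be True when EP > 1 and TP > 1"
    gpus = min_gpus(tp, pp, ep, cp)
    nodes = -(-gpus // 8)  # ceiling division: 8 GPUs per node
    return gpus, nodes

# Example: a 13-70B dense model at TP=4, PP=4 needs 16 GPUs (2 nodes)
assert check_parallelism(4, 4, 1, 1, num_key_value_heads=8, sequence_parallel=True) == (16, 2)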
Export / Registration#
Family __init__.py#
from megatron.bridge.recipes.<family>.<model> import (
    <model>_<size1>_sft_config,
    <model>_<size1>_peft_config,
    <model>_<size2>_sft_config,
    <model>_<size2>_peft_config,
)

__all__ = [
    "<model>_<size1>_sft_config",
    "<model>_<size1>_peft_config",
    # ...
]
Top-level recipes/__init__.py#
Add a wildcard import:
from megatron.bridge.recipes.<family> import *
train_any_basic.py#
Add an entry to the `config_map` dict, to the model list in the docstring, and to the `--model` argparse choices.
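A rough sketch of that registration, assuming `config_map` maps a `--model` choice string to a recipe function; the actual layout of `train_any_basic.py` may differ, and the entry names below are hypothetical:

# train_any_basic.py (sketch)
config_map = {
    # ... existing entries ...
    "my_model_8b_sft": my_model_8b_sft_config,    # hypothetical new entry
    "my_model_8b_peft": my_model_8b_peft_config,  # hypothetical new entry
}

parser.add_argument("--model", choices=sorted(config_map.keys()))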
Recipe Test Patterns#
Unit test (no GPU)#
Monkeypatch AutoBridge to return a mock provider. Verify ConfigContainer structure:
def test_sft_config(monkeypatch):
    monkeypatch.setattr("megatron.bridge.AutoBridge.from_hf_pretrained", mock_bridge)
    cfg = model_size_sft_config()
    assert cfg.model.tensor_model_parallel_size == 4
    assert cfg.training.global_batch_size == 128
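The `mock_bridge` helper is not shown above; one way to write it, as a sketch, is a plain function that replaces `AutoBridge.from_hf_pretrained` and returns an object whose `to_megatron_provider()` yields a permissive `MagicMock`, so the recipe can set model fields without downloading weights:

from unittest.mock import MagicMock

def mock_bridge(hf_model_id, **kwargs):
    """Stand-in for AutoBridge.from_hf_pretrained; avoids any HF download."""
    bridge = MagicMock()
    # to_megatron_provider(load_weights=False) returns a MagicMock whose
    # attributes (tensor_model_parallel_size, ...) can be set and read back.
    bridge.to_megatron_provider.return_value = MagicMock()
    return bridge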
Functional test (GPU)#
Use run_pretrain_vl_recipe_test() from tests/functional_tests/recipes/utils.py:
from functools import partial

RECIPES = [
    (model_size_sft_config, "model_size_sft", {}, {}),
]

PEFT_RECIPES = [
    (partial(model_size_peft_config, peft_scheme="lora"), "model_size_peft", {}, {}),
]
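How these lists are consumed is defined by the helpers in `tests/functional_tests/recipes/utils.py`; the sketch below only illustrates one plausible pattern, and the helper's call signature and the meaning of the two trailing dicts are assumptions:

import pytest

@pytest.mark.parametrize("recipe_fn, name, overrides_a, overrides_b", RECIPES + PEFT_RECIPES)
def test_recipe(recipe_fn, name, overrides_a, overrides_b):
    # Assumed call signature; adapt to the actual helper.
    run_pretrain_vl_recipe_test(recipe_fn, name, overrides_a, overrides_b)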
Five training scenarios to cover (VLMs)#
1. SFT with nothing frozen
2. SFT with the language model frozen (train vision + projection)
3. SFT with vision + language frozen (train projection only)
4. PEFT with vision frozen
5. PEFT with nothing frozen
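These scenarios map directly onto the `freeze_*` flags shown in the SFT recipe. A minimal sketch of the override matrix; the scenario keys are illustrative, and PEFT scenarios additionally set `cfg.peft`:

# Freeze-flag overrides applied on top of the SFT/PEFT recipe configs (illustrative)
VLM_SCENARIOS = {
    "sft_none_frozen":     dict(freeze_language_model=False, freeze_vision_model=False, freeze_vision_projection=False),
    "sft_language_frozen": dict(freeze_language_model=True, freeze_vision_model=False, freeze_vision_projection=False),
    "sft_projection_only": dict(freeze_language_model=True, freeze_vision_model=True, freeze_vision_projection=False),
    "peft_vision_frozen":  dict(freeze_vision_model=True),  # adapters train; vision tower stays frozen
    "peft_none_frozen":    dict(freeze_language_model=False, freeze_vision_model=False, freeze_vision_projection=False),
}

# Each override is applied to cfg.model before the short training run, e.g.:
# for field, value in VLM_SCENARIOS["sft_language_frozen"].items():
#     setattr(cfg.model, field, value)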