peft/megatron_bridge#
This step trains a low-rank adaptation (LoRA) adapter on top of a Megatron checkpoint by using NVIDIA Megatron-Bridge.
Use this step when a full supervised fine-tune does not fit in memory at the target model size, but you still need tensor and pipeline parallelism.
The step consumes packed Apache Parquet shards produced by data_prep/sft_packing together with a base Megatron checkpoint, and produces a checkpoint_lora artifact.
Syntax#
nemotron steps run peft/megatron_bridge \
[-c <config-name-or-path>] \
[-r <run-profile> | -b <batch-profile>] \
[-d] \
[--force-squash] \
[<dotlist-overrides>...] \
[<passthrough-args>...]
Refer to the Nemotron Steps CLI Reference for the shared flag set.
Configuration Files#
The step ships two configuration files under src/nemotron/steps/peft/megatron_bridge/config/.
File |
Purpose |
|---|---|
|
Full-shape LoRA tuning on top of the Nano3 Megatron-Bridge finetune recipe with rank thirty-two adapters on |
|
Short validation run against packed Parquet shards. |
Pass the configuration name with -c:
$ nemotron steps run peft/megatron_bridge -c tiny
$ nemotron steps run peft/megatron_bridge -c default
Inputs and Outputs#
Direction |
Artifact Type |
Required |
Description |
|---|---|---|---|
Consumes |
|
Yes |
Packed Parquet shards with |
Consumes |
|
Yes |
A pretrained Megatron checkpoint to adapt. |
Produces |
|
— |
LoRA adapter weights. Merge the adapter with the base checkpoint by using |
Step Parameters#
The manifest declares two LoRA-specific parameters. Pass them as dotlist overrides.
- peft.type=<scheme>#
The parameter-efficient training scheme. Only
lorais supported today.Choices:
lora.Default:
lora.Example:
peft.type=lora
- peft.dim=<n>#
The LoRA rank.
Default:
32.Example:
peft.dim=16
Frequently used dotlist overrides drawn from the Nano3 finetune recipe include the following.
- recipe.seq_length=<n>#
The training sequence length applied by the recipe. This value must match the
pack_sizeyou used indata_prep/sft_packing.Example:
recipe.seq_length=8192
- recipe.tensor_model_parallel_size=<n>#
The tensor-model-parallel degree applied by the Nano3 finetune recipe.
Example:
recipe.tensor_model_parallel_size=8
- recipe.pipeline_model_parallel_size=<n>#
The pipeline-model-parallel degree applied by the Nano3 finetune recipe.
Example:
recipe.pipeline_model_parallel_size=4
- train.train_iters=<n>#
The number of training iterations.
Example:
train.train_iters=2000
- train.global_batch_size=<n>#
The global batch size for the training loop.
Example:
train.global_batch_size=32
- dataset.nano3_packed_sft_dir=<path>#
The directory that contains packed Parquet shards from
data_prep/sft_packing. The default configuration reads this value from theSFT_PACKED_DIRenvironment variable.Example:
dataset.nano3_packed_sft_dir=/lustre/packed/super3-sft
Strategies#
The manifest records one operator strategy for peft/megatron_bridge.
When a full supervised fine-tune does not fit in memory at the desired model size, switch to
peft/megatron_bridgeto keep tensor and pipeline parallelism while reducing the trainable parameter count.
Common Errors#
- missing_packed_data#
Cause: the training loop cannot find packed Parquet shards at the configured
dataset.nano3_packed_sft_dir.Recovery: run
data_prep/sft_packingfirst, or overridedataset.nano3_packed_sft_dirto point at the directory that holds the packed splits.
Command Examples#
Run the tiny validation configuration locally:
$ nemotron steps run peft/megatron_bridge -c tiny
Compile the default configuration without submitting the job:
$ nemotron steps run peft/megatron_bridge -c default --dry-run
Submit an attached LoRA run on a Lepton profile with a longer sequence length:
$ nemotron steps run peft/megatron_bridge -c default -r lepton_peft_megatron_bridge \
peft.dim=16 \
recipe.seq_length=8192 \
train.train_iters=2000
Submit a detached LoRA run on a Slurm profile with eight-way tensor parallelism:
$ nemotron steps run peft/megatron_bridge -c default -b slurm_peft_megatron_bridge \
peft.dim=32 \
recipe.tensor_model_parallel_size=8 \
train.global_batch_size=64