convert/megatron_to_hf#
This step converts a Megatron distributed checkpoint into Hugging Face safetensors layout.
Use it when a downstream Hugging Face-native consumer expects checkpoint_hf but the upstream artifact is checkpoint_megatron.
Syntax#
nemotron steps run convert/megatron_to_hf \
[-c <config-name-or-path>] \
[-r <run-profile> | -b <batch-profile>] \
[-d] \
[<dotlist-overrides>...]
Refer to the Nemotron Steps CLI Reference for the shared flag set.
Configuration Files#
The step ships config/default.yaml under src/nemotron/steps/convert/megatron_to_hf/config/.
The default hf_model_id is nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16, and megatron_path must be supplied for real runs.
Inputs and Outputs#
Direction |
Artifact Type |
Required |
Description |
|---|---|---|---|
Consumes |
|
Yes |
A specific Megatron checkpoint directory, normally an |
Produces |
|
- |
A Hugging Face safetensors checkpoint directory. |
Parameters#
- megatron_path=<path>#
Specific Megatron checkpoint directory to export. Point this at the concrete checkpoint iteration, not the parent training run folder.
Example:
megatron_path=/lustre/runs/sft/checkpoints/iter_0000100
- hf_model_id=<id-or-path>#
Hugging Face model id or config source used by AutoBridge to reconstruct the Hugging Face architecture and tokenizer expectations.
Example:
hf_model_id=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
- hf_path=<path>#
Output directory for the exported Hugging Face safetensors checkpoint.
Example:
hf_path=/lustre/output/convert/sft-hf
- trust_remote_code=<true-or-false>#
Whether to trust custom Hugging Face model code while reconstructing the export target.
Default:
true.
- show_progress=<true-or-false>#
Whether to show Megatron-Bridge export progress output.
Default:
true.
- strict=<true-or-false>#
Whether Megatron-Bridge should require source and target checkpoint keys to match strictly.
Default:
true.
Command Examples#
Export a validated Megatron checkpoint iteration to Hugging Face layout:
$ nemotron steps run convert/megatron_to_hf -c default \
megatron_path=/path/to/megatron/iter_0000100 \
hf_model_id=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 \
hf_path=./output/convert/sft-hf
Submit the export through a generated Lepton profile:
$ nemotron steps run convert/megatron_to_hf -c default --batch lepton_convert_model \
megatron_path=/mnt/lustre-shared/output/sft/megatron_bridge/iter_0000100 \
hf_model_id=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 \
hf_path=/mnt/lustre-shared/output/convert/sft-hf
Recovery Notes#
If export fails because the checkpoint is incomplete, wait for async checkpoint save to finish and retry from a complete
iter_*directory.If tokenizer or config reconstruction fails, set
hf_model_idto the original base model or config source.Validate the exported Hugging Face checkpoint with a small generation or evaluation job before deployment.