optimize/modelopt/prune#
This step runs structured pruning on a Hugging Face (HF) format checkpoint by using NVIDIA Model Optimizer through NVIDIA Megatron-Bridge.
The step supports pruning by a target parameter budget that Model Optimizer searches against, or pruning to an explicit architecture that you supply.
The step produces a pruned Hugging Face checkpoint that you can pass to optimize/modelopt/distill for quality recovery.
Syntax#
nemotron steps run optimize/modelopt/prune \
[-c <config-name-or-path>] \
[-r <run-profile> | -b <batch-profile>] \
[-d] \
[--force-squash] \
[<dotlist-overrides>...] \
[<passthrough-args>...]
Refer to the Nemotron Steps CLI Reference for the shared flag set.
Configuration Files#
The step ships two configuration files under src/nemotron/steps/optimize/modelopt/prune/config/.
| File | Purpose |
|---|---|
| | Generic structured-pruning configuration for |
| | Short validation run that exercises the pruning pipeline. |
Pass the configuration name with -c:
$ nemotron steps run optimize/modelopt/prune -c tiny
$ nemotron steps run optimize/modelopt/prune -c default
Inputs and Outputs#
| Direction | Artifact Type | Required | Description |
|---|---|---|---|
| Consumes | | Yes | A Hugging Face model identifier or checkpoint to prune. |
| Produces | | — | The pruned Hugging Face checkpoint. |
Step Parameters#
The manifest declares four pruning parameters. Pass them as dotlist overrides.
- args.prune_target_params=<float>#
The target parameter count for the Model Optimizer search. Use scientific notation for billions of parameters.
Default: 6e9.
Example:
args.prune_target_params=4e9
- args.prune_export_config=<dict>#
The explicit target architecture for manual pruning, expressed as a dictionary that maps hyperparameter names such as hidden_size, ffn_hidden_size, or num_layers to integer values. Set this parameter when you want a specific architecture, and also set args.prune_target_params=null.
Example:
args.prune_export_config='{"hidden_size":4096,"ffn_hidden_size":11008,"num_layers":24}'
- args.hparams_to_skip=<list>#
The architecture hyperparameters that the search must leave unchanged.
Example:
args.hparams_to_skip=["num_attention_heads"]
- extra_args=<list>#
Literal upstream arguments that the step forwards to the pruning script. Use this parameter to pass newly added Model Optimizer flags that do not yet have a dedicated args.* entry.
Default: [].
Example:
extra_args=["--num_layers_in_first_pipeline_stage", "4"]
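The two pruning modes described above are mutually exclusive in practice: you set either a parameter budget or an explicit architecture. The following Python sketch shows one way to build the matching dotlist overrides from a script. The helper names `format_param_budget` and `prune_overrides` are hypothetical and are not part of the nemotron CLI. The "exactly one mode" check is an assumption of this sketch, not documented validation performed by the step.

```python
import json
import shlex


def format_param_budget(n_params: int) -> str:
    """Render a parameter count in compact scientific notation, e.g. 4e9."""
    mantissa, exponent = f"{n_params:.3e}".split("e")
    mantissa = mantissa.rstrip("0").rstrip(".")
    return f"{mantissa}e{int(exponent)}"


def prune_overrides(target_params=None, export_config=None, hparams_to_skip=None):
    """Build dotlist overrides for exactly one pruning mode (hypothetical helper)."""
    # Assumption of this sketch: a budget search and a manual architecture
    # are never requested together.
    if (target_params is None) == (export_config is None):
        raise ValueError("set exactly one of target_params or export_config")

    overrides = []
    if target_params is not None:
        # Budget mode: Model Optimizer searches against the parameter count.
        budget = format_param_budget(target_params)
        overrides.append(f"args.prune_target_params={budget}")
    else:
        # Manual mode: disable the budget and pass the architecture as a dictionary.
        compact = json.dumps(export_config, separators=(",", ":"))
        overrides.append("args.prune_target_params=null")
        overrides.append(f"args.prune_export_config={compact}")

    if hparams_to_skip:
        skipped = json.dumps(list(hparams_to_skip), separators=(",", ":"))
        overrides.append(f"args.hparams_to_skip={skipped}")
    return overrides


if __name__ == "__main__":
    argv = ["nemotron", "steps", "run", "optimize/modelopt/prune", "-c", "default"]
    argv += prune_overrides(
        export_config={"hidden_size": 4096, "ffn_hidden_size": 11008, "num_layers": 24}
    )
    # shlex.join quotes the dictionary so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the returned list directly to `subprocess.run` avoids shell quoting altogether. `shlex.join` is needed only when you want a copy-pasteable command line.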
Frequently used dotlist overrides drawn from the default configuration include the following.
- args.hf_model_name_or_path=<id-or-path>#
The Hugging Face identifier or local path for the checkpoint to prune.
Example:
args.hf_model_name_or_path=meta-llama/Llama-3.1-8B-Instruct
- args.output_hf_path=<path>#
The destination directory for the pruned Hugging Face checkpoint.
Example:
args.output_hf_path=/lustre/runs/pruned/llama-6b
- args.pp_size=<n>#
The pipeline-parallel degree applied during pruning.
Example:
args.pp_size=4
Strategies#
The manifest records three operator strategies for optimize/modelopt/prune.
- When you know the target model budget, set args.prune_target_params and leave args.prune_export_config unset so that Model Optimizer searches candidate architectures.
- When you need a specific architecture, set args.prune_export_config to the target dictionary and set args.prune_target_params=null.
- When the layer count is not evenly divisible by the pipeline-parallel size, set args.num_layers_in_first_pipeline_stage and args.num_layers_in_last_pipeline_stage to balance the partition.
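The third strategy comes down to a small piece of arithmetic. The sketch below picks first-stage and last-stage layer counts for a given layer count and pipeline-parallel size. It assumes the Megatron-style convention that the middle stages must share the remaining layers evenly, which this page does not state, so verify it against your Megatron-Bridge version. The function names are hypothetical, and giving the extra layer to the first stage is an arbitrary choice made for this example.

```python
def uneven_pipeline_split(num_layers: int, pp_size: int):
    """Return (first, last) layer counts, or None when an even split works."""
    if pp_size < 1 or num_layers < pp_size:
        raise ValueError("need at least one layer per pipeline stage")
    if num_layers % pp_size == 0:
        return None  # evenly divisible, so no first/last overrides are needed

    middle_stages = pp_size - 2
    # Start from the ceiling of the average and shrink the middle stages
    # until the first and last stages can each hold at least one layer.
    per_middle = -(-num_layers // pp_size)
    while num_layers - per_middle * middle_stages < 2:
        per_middle -= 1

    remainder = num_layers - per_middle * middle_stages
    first = (remainder + 1) // 2  # the first stage takes the odd layer, if any
    return first, remainder - first


def pipeline_overrides(num_layers: int, pp_size: int):
    """Build the dotlist overrides for the pipeline partition (hypothetical helper)."""
    overrides = [f"args.pp_size={pp_size}"]
    split = uneven_pipeline_split(num_layers, pp_size)
    if split is not None:
        first, last = split
        overrides.append(f"args.num_layers_in_first_pipeline_stage={first}")
        overrides.append(f"args.num_layers_in_last_pipeline_stage={last}")
    return overrides


if __name__ == "__main__":
    # 30 layers over 4 stages: two middle stages of 8, first and last of 7.
    print(pipeline_overrides(30, 4))
    # 24 layers over 4 stages divide evenly, so only pp_size is emitted.
    print(pipeline_overrides(24, 4))
```

With 24 layers and args.pp_size=4 the split is even and neither stage override is needed.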
Command Examples#
Run the tiny validation configuration locally:
$ nemotron steps run optimize/modelopt/prune -c tiny
Compile the default configuration without submitting the job:
$ nemotron steps run optimize/modelopt/prune -c default --dry-run
Submit an attached run on a Lepton profile that searches for a four-billion-parameter target:
$ nemotron steps run optimize/modelopt/prune -c default -r lepton_optimize_modelopt_prune \
args.hf_model_name_or_path=meta-llama/Llama-3.1-8B-Instruct \
args.prune_target_params=4e9 \
args.output_hf_path=/lustre/pruned/llama-4b
Submit a detached run on a Slurm profile that prunes to a manually specified architecture:
$ nemotron steps run optimize/modelopt/prune -c default -b slurm_optimize_modelopt_prune \
args.prune_target_params=null \
args.prune_export_config='{"hidden_size":4096,"ffn_hidden_size":11008,"num_layers":24}' \
args.pp_size=4