optimize/modelopt/quantize#
This step runs post-training quantization (PTQ) on a Hugging Face (HF) format checkpoint by using NVIDIA Model Optimizer through NVIDIA Megatron-Bridge.
The step supports every quantization recipe that the installed Megatron-Bridge quantization script accepts.
The step produces a quantized Megatron distributed checkpoint that you can export back to Hugging Face format with the upstream export.py script.
Syntax#
nemotron steps run optimize/modelopt/quantize \
[-c <config-name-or-path>] \
[-r <run-profile> | -b <batch-profile>] \
[-d] \
[--force-squash] \
[<dotlist-overrides>...] \
[<passthrough-args>...]
Refer to the Nemotron Steps CLI Reference for the shared flag set.
Configuration Files#
The step ships four configuration files under src/nemotron/steps/optimize/modelopt/quantize/config/.
File |
Purpose |
|---|---|
|
Generic post-training quantization configuration with |
|
FP8 quantization configuration tuned for Hopper-class hardware. |
|
NVFP4 quantization configuration tuned for Blackwell-class hardware. |
|
Short validation run that exercises the quantization pipeline. |
Pass the configuration name with -c:
$ nemotron steps run optimize/modelopt/quantize -c tiny
$ nemotron steps run optimize/modelopt/quantize -c default
$ nemotron steps run optimize/modelopt/quantize -c fp8
$ nemotron steps run optimize/modelopt/quantize -c nvfp4
Inputs and Outputs#
Direction |
Artifact Type |
Required |
Description |
|---|---|---|---|
Consumes |
|
Yes |
A Hugging Face base or aligned checkpoint to quantize. |
Produces |
|
— |
A quantized Megatron distributed checkpoint. Export the artifact to Hugging Face format with |
Step Parameters#
The manifest declares three quantization parameters. Pass them as dotlist overrides.
- args.export_quant_cfg=<recipe>#
The quantization recipe to apply. The supported recipes match the choices that the installed upstream
quantize.pyaccepts.Choices:
int8_sq,fp8,fp8_blockwise,int4_awq,w4a8_awq,nvfp4.Default:
fp8.Example:
args.export_quant_cfg=nvfp4
- args.calib_size=<n>#
The number of calibration samples used to determine quantization scales.
Default:
512.Example:
args.calib_size=1024
- extra_args=<list>#
Literal upstream arguments that the step forwards to the underlying quantization script. Use this parameter to pass newly added Model Optimizer flags that do not yet have a dedicated
args.*entry.Default:
[].Example:
extra_args=["--my_new_flag", "value"]
Frequently used dotlist overrides drawn from the default configuration include the following.
- args.hf_model_id=<id-or-path>#
The Hugging Face identifier or local path for the checkpoint to quantize.
Example:
args.hf_model_id=nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
- args.megatron_save_path=<path>#
The destination directory for the quantized Megatron distributed checkpoint.
Example:
args.megatron_save_path=/lustre/runs/quantized/nano3-fp8
Strategies#
The manifest records three operator strategies for optimize/modelopt/quantize.
When the target hardware is Hopper or H100, start from
config/fp8.yamland setargs.export_quant_cfg=fp8.When the target hardware is Blackwell or B200, start from
config/nvfp4.yamland setargs.export_quant_cfg=nvfp4.When you need a Hugging Face checkpoint, export the produced Megatron checkpoint by using
/opt/Megatron-Bridge/examples/quantization/export.pyafter the quantization run completes.
Command Examples#
Run the tiny validation configuration locally:
$ nemotron steps run optimize/modelopt/quantize -c tiny
Compile the FP8 configuration without submitting the job:
$ nemotron steps run optimize/modelopt/quantize -c fp8 --dry-run
Submit an attached FP8 run on a Lepton profile for the Nano3 base model:
$ nemotron steps run optimize/modelopt/quantize -c fp8 -r lepton_optimize_modelopt_quantize \
args.hf_model_id=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 \
args.calib_size=1024
Submit a detached NVFP4 run on a Slurm profile for the Super3 base model:
$ nemotron steps run optimize/modelopt/quantize -c nvfp4 -b slurm_optimize_modelopt_quantize \
args.hf_model_id=nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
args.megatron_save_path=/lustre/quantized/super3-nvfp4