Model Optimization Steps
This section documents the model optimization steps registered under src/nemotron/steps/optimize/modelopt/.
The three steps wrap NVIDIA Model Optimizer through NVIDIA Megatron-Bridge to quantize, prune, and distill trained checkpoints.
Steps
optimize/modelopt/quantize
Post-training quantization with floating-point recipes such as fp8 and nvfp4, and integer recipes such as int8_sq and int4_awq.
optimize/modelopt/prune
Structured pruning to either a target parameter budget or an explicitly specified architecture.
optimize/modelopt/distill
Teacher-student distillation that recovers quality after pruning or quantization.
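To make the three ideas concrete, the following is a minimal, library-free sketch of what each step does in miniature. It is not the Model Optimizer or Megatron-Bridge API, and it does not implement any of the named recipes (fp8, nvfp4, int8_sq, int4_awq). The function names (fake_quantize_int8, prune_rows, distillation_loss), the per-tensor symmetric int8 scheme, the L2-norm row ranking, and the temperature-softened KL loss are all assumptions chosen for illustration.

```python
import math


def fake_quantize_int8(values):
    """Illustrative symmetric per-tensor int8 quantize-then-dequantize round trip."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    # Map to the integer grid, clamp to the int8 range, then map back to floats.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return [q * scale for q in quantized], scale


def prune_rows(weight, keep):
    """Illustrative structured pruning: keep the `keep` rows with the largest L2 norm."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in weight]
    ranked = sorted(range(len(weight)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original row order
    return [weight[i] for i in kept], kept


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Illustrative KL(teacher || student) on temperature-softened distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)


if __name__ == "__main__":
    dequantized, scale = fake_quantize_int8([0.5, -1.0, 0.25])
    print("dequantized:", dequantized, "scale:", scale)

    rows, kept = prune_rows([[1.0, 0.0], [0.1, 0.1], [0.0, 2.0]], keep=2)
    print("kept row indices:", kept)

    print("loss:", distillation_loss([0.0, 1.0], [1.0, 0.0]))
```

In this sketch the quantization error per element is bounded by half the scale, pruning removes whole rows rather than individual weights (which is what makes it structured), and the distillation loss is zero only when the student reproduces the teacher's distribution. The actual steps operate on full checkpoints and may use different calibration, importance, and loss definitions.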