Model Optimization Steps

This section documents the model optimization steps registered under src/nemotron/steps/optimize/modelopt/. The three steps wrap NVIDIA Model Optimizer through NVIDIA Megatron-Bridge to quantize, prune, and distill trained checkpoints.

Steps

optimize/modelopt/quantize

Post-training quantization with floating-point recipes such as fp8 and nvfp4, and integer recipes such as int8_sq and int4_awq.
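
To give a feel for what an integer recipe does to weights, the sketch below shows plain symmetric absmax quantization to int8. It is an illustration only: it is not SmoothQuant (int8_sq) or AWQ (int4_awq), and it does not use the Model Optimizer or Megatron-Bridge APIs. The function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor absmax quantization (illustrative, not int8_sq)."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the error that calibration-based recipes try to reduce.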

optimize/modelopt/prune

Structured pruning by target parameter budget or explicit architecture.

optimize/modelopt/distill

Teacher-student distillation that recovers quality after pruning or quantization.
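
As a rough illustration of teacher-student distillation, the sketch below computes a common logit-matching loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. This is an assumption about the general technique, not the loss the distill step actually configures; `softmax` and `distillation_loss` are hypothetical helper names.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)
    return kl * temperature ** 2


matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is what drives quality recovery in a pruned or quantized student.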
