Model Optimization Steps

This section documents the model optimization steps registered under src/nemotron/steps/optimize/modelopt/. The three steps wrap NVIDIA Model Optimizer through NVIDIA Megatron-Bridge to quantize, prune, and distill trained checkpoints.

Steps

optimize/modelopt/quantize

Post-training quantization with floating-point recipes such as fp8 and nvfp4, and integer recipes such as int8_sq and int4_awq.
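
As a rough illustration of what a post-training quantization recipe does to a tensor, the sketch below applies symmetric per-tensor int8 quantization in plain Python. It is not the Model Optimizer implementation and does not reproduce any of the named recipes, which add their own calibration and scaling; the helper names quantize_int8 and dequantize are hypothetical.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit codes with one shared scale (symmetric, per tensor)."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-element error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Small values such as 0.003 collapse to code 0 here, which is the kind of precision loss that calibration-based recipes try to limit.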

optimize/modelopt/prune

Structured pruning by target parameter budget or explicit architecture.
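
To make "pruning by target parameter budget" concrete, here is a minimal sketch that removes whole hidden units from a toy two-layer MLP until the remaining weights fit a budget. The L2-norm importance score, the weight layout, and the function name prune_hidden_units are assumptions chosen for illustration; the actual step may score and select structures differently.

```python
def prune_hidden_units(w_in, w_out_t, param_budget):
    """Keep the highest-scoring hidden units that fit within param_budget.

    w_in[i] holds the input weights of hidden unit i, and w_out_t[i] holds its
    output weights, so removing unit i removes one row from each list.
    """
    params_per_unit = len(w_in[0]) + len(w_out_t[0])
    keep = min(len(w_in), param_budget // params_per_unit)

    # Assumed importance score: squared L2 norm of each unit's input weights.
    scores = [sum(x * x for x in row) for row in w_in]
    ranked = sorted(range(len(w_in)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original unit order

    return [w_in[i] for i in kept], [w_out_t[i] for i in kept], kept


w_in = [[0.1, 0.1], [2.0, -1.0], [0.0, 0.05], [1.0, 1.0]]
w_out_t = [[0.3], [0.7], [0.1], [-0.4]]

pruned_in, pruned_out_t, kept = prune_hidden_units(w_in, w_out_t, param_budget=7)
remaining_params = sum(len(r) for r in pruned_in) + sum(len(r) for r in pruned_out_t)
print(kept, remaining_params)
```

Specifying an explicit architecture instead of a budget would amount to fixing the number of kept units directly rather than deriving it from param_budget.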

optimize/modelopt/distill

Teacher-student distillation that recovers quality after pruning or quantization.
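
The idea behind teacher-student distillation can be sketched with the classic temperature-scaled KL divergence between teacher and student output distributions. This is a generic textbook formulation in plain Python, assumed here for illustration; the loss the step actually configures is not stated in this section, and the names softmax and distillation_loss are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [2.0, 0.5, -1.0]
loss_same = distillation_loss(teacher, teacher)
loss_near = distillation_loss([1.8, 0.6, -0.9], teacher)
loss_far = distillation_loss([-1.0, 0.5, 2.0], teacher)
print(loss_same, loss_near, loss_far)
```

The loss is zero when the student matches the teacher and grows as their predictions diverge, which is the signal a pruned or quantized student uses to recover quality.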
