Megatron Core User Guide

Optimizer CPU offload package

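Add these flags to your training command to enable optimizer CPU offload in MCore. With --optimizer-offload-fraction set to 1.0, the full optimizer state is offloaded to the CPU.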

--optimizer-cpu-offload --optimizer-offload-fraction 1.0 --use-precision-aware-optimizer

Copying gradients from GPU to CPU, running the optimizer step on the CPU, and copying the updated parameters back from CPU to GPU can all be time-consuming. It is recommended to add the flag --overlap-cpu-optimizer-d2h-h2d so that these operations execute concurrently.
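
As a minimal sketch, the flags from this section can be collected into one shell variable and appended to an existing launch command. The variable names OFFLOAD_ARGS and TRAIN_CMD, and the default launch command, are placeholders chosen for illustration and are not part of MCore. Only the flags themselves come from this guide.

```shell
#!/bin/sh
# Flags taken from this section: enable CPU offload of the optimizer,
# offload the full optimizer state (fraction 1.0), and overlap the
# device-to-host / host-to-device copies with the CPU optimizer step.
OFFLOAD_ARGS="--optimizer-cpu-offload --optimizer-offload-fraction 1.0 --use-precision-aware-optimizer --overlap-cpu-optimizer-d2h-h2d"

# TRAIN_CMD is a hypothetical stand-in for your own launch command,
# including all of its usual model and data arguments.
TRAIN_CMD="${TRAIN_CMD:-python pretrain_gpt.py}"

# Print the combined command instead of running it, so the sketch is
# safe to try on a machine without GPUs.
FULL_CMD="$TRAIN_CMD $OFFLOAD_ARGS"
echo "$FULL_CMD"
```

Once the printed command looks right, replace the final echo with the actual invocation, for example by running $TRAIN_CMD $OFFLOAD_ARGS directly.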

© Copyright 2022-2025, NVIDIA. Last updated on Sep 16, 2025.