Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

Known Issues

Fixes for the following issues will be released shortly:

  • The Mcore distributed optimizer is currently missing a memory capacity optimization, so model state memory usage is higher at small data-parallel sizes. We will support the optimization in the next patch.

  • Overlapping the data-parallel parameter AllGather with optimizer.step (overlap_param_gather_with_optimizer=true) does not work with distributed checkpointing. Support for distributed checkpointing will be available in the next public release.

  • Converting models from NeMo 2.0 to 1.0 is not yet supported. This conversion is required to align models with NeMo Aligner until NeMo Aligner supports 2.0 natively.
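
Until the AllGather overlap issue listed above is fixed, a launch script can guard against the unsupported combination before training starts. The following is a minimal sketch in plain Python, not part of the NeMo API: only the flag name overlap_param_gather_with_optimizer comes from the list above, while the function name, the optim_cfg mapping, and the uses_distributed_checkpoint argument are hypothetical stand-ins for however your own setup exposes these settings.

```python
from typing import Any, Mapping, Optional

# Flag name taken from the known-issues list above.
OVERLAP_FLAG = "overlap_param_gather_with_optimizer"


def _is_enabled(value: Any) -> bool:
    """Interpret a config value as a boolean; accepts True or the string 'true'."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def check_overlap_conflict(
    optim_cfg: Mapping[str, Any],
    uses_distributed_checkpoint: bool,  # hypothetical: supplied by the caller
) -> Optional[str]:
    """Return a warning if the known-incompatible combination is set, else None."""
    overlap_on = _is_enabled(optim_cfg.get(OVERLAP_FLAG, False))
    if overlap_on and uses_distributed_checkpoint:
        return (
            f"{OVERLAP_FLAG}=true does not currently work with distributed "
            "checkpointing; disable one of the two."
        )
    return None


if __name__ == "__main__":
    # Hypothetical optimizer settings, written as a plain dict for illustration.
    message = check_overlap_conflict({OVERLAP_FLAG: True}, uses_distributed_checkpoint=True)
    if message is not None:
        print(message)
```

Returning a message rather than raising keeps the decision with the caller, who can either abort the run or switch the overlap off and continue.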