Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Known Issues
Fixes for the following issues will be released shortly:
- The Megatron Core (Mcore) distributed optimizer is currently missing a memory capacity optimization, so model state memory usage is higher at small data-parallel sizes. We will support the optimization in the next patch.
- The overlap of the data-parallel parameter AllGather with optimizer.step (overlap_param_gather_with_optimizer=true) does not work with distributed checkpointing; support for distributed checkpointing will be available in the next public release (see the sketch below).
- Support for converting models from NeMo 2.0 to 1.0 is not yet available. This conversion will be needed to align models until NeMo Aligner supports NeMo 2.0 natively.
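  As an illustration of the AllGather-overlap issue above, the affected flag would typically be passed as a configuration override along these lines. This is a hypothetical sketch: the script name, config path, and other overrides are placeholders that depend on your training recipe; only the flag itself comes from this note. If you rely on distributed checkpointing, leave the overlap at its default (disabled) until the fix is released.

    # Hypothetical Hydra-style invocation; substitute your own script and overrides.
    # Only the overlap flag below is taken from this known issue.
    python <your_nemo_training_script>.py \
        <your_usual_overrides> \
        overlap_param_gather_with_optimizer=true   # currently incompatible with distributed checkpointing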
- Transformer Engine changed how metadata is stored in checkpoints after v1.10. This can cause incompatibilities when a Transformer Engine version later than v1.10 attempts to load a checkpoint trained with a version earlier than v1.10. Errors of this form look similar to the following:

      File "/usr/local/lib/python3.10/dist-packages/torch/distributed/checkpoint/default_planner.py", line 315, in create_default_local_load_plan
        raise RuntimeError(f"Missing key in checkpoint state_dict: {fqn}.")
    RuntimeError: Missing key in checkpoint state_dict: model.decoder.layers.self_attention.core_attention._extra_state/shard_0_24.
  To work around this issue, use model.dist_ckpt_load_strictness=log_all when working with Transformer Engine v1.10 or higher (a command-line sketch follows below). You can find the Transformer Engine versions present in each NeMo container on the Software Component Versions page.
- For data preparation of GPT models, use your own dataset or an online dataset legally approved by your organization.
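  For the Transformer Engine issue above, this is a minimal sketch of applying the model.dist_ckpt_load_strictness=log_all override on the command line of a Hydra-configured NeMo script; the script name and the other overrides are placeholders, not part of this release note.

    # Hypothetical command; substitute your own NeMo training or fine-tuning script
    # and its usual Hydra overrides. Only the final override comes from this note.
    python <your_nemo_training_script>.py \
        <your_usual_overrides> \
        model.dist_ckpt_load_strictness=log_all   # log mismatched keys instead of failing the load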
- Race condition in the NeMo experiment manager.
- The Mistral and Mixtral tokenizers require a Hugging Face login.
- Export of Gemma, Starcoder, and Falcon 7B to TRT-LLM only works with a single GPU; if the user attempts the export anyway, no descriptive error message is shown.
- The following notebooks have functional issues and will be fixed in the next release:
  - ASR_with_NeMo.ipynb
  - ASR_with_Subword_Tokenization.ipynb
  - AudioTranslationSample.ipynb
  - Megatron_Synthetic_Tabular_Data_Generation.ipynb
  - SpellMapper_English_ASR_Customization.ipynb
  - FastPitch_ChineseTTS_Training.ipynb
  - NeVA Tutorial.ipynb
- Export
  - Export of Llama 70B with vLLM has an out-of-memory issue; more time is required for the root-cause analysis.
  - vLLM export does not support LoRA or P-tuning; LoRA support will be added in the next release.
  - In-framework (PyTorch-level) deployment with 8 GPUs produces an error; more time is required to understand the cause.
- Multimodal
  - LITA tutorial issue (tutorials/multimodal/LITA_Tutorial.ipynb): the data preparation part requires users to manually download the YouMakeup dataset instead of using the provided script.
  - The additional argument exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True should be added to the NeVA notebook pretraining part to ensure the end-to-end workflow works (a sketch follows below).
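  A minimal sketch of the NeVA workaround above, with the exp_manager argument appended as a Hydra override to the pretraining command used in the notebook; the script name and other overrides are placeholders, so use whichever command the notebook itself runs.

    # Hypothetical command line; only the exp_manager override comes from this note.
    python <neva_pretraining_script>.py \
        <existing_notebook_overrides> \
        exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True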
- ASR
  - Timestamps are misaligned with FastConformer ASR models when using diarization with the ASR decoder. Related issue: #8438