Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Known Issues

Fixes for the following issues will be released shortly:

  • To handle model structure updates when loading checkpoints trained with Transformer Engine versions earlier than v1.10, set model.dist_ckpt_load_strictness=log_all when running with Transformer Engine v1.10 or later (see the checkpoint-loading example after this list).

  • For data preparation of GPT models, use your own dataset or an online dataset that your organization has legally approved.

  • Race condition in NeMo experiment manager

  • Mistral & Mixtral tokenizers require Hugging Face login

  • Export of Gemma, Starcoder, and Falcon 7B to TRT-LLM works only with a single GPU, and no descriptive error message is shown when a multi-GPU export is attempted.

  • The following notebooks have functional issues that will be fixed in the next release:

    • ASR_with_NeMo.ipynb

    • ASR_with_Subword_Tokenization.ipynb

    • AudioTranslationSample.ipynb

    • Megatron_Synthetic_Tabular_Data_Generation.ipynb

    • SpellMapper_English_ASR_Customization.ipynb

    • FastPitch_ChineseTTS_Training.ipynb

    • NeVA Tutorial.ipynb

  • Export

    • Llama 70B export to vLLM has an out-of-memory issue; the root cause is still under investigation.

    • vLLM export does not support LoRA or P-tuning; LoRA support will be added in the next release.

    • In-framework (PyTorch-level) deployment with 8 GPUs produces an error; the cause is still under investigation.

  • Multimodal

    • LITA tutorial issue: in tutorials/multimodal/LITA_Tutorial.ipynb, the data preparation step requires users to manually download the YouMakeup dataset instead of using the provided script.

    • The argument exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True should be added to the pretraining part of the NeVA notebook to ensure the end-to-end workflow completes (see the NeVA example after this list).

  • ASR

    • Timestamps are misaligned with FastConformer ASR models when using diarization with the ASR decoder. Related issue: #8438
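
As a workaround for the Transformer Engine checkpoint issue above, the strictness override can be appended to the training or fine-tuning command in the usual Hydra style. The sketch below is illustrative rather than the only entry point: the script path and config placeholders are assumptions, and only the final flag is the actual workaround.

    python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
        --config-path=<your-config-path> \
        --config-name=<your-config-name> \
        model.dist_ckpt_load_strictness=log_all  # log structure mismatches instead of failing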
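
For the Mistral and Mixtral tokenizer issue, log in to Hugging Face before running the workflow. A minimal sketch using the standard Hugging Face CLI; the token must come from an account that has accepted the corresponding model terms on the Hugging Face Hub.

    # Interactive login; the token is cached for later runs.
    huggingface-cli login

    # Non-interactive alternative (HF_TOKEN holds your access token):
    huggingface-cli login --token "$HF_TOKEN"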
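
Likewise, the NeVA pretraining override noted under Multimodal is appended in the same Hydra style. A sketch under assumptions: the script path is assumed to match the one invoked by the notebook's training cell, and the placeholder stands for the notebook's existing arguments, which stay unchanged.

    python examples/multimodal/multimodal_llm/neva/neva_pretrain.py \
        <existing notebook overrides> \
        exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True  # save a .nemo file when training ends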