Fixes for the following issues will be released shortly:
cuDNN Flash Attention is disabled by default until further testing is done
When preparing data for GPT models, use your own dataset or an online dataset that your organization has legally approved
Megatron Core gradient clipping issue when using Mixture of Experts (MoE) with expert parallelism (EP)
Race condition in NeMo experiment manager
Mistral and Mixtral tokenizers require a Hugging Face login