Large Language Models

To learn more about using NeMo to train Large Language Models at scale, please refer to the NeMo Framework User Guide.

  • GPT-style models (decoder only)

  • T5/BART/UL2-style models (encoder-decoder)

  • BERT-style models (encoder only)

  • RETRO model (decoder only)



Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron-lm: training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053, 2019.