Large Language Models

To learn more about using NeMo to train Large Language Models at scale, refer to the NeMo Framework User Guide. NeMo supports the following LLM architectures:

  • GPT-style models (decoder only)

  • T5/BART/UL2-style models (encoder-decoder)

  • BERT-style models (encoder only)

  • RETRO model (decoder only)
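The decoder-only vs. encoder-only distinction above comes down to the attention mask: decoder-only models (GPT, RETRO) use a causal mask so each token attends only to itself and earlier positions, while encoder-only models (BERT) attend bidirectionally. A minimal sketch in plain Python, not the NeMo API:

```python
def causal_mask(n):
    # Decoder-only (GPT-style): position i may attend to positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    # Encoder-only (BERT-style): every position attends to every position.
    return [[True] * n for _ in range(n)]

# mask[i][j] is True when token i may attend to token j.
print(causal_mask(3))
# → [[True, False, False], [True, True, False], [True, True, True]]
```

Encoder-decoder models (T5, BART, UL2) combine both: the encoder uses the bidirectional mask, the decoder uses the causal mask plus cross-attention over the encoder outputs.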


© Copyright 2023-2024, NVIDIA. Last updated on May 17, 2024.