Large Language Models
To learn more about using NeMo to train Large Language Models at scale, please refer to the NeMo Framework User Guide.
- GPT-style models (decoder-only)
- T5/BART/UL2-style models (encoder-decoder)
- BERT-style models (encoder-only)
- RETRO model (decoder-only)
References
- nlp-megatron1
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053, 2019.