Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Large Language Models
To learn more about using NeMo to train Large Language Models at scale, please refer to the NeMo Framework User Guide.

NeMo supports the following model architectures, as illustrated in the sketch after this list:

- GPT-style models (decoder only)
- T5/BART/UL2-style models (encoder-decoder)
- BERT-style models (encoder only)
- RETRO model (decoder only)
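The families above differ chiefly in how self-attention is masked. The following minimal sketch is written in plain PyTorch rather than the NeMo API (every name in it is an assumption made for illustration); it only contrasts the masking patterns that define each family:

```python
# Illustrative sketch only (plain PyTorch, not the NeMo API; all names here
# are assumptions made for illustration). It contrasts the attention masks
# that distinguish the model families listed above.
import torch

seq_len = 6   # decoder sequence length
src_len = 8   # encoder sequence length (encoder-decoder case)

# Decoder-only (GPT-style, RETRO): causal self-attention -- each position
# attends only to itself and earlier positions.
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()

# Encoder-only (BERT-style): fully bidirectional self-attention; every
# position attends to every other (padding aside).
bidirectional_mask = torch.ones(seq_len, seq_len).bool()

# Encoder-decoder (T5/BART/UL2-style): bidirectional self-attention in the
# encoder, causal self-attention in the decoder, and cross-attention letting
# every decoder position see all encoder positions.
cross_attention_mask = torch.ones(seq_len, src_len).bool()

print(causal_mask.int())         # lower-triangular 0/1 pattern
print(bidirectional_mask.int())  # all ones
```

In practice, NeMo constructs these masks internally during training; the sketch is only meant to make the taxonomy concrete.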