NeMo Megatron#

Megatron [nlp-megatron1] is a large, powerful transformer language model developed by the Applied Deep Learning Research team at NVIDIA. NeMo Megatron supports several types of models (see the loading sketch after this list):

  • GPT-style models (decoder only)

  • T5/BART/UL2-style models (encoder-decoder)

  • BERT-style models (encoder only)

  • RETRO model (decoder only)

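For example, a GPT-style model can be restored from a serialized .nemo checkpoint in Python. The sketch below is a minimal, non-authoritative example: the checkpoint path megatron_gpt.nemo is a placeholder, and the exact class location and restore arguments can differ between NeMo releases.

    from pytorch_lightning import Trainer
    from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import (
        MegatronGPTModel,
    )

    # Some NeMo versions expect a Lightning trainer when restoring
    # Megatron models, so one is passed explicitly here.
    trainer = Trainer(devices=1, accelerator="gpu")

    # restore_from() rebuilds the model from a serialized .nemo archive
    # ("megatron_gpt.nemo" is a hypothetical local path).
    model = MegatronGPTModel.restore_from("megatron_gpt.nemo", trainer=trainer)
    print(model.cfg)  # the OmegaConf config stored alongside the weights

The other model types listed above follow the same pattern with their corresponding model classes (for example, a T5-style class in place of the GPT one).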
Note

NeMo Megatron also has an Enterprise edition, which includes tools for data preprocessing and hyperparameter tuning, containers, scripts for various clouds, deployment tools, and more. Apply for early access here.

References#

nlp-megatron1

Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053, 2019.