Large Language Models

To learn more about using NeMo to train Large Language Models at scale, refer to the NeMo Framework User Guide.

  • GPT-style models (decoder only)

  • T5/BART/UL2-style models (encoder-decoder)

  • BERT-style models (encoder only)

  • RETRO model (decoder only)
