Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Large Language Models
To learn more about using NeMo to train Large Language Models at scale, please refer to the NeMo Framework User Guide.
NeMo supports the following model architectures:

- GPT-style models (decoder only)
- T5/BART/UL2-style models (encoder-decoder)
- BERT-style models (encoder only)
- RETRO models (decoder only)
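As a quick orientation, the sketch below shows how a small decoder-only (GPT-style) model might be configured with the NeMo 2.0 `llm` collection. The class names and arguments used here (`llm.MockDataModule`, `llm.GPTConfig`, `llm.GPTModel`) follow the NeMo 2.0 quickstart but are shown as an illustrative assumption; refer to the NeMo Framework User Guide for the authoritative API.

```python
from nemo.collections import llm

# Synthetic data module, useful for smoke-testing a configuration
# (assumed API, following the NeMo 2.0 quickstart).
data = llm.MockDataModule(seq_length=2048, global_batch_size=16)

# A deliberately small decoder-only (GPT-style) configuration.
gpt_config = llm.GPTConfig(
    num_layers=6,
    hidden_size=384,
    ffn_hidden_size=1536,
    num_attention_heads=6,
    seq_length=2048,
)

# Build the model; training is then driven by nemo.lightning and NeMo Run,
# as described in the NeMo Framework User Guide.
model = llm.GPTModel(gpt_config, tokenizer=data.tokenizer)
```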
- GPT Model Training
- Batching
- RETRO Model
- Hiddens Module
- Parameter-Efficient Fine-Tuning (PEFT)
- Positional embeddings
  - Positional interpolation
  - References
- Megatron Core Customization
  - Drop Model Layers
    - Validate Trimmed Model
    - Validate Original Model
  - Reset Learning Rate
    - Parameters
    - Use Cases
  - Ramp Up Batch Size
    - Usage
    - Ramp Up Stages and Training Interruption
    - Automatic Node Scheduling
    - Example
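Several of the Megatron Core customization topics listed above come down to a small number of training-configuration knobs. As one hedged illustration of the Ramp Up Batch Size entry: Megatron-style training typically describes a ramp-up as a (start batch size, batch size increment, ramp-up samples) triple. The sketch below only demonstrates the semantics of such a schedule; the variable name `rampup_batch_size` mirrors the Megatron-LM convention and is not meant as the exact NeMo configuration key, for which see the section linked above.

```python
# Illustrative semantics of a Megatron-style batch-size ramp-up schedule:
# start at 16, grow by 8 at each step of the schedule, and spread the
# ramp-up over 300_000_000 training samples before reaching the full
# global batch size. Names here are illustrative, not NeMo config keys.
rampup_batch_size = [16, 8, 300_000_000]  # [start, increment, ramp-up samples]
global_batch_size = 1024


def batch_size_at(samples_consumed: int) -> int:
    """Effective global batch size after `samples_consumed` training samples."""
    start, increment, ramp_samples = rampup_batch_size
    num_increments = (global_batch_size - start) // increment
    samples_per_increment = ramp_samples / num_increments
    steps_taken = int(samples_consumed // samples_per_increment)
    return min(start + steps_taken * increment, global_batch_size)


print(batch_size_at(0))            # 16
print(batch_size_at(150_000_000))  # roughly halfway up the ramp
print(batch_size_at(400_000_000))  # 1024 (ramp-up finished)
```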