GPT-2
GPT-2 is OpenAI’s foundational decoder-only transformer. NeMo AutoModel uses it as a baseline for the Megatron pretraining smoke test and tutorials. Its small footprint makes it a convenient target for validating data pipelines, distributed configs, and logging without large compute.
Available Models
- gpt2 (124M)
- gpt2-medium (355M)
- gpt2-large (774M)
- gpt2-xl (1.5B)
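The sizes listed above can be reproduced with a little arithmetic. The sketch below is illustrative only: the hidden sizes, layer counts, vocabulary size, and context length are the published GPT-2 hyperparameters and are assumptions here, since this page does not state them. The function name `count_parameters` is hypothetical and is not part of NeMo AutoModel. The count assumes the LM head shares its weights with the token embedding, so the head adds no parameters.

```python
# Published GPT-2 hyperparameters (assumed; not stated on this page).
VOCAB_SIZE = 50257
N_POSITIONS = 1024

CONFIGS = {
    "gpt2": {"n_embd": 768, "n_layer": 12},
    "gpt2-medium": {"n_embd": 1024, "n_layer": 24},
    "gpt2-large": {"n_embd": 1280, "n_layer": 36},
    "gpt2-xl": {"n_embd": 1600, "n_layer": 48},
}


def count_parameters(n_embd, n_layer, vocab_size=VOCAB_SIZE, n_positions=N_POSITIONS):
    """Count weights and biases of a GPT-2 style decoder with a tied LM head."""
    d = n_embd
    # Token embedding plus learned position embedding.
    embeddings = vocab_size * d + n_positions * d
    # Fused QKV projection (d -> 3d) and output projection (d -> d), with biases.
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # Feed-forward expansion (d -> 4d) and contraction (4d -> d), with biases.
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d)
    per_block = attention + mlp + layer_norms
    # Final layer norm applied after the last block.
    final_layer_norm = 2 * d
    return embeddings + n_layer * per_block + final_layer_norm


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        total = count_parameters(**cfg)
        print(f"{name}: {total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the totals come out to roughly 124M, 355M, 774M, and 1.56B, in line with the list above. Most of the growth comes from the per-block term, which scales with the square of the hidden size.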
Architecture
GPT2LMHeadModel
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
See the Installation Guide and LLM Pretraining Guide.