# GPT-2
GPT-2 is OpenAI’s foundational decoder-only transformer. NeMo AutoModel uses it as a baseline for the Megatron pretraining smoke test and tutorials — its small footprint makes it a convenient target to validate data pipelines, distributed configs, and logging without needing large compute.
| Task | Text Generation (pretraining baseline) |
|---|---|
| Architecture | GPT2LMHeadModel |
| Parameters | 124M – 1.5B |
| HF Org | openai-community |
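The 124M figure for the smallest checkpoint follows directly from the published GPT-2 small hyperparameters (12 layers, 768-dim hidden states, 50,257-token vocabulary, 1,024-position context). A short back-of-the-envelope script reproduces it, assuming the standard GPT-2 layout: learned position embeddings, bias terms throughout, and the LM head tied to the token embedding matrix.

```python
# Parameter count for GPT-2 small from its published hyperparameters.
n_layer, n_embd, n_ctx, n_vocab = 12, 768, 1024, 50257

wte = n_vocab * n_embd                    # token embeddings (tied with LM head)
wpe = n_ctx * n_embd                      # learned position embeddings
attn = n_embd * 3 * n_embd + 3 * n_embd   # fused QKV projection (+ bias)
attn += n_embd * n_embd + n_embd          # attention output projection (+ bias)
mlp = n_embd * 4 * n_embd + 4 * n_embd    # MLP up-projection (+ bias)
mlp += 4 * n_embd * n_embd + n_embd       # MLP down-projection (+ bias)
ln = 2 * n_embd                           # one LayerNorm (scale + shift)

block = attn + mlp + 2 * ln               # two LayerNorms per block
total = wte + wpe + n_layer * block + ln  # final LayerNorm after the stack

print(f"{total:,}")  # 124,439,808, i.e. the quoted 124M
```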
## Available Models

- `gpt2` (124M)
- `gpt2-medium` (355M)
- `gpt2-large` (774M)
- `gpt2-xl` (1.5B)
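Every checkpoint above loads through the standard Hugging Face `transformers` API, so a quick generation smoke test takes only a few lines. A minimal sketch (the prompt and decoding settings are illustrative; swap `model_id` for any of the sizes above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # or gpt2-medium / gpt2-large / gpt2-xl
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy-decode a short continuation as a sanity check.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```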
## Architecture

`GPT2LMHeadModel`
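This mapping is recorded in the checkpoint's config, so it can be verified without downloading any weights. A short check against the `gpt2` config:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("gpt2")
print(config.architectures)                      # ['GPT2LMHeadModel']
print(config.n_layer, config.n_head, config.n_embd)  # 12 12 768
```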
## Example HF Models

| Model | HF ID |
|---|---|
| GPT-2 | openai-community/gpt2 |
## Example Recipes

| Recipe | Description |
|---|---|
| examples/llm_pretrain/megatron_pretrain_gpt2.yaml | Megatron pretraining smoke test — GPT-2 on FineWeb-Edu |
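Before launching the full recipe, it can be useful to confirm the training corpus is reachable from your environment. A sketch that streams a few records from FineWeb-Edu on the Hugging Face Hub; the dataset ID and the `sample-10BT` subset are assumptions here, and the recipe's YAML is authoritative for what the smoke test actually consumes:

```python
from datasets import load_dataset

# Stream rather than download: FineWeb-Edu is large.
# "HuggingFaceFW/fineweb-edu" / "sample-10BT" are assumed identifiers;
# check the recipe YAML for the exact dataset configuration it uses.
ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["text"][:80])
    if i == 2:
        break
```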
## Try with NeMo AutoModel

1. Install NeMo AutoModel (full instructions in the Installation Guide):

   ```bash
   pip install nemo-automodel
   ```

2. Clone the repo to get the example recipes:

   ```bash
   git clone https://github.com/NVIDIA-NeMo/Automodel.git
   cd Automodel
   ```

3. Run the recipe from inside the repo:

   ```bash
   automodel --nproc-per-node=8 examples/llm_pretrain/megatron_pretrain_gpt2.yaml
   ```
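Note that `--nproc-per-node=8` launches one rank per GPU, so it should match the devices actually visible on the node; if you have fewer, lower the flag accordingly. A quick pre-flight check in plain PyTorch:

```python
import torch

# --nproc-per-node should not exceed the number of visible GPUs.
n = torch.cuda.device_count()
print(f"visible GPUs: {n}")
for i in range(n):
    print(i, torch.cuda.get_device_name(i))
```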
See the Installation Guide and LLM Pretraining Guide.