GPT-2

GPT-2 is OpenAI’s foundational decoder-only transformer. NeMo AutoModel uses it as a baseline for the Megatron pretraining smoke test and tutorials: its small footprint makes it a convenient target for validating data pipelines, distributed configs, and logging without large compute.

Task: Text Generation (pretraining baseline)
Architecture: GPT2LMHeadModel
Parameters: 124M – 1.5B
HF Org: openai-community

Available Models

  • gpt2 (124M)
  • gpt2-medium (355M)
  • gpt2-large (774M)
  • gpt2-xl (1.5B)
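
The sizes listed above can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes the commonly published GPT-2 hyperparameters (vocabulary of 50257 tokens, context length of 1024, and the hidden-size/layer-count pairs in `GPT2_SIZES`), none of which are stated on this page, and it assumes the output head shares weights with the token embedding. The helper name `gpt2_param_count` is hypothetical and not part of NeMo AutoModel.

```python
# Rough GPT-2 parameter counts from hyperparameters (assumed values, see lead-in).
VOCAB_SIZE = 50257      # assumed GPT-2 BPE vocabulary size
CONTEXT_LENGTH = 1024   # assumed maximum sequence length

# Assumed (hidden size, number of layers) for each checkpoint.
GPT2_SIZES = {
    "gpt2": (768, 12),
    "gpt2-medium": (1024, 24),
    "gpt2-large": (1280, 36),
    "gpt2-xl": (1600, 48),
}


def gpt2_param_count(hidden, layers, vocab=VOCAB_SIZE, context=CONTEXT_LENGTH):
    """Count weights and biases, with the LM head tied to the token embedding."""
    # Token embedding plus learned position embedding.
    embeddings = vocab * hidden + context * hidden
    # Fused QKV projection plus attention output projection (weights + biases).
    attention = (hidden * 3 * hidden + 3 * hidden) + (hidden * hidden + hidden)
    # Two-layer MLP with a 4x expansion (weights + biases).
    mlp = (hidden * 4 * hidden + 4 * hidden) + (4 * hidden * hidden + hidden)
    # Two LayerNorms per block, each with a scale and a bias vector.
    layer_norms = 2 * (2 * hidden)
    # Final LayerNorm after the last block.
    final_norm = 2 * hidden
    return embeddings + layers * (attention + mlp + layer_norms) + final_norm


if __name__ == "__main__":
    for name, (hidden, layers) in GPT2_SIZES.items():
        print(f"{name}: {gpt2_param_count(hidden, layers) / 1e6:.0f}M")
```

Under these assumptions the counts land on the sizes in the list, with gpt2-xl coming out at roughly 1.56B, which is usually rounded to 1.5B.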

Architecture

  • GPT2LMHeadModel

Example HF Models

Model | HF ID
GPT-2 | openai-community/gpt2

Example Recipes

Recipe | Description
megatron_pretrain_gpt2.yaml | Megatron pretraining smoke test: GPT-2 on FineWeb-Edu

Try with NeMo AutoModel

1. Install (full instructions):

$ pip install nemo-automodel

2. Clone the repo to get the example recipes:

$ git clone https://github.com/NVIDIA-NeMo/Automodel.git
$ cd Automodel

3. Run the recipe from inside the repo:

$ automodel --nproc-per-node=8 examples/llm_pretrain/megatron_pretrain_gpt2.yaml
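
On a machine with fewer than 8 GPUs, the process count in the command above presumably needs to change. The following is a minimal sketch of a helper that assembles the same command for a given process count; it only builds and prints the command, it does not run it. The function `build_launch_command` is hypothetical, and the assumption that `--nproc-per-node` accepts values other than 8 is not confirmed by this page.

```python
import shlex

# Recipe path as given in step 3, relative to the repo root.
RECIPE = "examples/llm_pretrain/megatron_pretrain_gpt2.yaml"


def build_launch_command(nproc_per_node, recipe=RECIPE):
    """Return the automodel invocation as an argv list (hypothetical helper)."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    # Only the flag shown on this page is used; no other options are assumed.
    return ["automodel", f"--nproc-per-node={nproc_per_node}", recipe]


if __name__ == "__main__":
    # Print a copy-pasteable shell command for an 8-process launch.
    print(shlex.join(build_launch_command(8)))
```

Returning an argv list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.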

See the Installation Guide and LLM Pretraining Guide.

Hugging Face Model Cards