# nemo_automodel.components.models.gpt2

GPT-2 model utility wrappers for NeMo Automodel.

The canonical way to instantiate a GPT-2 model with custom sizes is to pass a
`transformers.GPT2Config` to `NeMoAutoModelForCausalLM.from_config`. In
YAML-driven workflows, however, spelling out the entire nested config is
verbose. This module provides a *single-level* builder function that exposes
the most common GPT-2 hyperparameters as top-level keys.

Example (YAML):

```yaml
model:
  _target_: nemo_automodel.components.models.gpt2.build_gpt2_model
  n_layer: 24           # GPT-2 Medium
  n_embd: 1024
  n_head: 16
  vocab_size: 50257
  n_positions: 2048
```
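For readers unfamiliar with `_target_`-style configs, the following is a minimal sketch of how such a node could be resolved: the dotted path is imported and the remaining keys are passed as keyword arguments. The `instantiate` helper is hypothetical and written only for illustration; it is not NeMo Automodel's actual config loader, which may behave differently.

```python
import importlib
from typing import Any, Mapping


def instantiate(node: Mapping[str, Any]) -> Any:
    """Resolve a dotted `_target_` path and call it with the remaining keys.

    Hypothetical helper for illustration; not the NeMo Automodel loader.
    """
    kwargs = dict(node)
    module_path, _, attr = kwargs.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    return target(**kwargs)


# Demo with a standard-library target so the sketch runs without NeMo installed.
demo = instantiate(
    {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 6}
)
print(demo)  # 1/2
```

Under this reading, the YAML above would amount to calling `build_gpt2_model(n_layer=24, n_embd=1024, n_head=16, vocab_size=50257, n_positions=2048)`.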

## Module Contents

### Classes

| Name                                                                                | Description                                                    |
| ----------------------------------------------------------------------------------- | -------------------------------------------------------------- |
| [`CausalSelfAttention`](#nemo_automodel-components-models-gpt2-CausalSelfAttention) | Multi-head self-attention with a causal mask.                  |
| [`GPT2LMHeadModel`](#nemo_automodel-components-models-gpt2-GPT2LMHeadModel)         | Minimal GPT-2 Causal-LM with tied input/output embeddings.     |
| [`MLP`](#nemo_automodel-components-models-gpt2-MLP)                                 | GPT-2 feed-forward network (GEGLU → Linear).                   |
| [`TransformerBlock`](#nemo_automodel-components-models-gpt2-TransformerBlock)       | A single transformer block (LN → Attn → Add → LN → MLP → Add). |

### Functions

| Name                                                                          | Description                                                   |
| ----------------------------------------------------------------------------- | ------------------------------------------------------------- |
| [`build_gpt2_model`](#nemo_automodel-components-models-gpt2-build_gpt2_model) | Instantiate and return a *pure-PyTorch* GPT-2 language model. |

### Data

[`__all__`](#nemo_automodel-components-models-gpt2-__all__)

### API

```python
class nemo_automodel.components.models.gpt2.CausalSelfAttention(
    embed_dim: int,
    num_heads: int,
    attn_dropout: float = 0.0
)
```

**Bases:** `Module`

Multi-head self-attention with a causal mask.

```python
nemo_automodel.components.models.gpt2.CausalSelfAttention.forward(
    x: torch.Tensor
) -> torch.Tensor
```
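As a rough illustration of what "causal mask" means, the sketch below applies a row-wise softmax to a square matrix of attention scores while hiding future positions. It is plain Python for a single head with no query/key/value projections, and it is an assumption-laden toy rather than this class's implementation.

```python
import math


def causal_attention_weights(scores: list[list[float]]) -> list[list[float]]:
    """Row-wise softmax over a square score matrix, masking future positions."""
    weights = []
    for i, row in enumerate(scores):
        # Position i may attend only to positions 0..i; later ones get -inf.
        masked = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        peak = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# With equal scores, each token spreads its attention evenly over the past.
uniform = causal_attention_weights([[0.0] * 3 for _ in range(3)])
for row in uniform:
    print([round(w, 3) for w in row])
# [1.0, 0.0, 0.0]
# [0.5, 0.5, 0.0]
# [0.333, 0.333, 0.333]
```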

```python
class nemo_automodel.components.models.gpt2.GPT2LMHeadModel(
    vocab_size: int,
    n_positions: int,
    n_embd: int,
    n_layer: int,
    n_head: int,
    dropout: float = 0.1
)
```

**Bases:** `Module`

Minimal GPT-2 Causal-LM with tied input/output embeddings.
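To see what tying the input and output embeddings saves, here is a back-of-the-envelope parameter count. It assumes the standard GPT-2 layout (fused QKV projection, a 4x Linear → GELU → Linear feed-forward, biases everywhere, two LayerNorms per block plus a final one); this page does not confirm that this class matches that layout exactly, so treat the numbers as illustrative. `gpt2_param_count` is a hypothetical helper.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, n_embd: int,
                     n_layer: int, tied: bool = True) -> int:
    """Count parameters for the standard GPT-2 layout (an assumption here)."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d   # token + position tables
    attn = (d * 3 * d + 3 * d) + (d * d + d)        # fused QKV + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)     # expand 4x, then project back
    norms = 2 * (2 * d)                             # two LayerNorms per block
    final_norm = 2 * d
    lm_head = 0 if tied else vocab_size * d         # tied head reuses the token table
    return embeddings + n_layer * (attn + mlp + norms) + final_norm + lm_head


# Original GPT-2 small sizes (n_positions=1024, not this module's default of 2048).
print(gpt2_param_count(50257, 1024, 768, 12))         # 124439808
print(gpt2_param_count(50257, 1024, 768, 12, False))  # 163037184
```

Under these assumptions, tying removes one `vocab_size × n_embd` matrix, roughly 38.6M parameters at these sizes.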

```python
nemo_automodel.components.models.gpt2.GPT2LMHeadModel._init_weights()
```

Parameter initialization following GPT-2 conventions.

```python
nemo_automodel.components.models.gpt2.GPT2LMHeadModel.forward(
    input_ids: torch.LongTensor,
    **kwargs
) -> torch.Tensor
```

```python
nemo_automodel.components.models.gpt2.GPT2LMHeadModel.initialize_weights()
```

```python
class nemo_automodel.components.models.gpt2.MLP(
    embed_dim: int,
    expansion_factor: int = 4
)
```

**Bases:** `Module`

GPT-2 feed-forward network (GEGLU → Linear).

```python
nemo_automodel.components.models.gpt2.MLP.forward(
    x: torch.Tensor
) -> torch.Tensor
```

```python
class nemo_automodel.components.models.gpt2.TransformerBlock(
    embed_dim: int,
    num_heads: int,
    dropout: float = 0.0
)
```

**Bases:** `Module`

A single transformer block (LN → Attn → Add → LN → MLP → Add).

```python
nemo_automodel.components.models.gpt2.TransformerBlock.forward(
    x: torch.Tensor
) -> torch.Tensor
```
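The ordering LN → Attn → Add → LN → MLP → Add is the pre-LayerNorm residual pattern. The sketch below traces that data flow on a single token vector in plain Python. The sublayers are passed in as stand-in callables, and `layer_norm` has no learned scale or shift, so this shows only the wiring and is not the class's actual code.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_ln_block(x: Vector, attn: Callable[[Vector], Vector],
                 mlp: Callable[[Vector], Vector]) -> Vector:
    """LN -> Attn -> Add -> LN -> MLP -> Add on a single token vector."""
    x = [a + b for a, b in zip(x, attn(layer_norm(x)))]  # first residual add
    x = [a + b for a, b in zip(x, mlp(layer_norm(x)))]   # second residual add
    return x


def zero(v: Vector) -> Vector:
    """Stand-in sublayer that contributes nothing."""
    return [0.0] * len(v)


# With zero sublayers, the residual path passes the input through unchanged.
print(pre_ln_block([1.0, 2.0, 3.0], zero, zero))  # [1.0, 2.0, 3.0]
```

Because normalization is applied to each sublayer's input rather than to the residual sum, the skip path from input to output stays an unnormalized identity.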

```python
nemo_automodel.components.models.gpt2.build_gpt2_model(
    vocab_size: int = 50257,
    n_positions: int = 2048,
    n_ctx: int | None = None,
    n_embd: int = 768,
    n_layer: int = 12,
    n_head: int = 12,
    bos_token_id: int = 50256,
    eos_token_id: int = 50256,
    attn_implementation: str = 'flash_attention_2',
    **extra_cfg: typing.Any
) -> torch.nn.Module
```

Instantiate and return a *pure-PyTorch* GPT-2 language model.

The function intentionally keeps the same signature as the original
wrapper so that existing YAML/CLI configurations continue to work.
Extra keyword arguments (`extra_cfg`) are accepted and silently ignored.
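The "accept and ignore" behavior described above can be pictured with a small stand-in. `build_model_stub` and the key `some_legacy_option` are hypothetical names used only for this sketch; the stub returns a plain dict instead of a model and is not the real builder.

```python
from typing import Any, Dict


def build_model_stub(n_embd: int = 768, n_layer: int = 12, n_head: int = 12,
                     **extra_cfg: Any) -> Dict[str, int]:
    """Hypothetical stand-in: keeps the sizes it knows, discards the rest."""
    # extra_cfg is collected but intentionally never read.
    return {"n_embd": n_embd, "n_layer": n_layer, "n_head": n_head}


# An older config carrying a key the builder no longer understands still loads.
legacy_cfg = {"n_embd": 1024, "n_layer": 24, "n_head": 16,
              "some_legacy_option": True}
print(build_model_stub(**legacy_cfg))
# {'n_embd': 1024, 'n_layer': 24, 'n_head': 16}
```

If the real builder behaves as described, one side effect is that a misspelled key such as `n_layers` would be absorbed in the same way, leaving the default in place without any warning.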

```python
nemo_automodel.components.models.gpt2.__all__ = ['build_gpt2_model']
```