nemo_automodel.components.models.gpt2#

GPT-2 model utility wrappers for NeMo Automodel.

The canonical way to instantiate a GPT-2 model with custom sizes is to pass a transformers.GPT2Config into NeMoAutoModelForCausalLM.from_config. For YAML-driven workflows, however, spelling out the entire nested config can be verbose. This module therefore provides a flat builder function that exposes the most common GPT-2 hyper-parameters directly.

Example (YAML):

model:
  _target_: nemo_automodel.components.models.gpt2.build_gpt2_model
  n_layer: 24           # GPT-2 Medium
  n_embd: 1024
  n_head: 16
  vocab_size: 50257
  n_positions: 2048
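
The same model can also be built programmatically. A minimal sketch mirroring the YAML above, assuming the package is installed and importable:

Example (Python):

from nemo_automodel.components.models.gpt2 import build_gpt2_model

# GPT-2 Medium-sized configuration, matching the YAML example.
model = build_gpt2_model(
    n_layer=24,
    n_embd=1024,
    n_head=16,
    vocab_size=50257,
    n_positions=2048,
)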

Module Contents#

Classes#

CausalSelfAttention

Multi-head self-attention with a causal mask.

MLP

GPT-2 feed-forward network (GEGLU → Linear).

TransformerBlock

A single transformer block (LN → Attn → Add → LN → MLP → Add).

GPT2LMHeadModel

Minimal GPT-2 Causal-LM with tied input/output embeddings.

Functions#

build_gpt2_model

Instantiate and return a pure-PyTorch GPT-2 language model.

Data#

API#

nemo_automodel.components.models.gpt2.__all__#

['build_gpt2_model']

class nemo_automodel.components.models.gpt2.CausalSelfAttention(
embed_dim: int,
num_heads: int,
attn_dropout: float = 0.0,
)#

Bases: torch.nn.Module

Multi-head self-attention with a causal mask.

Initialization

forward(x: torch.Tensor) → torch.Tensor#
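
For orientation, a minimal sketch of what a forward pass of this shape typically looks like, using torch.nn.functional.scaled_dot_product_attention with a causal mask; this is an illustrative re-implementation, not the module's actual code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionSketch(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int, attn_dropout: float = 0.0):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)  # fused Q/K/V projection
        self.proj = nn.Linear(embed_dim, embed_dim)     # output projection
        self.attn_dropout = attn_dropout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape (B, T, C) -> (B, num_heads, T, head_dim)
        q, k, v = (y.view(b, t, self.num_heads, c // self.num_heads).transpose(1, 2) for y in (q, k, v))
        # is_causal=True applies the lower-triangular (causal) mask
        y = F.scaled_dot_product_attention(
            q, k, v, dropout_p=self.attn_dropout if self.training else 0.0, is_causal=True
        )
        y = y.transpose(1, 2).reshape(b, t, c)
        return self.proj(y)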
class nemo_automodel.components.models.gpt2.MLP(embed_dim: int, expansion_factor: int = 4)#

Bases: torch.nn.Module

GPT-2 feed-forward network (GEGLU → Linear).

Initialization

forward(x: torch.Tensor) → torch.Tensor#
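
The GEGLU → Linear structure can be sketched as follows; the exact gating split is an assumption about how the module realizes GEGLU, shown only for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUMLPSketch(nn.Module):
    def __init__(self, embed_dim: int, expansion_factor: int = 4):
        super().__init__()
        hidden = expansion_factor * embed_dim
        # One up-projection producing both the value and the gate halves.
        self.up = nn.Linear(embed_dim, 2 * hidden)
        self.down = nn.Linear(hidden, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.up(x).chunk(2, dim=-1)
        return self.down(value * F.gelu(gate))  # GEGLU, then the final Linear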
class nemo_automodel.components.models.gpt2.TransformerBlock(embed_dim: int, num_heads: int, dropout: float = 0.0)#

Bases: torch.nn.Module

A single transformer block (LN → Attn → Add → LN → MLP → Add).

Initialization

forward(x: torch.Tensor) → torch.Tensor#
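
The LN → Attn → Add → LN → MLP → Add ordering corresponds to a pre-norm residual block. A minimal sketch reusing the two illustrative classes above:

import torch
import torch.nn as nn

class TransformerBlockSketch(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_dim)
        self.attn = CausalSelfAttentionSketch(embed_dim, num_heads, attn_dropout=dropout)
        self.ln2 = nn.LayerNorm(embed_dim)
        self.mlp = GEGLUMLPSketch(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln1(x))  # LN -> Attn -> Add
        x = x + self.mlp(self.ln2(x))   # LN -> MLP -> Add
        return x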
class nemo_automodel.components.models.gpt2.GPT2LMHeadModel(
*,
vocab_size: int,
n_positions: int,
n_embd: int,
n_layer: int,
n_head: int,
dropout: float = 0.1,
)#

Bases: torch.nn.Module

Minimal GPT-2 Causal-LM with tied input/output embeddings.

Initialization

forward(input_ids: torch.LongTensor) → torch.Tensor#
_init_weights()#

Parameter initialization following GPT-2 conventions.
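
A sketch of how such a minimal causal LM typically fits together (token plus learned positional embeddings, a stack of blocks, a final LayerNorm, and an output head tied to the token embedding); this continues the illustrative classes above and is not the module's exact implementation:

import torch
import torch.nn as nn

class GPT2LMHeadSketch(nn.Module):
    def __init__(self, *, vocab_size: int, n_positions: int, n_embd: int,
                 n_layer: int, n_head: int, dropout: float = 0.1):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(n_positions, n_embd)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList(
            TransformerBlockSketch(n_embd, n_head, dropout) for _ in range(n_layer)
        )
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # tie input/output embeddings

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        pos = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.drop(self.tok_emb(input_ids) + self.pos_emb(pos))
        for block in self.blocks:
            x = block(x)
        return self.lm_head(self.ln_f(x))  # (batch, seq_len, vocab_size) logits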

nemo_automodel.components.models.gpt2.build_gpt2_model(
*,
vocab_size: int = 50257,
n_positions: int = 2048,
n_ctx: int | None = None,
n_embd: int = 768,
n_layer: int = 12,
n_head: int = 12,
bos_token_id: int = 50256,
eos_token_id: int = 50256,
attn_implementation: str = 'flash_attention_2',
**extra_cfg: Any,
) → torch.nn.Module#

Instantiate and return a pure-PyTorch GPT-2 language model.

The function intentionally keeps the same signature as the original wrapper so existing YAML/CLI configurations continue to work. Extra keyword arguments are silently ignored.
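
A brief usage sketch; the stray keyword name below is hypothetical and only illustrates that unrecognized keys (e.g. leftover YAML entries) do not raise:

from nemo_automodel.components.models.gpt2 import build_gpt2_model

# Defaults give a GPT-2 Small-sized model (12 layers, 768 hidden, 12 heads).
model = build_gpt2_model()

# Extra keyword arguments are ignored per the docstring above.
model = build_gpt2_model(n_layer=6, some_unused_key=True)  # some_unused_key is hypothetical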