nemo_automodel.components.models.gpt2#
GPT-2 model utility wrappers for NeMo Automodel.

The canonical way to instantiate a GPT-2 with custom sizes is to pass a transformers.GPT2Config into NeMoAutoModelForCausalLM.from_config. For YAML-driven workflows, however, specifying the entire nested config can be verbose. This module provides a single-level builder function that exposes the most common GPT-2 hyper-parameters directly.
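For comparison, the canonical route looks roughly like the sketch below. The exact import path for NeMoAutoModelForCausalLM and the from_config call shape are assumptions based on the usual transformers convention:

from transformers import GPT2Config

# Import path is an assumption; adjust to wherever NeMoAutoModelForCausalLM is exposed.
from nemo_automodel import NeMoAutoModelForCausalLM

# Nested-config route: describe the architecture via transformers.GPT2Config ...
config = GPT2Config(
    n_layer=24,        # GPT-2 Medium
    n_embd=1024,
    n_head=16,
    vocab_size=50257,
    n_positions=2048,
)
# ... then hand the config to the Automodel wrapper.
model = NeMoAutoModelForCausalLM.from_config(config)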
Example (YAML):
model:
  _target_: nemo_automodel.components.models.gpt2.build_gpt2_model
  n_layer: 24 # GPT-2 Medium
  n_embd: 1024
  n_head: 16
  vocab_size: 50257
  n_positions: 2048
Module Contents#
Classes#
- CausalSelfAttention: Multi-head self-attention with a causal mask.
- MLP: GPT-2 feed-forward network (GEGLU → Linear).
- TransformerBlock: A single transformer block (LN → Attn → Add → LN → MLP → Add).
- GPT2LMHeadModel: Minimal GPT-2 Causal-LM with tied input/output embeddings.
Functions#
- build_gpt2_model: Instantiate and return a pure-PyTorch GPT-2 language model.
Data#
- __all__

API#
- nemo_automodel.components.models.gpt2.__all__#
['build_gpt2_model']
- class nemo_automodel.components.models.gpt2.CausalSelfAttention(embed_dim: int, num_heads: int, attn_dropout: float = 0.0)#
Bases: torch.nn.Module
Multi-head self-attention with a causal mask.
Initialization
- forward(x: torch.Tensor) → torch.Tensor#
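A minimal usage sketch based only on the signature above; the (batch, seq, embed_dim) input layout is an assumption:

import torch

from nemo_automodel.components.models.gpt2 import CausalSelfAttention

attn = CausalSelfAttention(embed_dim=768, num_heads=12, attn_dropout=0.0)
x = torch.randn(2, 16, 768)  # assumed (batch, seq, embed_dim) layout
y = attn(x)                  # causally-masked self-attention, same shape as x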
- class nemo_automodel.components.models.gpt2.MLP(embed_dim: int, expansion_factor: int = 4)#
Bases: torch.nn.Module
GPT-2 feed-forward network (GEGLU → Linear).
Initialization
- forward(x: torch.Tensor) → torch.Tensor#
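As with the attention layer, a small sketch from the signature; the shape-preserving behaviour and the reading of expansion_factor as the hidden-width multiplier are assumptions:

import torch

from nemo_automodel.components.models.gpt2 import MLP

mlp = MLP(embed_dim=768, expansion_factor=4)  # hidden width assumed to be 4 * embed_dim
y = mlp(torch.randn(2, 16, 768))              # output assumed to keep the input shape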
- class nemo_automodel.components.models.gpt2.TransformerBlock(embed_dim: int, num_heads: int, dropout: float = 0.0)#
Bases: torch.nn.Module
A single transformer block (LN → Attn → Add → LN → MLP → Add).
Initialization
- forward(x: torch.Tensor) → torch.Tensor#
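A usage sketch; the comments spell out the residual ordering named in the summary, while the attribute names in them are hypothetical:

import torch

from nemo_automodel.components.models.gpt2 import TransformerBlock

block = TransformerBlock(embed_dim=768, num_heads=12, dropout=0.1)
x = torch.randn(2, 16, 768)      # assumed (batch, seq, embed_dim) layout
# Conceptually (attribute names are hypothetical):
#   x = x + attn(ln_1(x))        # LN -> Attn -> Add
#   x = x + mlp(ln_2(x))         # LN -> MLP -> Add
y = block(x)                     # output keeps the input shape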
- class nemo_automodel.components.models.gpt2.GPT2LMHeadModel(
- *,
- vocab_size: int,
- n_positions: int,
- n_embd: int,
- n_layer: int,
- n_head: int,
- dropout: float = 0.1,
)#
Bases: torch.nn.Module
Minimal GPT-2 Causal-LM with tied input/output embeddings.
Initialization
- forward(input_ids: torch.LongTensor) → torch.Tensor#
- _init_weights()#
Parameter initialization following GPT-2 conventions.
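A small usage sketch of the documented constructor and forward; the logits shape is an assumption for a causal LM head over the vocabulary:

import torch

from nemo_automodel.components.models.gpt2 import GPT2LMHeadModel

model = GPT2LMHeadModel(
    vocab_size=50257,
    n_positions=1024,
    n_embd=768,
    n_layer=12,
    n_head=12,
    dropout=0.1,
)
input_ids = torch.randint(0, 50257, (2, 16))  # (batch, seq) token ids
logits = model(input_ids)                     # expected shape (batch, seq, vocab_size)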
- nemo_automodel.components.models.gpt2.build_gpt2_model(
- *,
- vocab_size: int = 50257,
- n_positions: int = 2048,
- n_ctx: int | None = None,
- n_embd: int = 768,
- n_layer: int = 12,
- n_head: int = 12,
- bos_token_id: int = 50256,
- eos_token_id: int = 50256,
- attn_implementation: str = 'flash_attention_2',
- **extra_cfg: Any,
)#
Instantiate and return a pure-PyTorch GPT-2 language model.
The function intentionally keeps the same signature as the original wrapper so that existing YAML/CLI configurations continue to work. Extra keyword arguments are silently ignored.
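The YAML example at the top of this page maps onto a direct call; a minimal sketch:

from nemo_automodel.components.models.gpt2 import build_gpt2_model

# Mirrors the GPT-2 Medium YAML example; unrecognised keyword arguments are ignored.
model = build_gpt2_model(
    n_layer=24,
    n_embd=1024,
    n_head=16,
    vocab_size=50257,
    n_positions=2048,
)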