nemo_automodel.components.models.llama.model
Custom Llama model implementation for NeMo Automodel.
This module provides a self-contained Llama implementation that follows HuggingFace’s implementation. It uses separate q_proj/k_proj/v_proj and gate_proj/up_proj projections (HF-style).
Example (YAML):
Module Contents
Classes
Data
API
Bases: Module
Multi-head attention from the ‘Attention Is All You Need’ paper.
Uses separate q_proj / k_proj / v_proj — identical to the default HuggingFace Llama implementation.
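To make the role of the three separate projections concrete, here is a minimal, dependency-free sketch of single-head causal attention. It is an illustration only, not the module's code: the real class operates on PyTorch tensors, and this sketch omits multiple heads, rotary position embeddings, KV caching and the output projection. The helper names (matvec, softmax, causal_attention) and the weight names Wq, Wk, Wv are hypothetical stand-ins for q_proj, k_proj and v_proj.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(xs, Wq, Wk, Wv):
    """Single-head causal attention over a list of token vectors xs."""
    # Three independent projections, standing in for q_proj / k_proj / v_proj.
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    scale = 1.0 / math.sqrt(len(qs[0]))
    outputs = []
    for t, q in enumerate(qs):
        # Causal mask: position t attends only to positions 0..t.
        scores = [scale * sum(a * b for a, b in zip(q, ks[s])) for s in range(t + 1)]
        weights = softmax(scores)
        out = [sum(w * vs[s][d] for s, w in enumerate(weights))
               for d in range(len(vs[0]))]
        outputs.append(out)
    return outputs

# Toy input: two 2-dimensional tokens and identity projections (made-up values).
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
result = causal_attention(tokens, identity, identity, identity)
```

The first token can only attend to itself, so its output equals its own value vector. The second token mixes both value vectors with weights that sum to one.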
Bases: GradientCheckpointingLayer
Single Llama decoder layer with RMSNorm, attention, and MLP.
Inherits from GradientCheckpointingLayer for efficient activation checkpointing.
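The sketch below shows how RMSNorm, attention and the MLP are typically combined in a Llama decoder layer, assuming the pre-norm residual layout used by the HuggingFace implementation. It works on a single vector in plain Python for readability. The function names, the callables attn and mlp, and the eps default are hypothetical placeholders, not this module's API.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then scale by a learned weight.

    eps is a placeholder value; the real one comes from the model config.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for w, v in zip(weight, x)]

def decoder_layer(x, attn, mlp, attn_norm_weight, mlp_norm_weight):
    """One pre-norm decoder layer: two residual sub-blocks in sequence."""
    # Attention sub-block: normalize, apply attention, add the residual.
    h = [a + b for a, b in zip(x, attn(rms_norm(x, attn_norm_weight)))]
    # MLP sub-block: normalize, apply the MLP, add the residual.
    return [a + b for a, b in zip(h, mlp(rms_norm(h, mlp_norm_weight)))]
```

Because each sub-block adds its output to its input, a layer whose attention and MLP both return zeros passes the input through unchanged.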
Bases: HFCheckpointingMixin, LlamaPreTrainedModel
Llama model with causal language modeling head.
Forward pass returning CausalLMOutputWithPast.
Parameters:
Input token IDs of shape (batch_size, seq_len)
Optional attention mask
Optional position indices
Optional cached key/values
Optional pre-computed embeddings
Optional labels for computing loss
Whether to use KV caching
Position in cache
Number of final logits to compute (0=all, N=last N tokens)
Returns: CausalLMOutputWithPast with loss, logits, and past_key_values
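The "0=all, N=last N tokens" convention for the number of final logits can be expressed with a single negative slice. The sketch below illustrates that convention on a plain list. The function name select_positions is hypothetical, and whether this module uses exactly this slicing internally is an assumption.

```python
def select_positions(hidden_states, logits_to_keep=0):
    """Return the trailing positions whose logits would be computed."""
    # slice(-0, None) is the same as slice(0, None), so 0 keeps every position.
    keep = slice(-logits_to_keep, None)
    return hidden_states[keep]

positions = ["t0", "t1", "t2", "t3"]
all_positions = select_positions(positions)   # logits for every position
last_two = select_positions(positions, 2)     # logits for the last 2 positions only
```

Restricting the computation to the last position is a common choice during generation, where only the next-token distribution is needed.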
Bases: Module
SwiGLU MLP with separate gate_proj and up_proj — identical to HuggingFace default.
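As an illustration of the SwiGLU computation, the following sketch applies the gate and up projections separately, multiplies the SiLU-activated gate by the up branch, and projects the result back down. It is plain Python on a single vector. The names W_gate, W_up and W_down are hypothetical stand-ins for gate_proj, up_proj and down_proj, and biases are assumed absent.

```python
import math

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def swiglu_mlp(x, W_gate, W_up, W_down):
    """Compute down(silu(gate(x)) * up(x)) with two separate input projections."""
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

# Toy call with identity weights (made-up values).
identity = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu_mlp([1.0, 2.0], identity, identity, identity)
```

With identity weights each output element reduces to silu(x) * x, which makes the gating effect easy to check by hand.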
Bases: LlamaPreTrainedModel
Llama transformer model (embeddings + decoder layers + norm).
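The three stages named above (embeddings, decoder layers, final norm) compose as a simple pipeline. The sketch below shows only that data flow. The function run_backbone and its arguments are hypothetical, and each layer is modelled as a callable that maps a sequence of hidden vectors to a new sequence.

```python
def run_backbone(token_ids, embedding_table, layers, final_norm):
    """Embed tokens, apply each decoder layer in order, then normalize."""
    hidden = [embedding_table[t] for t in token_ids]
    for layer in layers:
        # Each layer sees the whole sequence, since attention mixes positions.
        hidden = layer(hidden)
    return [final_norm(h) for h in hidden]

# Toy stand-ins (made-up values): each "layer" adds 1, the "norm" halves.
embedding_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}
add_one = lambda hs: [[v + 1.0 for v in h] for h in hs]
halve = lambda h: [v / 2.0 for v in h]
states = run_backbone([1, 0], embedding_table, [add_one, add_one], halve)
```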
Bases: PreTrainedModel
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.