
# nemo_automodel.components.speculative.eagle.draft_llama_v12

Llama-style dense LLM draft model for EAGLE-1 / EAGLE-2 training.

Config-driven: supports dense Llama, Phi-3, and Qwen3 models through standard Hugging Face config
fields (`attention_bias`, `mlp_bias`, `rope_theta`/`rope_scaling`,
`rms_norm_eps`). Class names are kept unchanged so that the `architectures`
entries of existing checkpoints remain compatible.
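Because the module is config-driven, per-model differences come from reading these fields rather than from separate classes. The sketch below shows one way to collect them with fallbacks for configs that omit a field. The helper name `read_draft_hparams` and every default value are hypothetical; the module's actual defaults are not documented on this page.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Hypothetical fallback values; not taken from the module itself.
_DEFAULTS: Dict[str, Any] = {
    "attention_bias": False,
    "mlp_bias": False,
    "rope_theta": 10000.0,
    "rope_scaling": None,
    "rms_norm_eps": 1e-6,
}


def read_draft_hparams(config: Any) -> Dict[str, Any]:
    """Collect the config fields the draft layers depend on, falling back when a field is absent."""
    return {name: getattr(config, name, default) for name, default in _DEFAULTS.items()}


# An example config that sets some fields and omits others (e.g. mlp_bias, rope_scaling).
cfg = SimpleNamespace(attention_bias=True, rope_theta=1_000_000.0, rms_norm_eps=1e-6)
hparams = read_draft_hparams(cfg)
```

Using `getattr` with a default keeps one code path working across config classes that do not all define the same attributes.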

## Module Contents

### Classes

| Name                                                                                                            | Description                                                    |
| --------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- |
| [`EagleLlamaAttention`](#nemo_automodel-components-speculative-eagle-draft_llama_v12-EagleLlamaAttention)       | Standard Llama-style self-attention for the EAGLE-1/2 draft.   |
| [`EagleLlamaDecoderLayer`](#nemo_automodel-components-speculative-eagle-draft_llama_v12-EagleLlamaDecoderLayer) | Single decoder layer for the minimal EAGLE-1/2 draft model.    |
| [`EagleLlamaMLP`](#nemo_automodel-components-speculative-eagle-draft_llama_v12-EagleLlamaMLP)                   | Standard SwiGLU MLP used by the EAGLE-1/2 draft.               |
| [`LlamaEagleDraftModel`](#nemo_automodel-components-speculative-eagle-draft_llama_v12-LlamaEagleDraftModel)     | Llama-style dense draft that predicts next-step hidden states. |

### Functions

| Name                                                                                                    | Description                                                 |
| ------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- |
| [`_build_causal_mask`](#nemo_automodel-components-speculative-eagle-draft_llama_v12-_build_causal_mask) | Build a standard causal + padding mask for eager attention. |

### API

```python
class nemo_automodel.components.speculative.eagle.draft_llama_v12.EagleLlamaAttention(
    config: transformers.PretrainedConfig
)
```

**Bases:** `Module`

Standard Llama-style self-attention for the EAGLE-1/2 draft.

```python
nemo_automodel.components.speculative.eagle.draft_llama_v12.EagleLlamaAttention._repeat_kv(
    tensor: torch.Tensor
) -> torch.Tensor
```
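`_repeat_kv` has no docstring. Its name matches the `repeat_kv` helper in Hugging Face's Llama code, which expands grouped-query key/value heads so each query head has a matching key/value head. A minimal standalone sketch of that operation, assuming a `(batch, num_kv_heads, seq_len, head_dim)` layout; the explicit `n_rep` argument is an assumption, since the documented method takes only `tensor` and presumably derives the factor from the config (for example `num_attention_heads // num_key_value_heads`).

```python
import torch


def repeat_kv(tensor: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (batch, num_kv_heads, seq, head_dim) to (batch, num_kv_heads * n_rep, seq, head_dim)."""
    if n_rep == 1:
        return tensor
    bsz, num_kv_heads, seq_len, head_dim = tensor.shape
    # Insert a repeat axis after the KV-head axis and broadcast it without copying.
    expanded = tensor[:, :, None, :, :].expand(bsz, num_kv_heads, n_rep, seq_len, head_dim)
    # Merge (num_kv_heads, n_rep) so adjacent query heads share the same KV head.
    return expanded.reshape(bsz, num_kv_heads * n_rep, seq_len, head_dim)
```

The `expand` + `reshape` pattern gives the same result as `tensor.repeat_interleave(n_rep, dim=1)`.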

```python
nemo_automodel.components.speculative.eagle.draft_llama_v12.EagleLlamaAttention.forward(
    hidden_states: torch.Tensor,
    attention_mask: torch.Tensor,
    position_ids: torch.Tensor
) -> torch.Tensor
```
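A minimal sketch of what a Llama-style attention forward pass with these three inputs typically computes: q/k/v projections, rotary position embeddings (RoPE) indexed by `position_ids`, grouped-query key/value expansion, and scaled dot-product attention with an additive `attention_mask`. This is an illustration, not the module's implementation. The class name `SketchAttention`, its constructor arguments, and the `(batch, 1, seq_len, seq_len)` additive mask shape are assumptions, and `rope_scaling` is not handled.

```python
import torch
from torch import nn


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the last dim into halves and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def rope_cos_sin(position_ids, head_dim, theta, dtype):
    # Plain RoPE inverse frequencies (no rope_scaling).
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32, device=position_ids.device) / head_dim
    inv_freq = 1.0 / (theta ** exponents)
    freqs = position_ids[..., None].float() * inv_freq  # (batch, seq, head_dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)  # (batch, seq, head_dim)
    # Add a head axis so cos/sin broadcast over (batch, heads, seq, head_dim).
    return emb.cos().to(dtype)[:, None], emb.sin().to(dtype)[:, None]


class SketchAttention(nn.Module):
    def __init__(self, hidden_size, num_heads, num_kv_heads, rope_theta=10000.0, bias=False):
        super().__init__()
        self.num_heads, self.num_kv_heads = num_heads, num_kv_heads
        self.head_dim = hidden_size // num_heads
        self.rope_theta = rope_theta
        self.q_proj = nn.Linear(hidden_size, num_heads * self.head_dim, bias=bias)
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=bias)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=bias)
        self.o_proj = nn.Linear(num_heads * self.head_dim, hidden_size, bias=bias)

    def _shape(self, x, n_heads):
        bsz, seq_len, _ = x.shape
        return x.view(bsz, seq_len, n_heads, self.head_dim).transpose(1, 2)

    def forward(self, hidden_states, attention_mask, position_ids):
        bsz, seq_len, _ = hidden_states.shape
        q = self._shape(self.q_proj(hidden_states), self.num_heads)
        k = self._shape(self.k_proj(hidden_states), self.num_kv_heads)
        v = self._shape(self.v_proj(hidden_states), self.num_kv_heads)
        # Rotate queries and keys by their positions.
        cos, sin = rope_cos_sin(position_ids, self.head_dim, self.rope_theta, q.dtype)
        q = q * cos + rotate_half(q) * sin
        k = k * cos + rotate_half(k) * sin
        # Grouped-query attention: each query head gets a copy of its KV head.
        n_rep = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(n_rep, dim=1)
        v = v.repeat_interleave(n_rep, dim=1)
        # Scaled dot-product scores plus the additive causal/padding mask.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5 + attention_mask
        probs = torch.softmax(scores.float(), dim=-1).to(q.dtype)
        out = (probs @ v).transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(out)
```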

```python
class nemo_automodel.components.speculative.eagle.draft_llama_v12.EagleLlamaDecoderLayer(
    config: transformers.PretrainedConfig
)
```

**Bases:** `Module`

Single decoder layer for the minimal EAGLE-1/2 draft model.

```python
nemo_automodel.components.speculative.eagle.draft_llama_v12.EagleLlamaDecoderLayer.forward(
    hidden_states: torch.Tensor,
    attention_mask: torch.Tensor,
    position_ids: torch.Tensor
) -> torch.Tensor
```
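A Llama-style decoder layer is usually a pre-norm residual block: RMSNorm, self-attention, residual add, then RMSNorm, MLP, residual add. The sketch below shows that standard layout with the same `forward(hidden_states, attention_mask, position_ids)` signature. The class names `RMSNorm` and `PreNormDecoderLayer` are hypothetical, and the actual EAGLE draft layer may differ (for example in where normalization is applied).

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the hidden dim (computed in float32).
        var = x.float().pow(2).mean(-1, keepdim=True)
        return (x.float() * torch.rsqrt(var + self.eps)).to(x.dtype) * self.weight


class PreNormDecoderLayer(nn.Module):
    def __init__(self, hidden_size: int, attn: nn.Module, mlp: nn.Module, eps: float = 1e-6):
        super().__init__()
        self.input_layernorm = RMSNorm(hidden_size, eps)
        self.self_attn = attn
        self.post_attention_layernorm = RMSNorm(hidden_size, eps)
        self.mlp = mlp

    def forward(self, hidden_states, attention_mask, position_ids):
        # Attention sub-block with residual connection.
        residual = hidden_states
        attn_out = self.self_attn(self.input_layernorm(hidden_states), attention_mask, position_ids)
        hidden_states = residual + attn_out
        # MLP sub-block with residual connection.
        residual = hidden_states
        return residual + self.mlp(self.post_attention_layernorm(hidden_states))
```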

```python
class nemo_automodel.components.speculative.eagle.draft_llama_v12.EagleLlamaMLP(
    config: transformers.PretrainedConfig
)
```

**Bases:** `Module`

Standard SwiGLU MLP used by the EAGLE-1/2 draft.

```python
nemo_automodel.components.speculative.eagle.draft_llama_v12.EagleLlamaMLP.forward(
    hidden_states: torch.Tensor
) -> torch.Tensor
```
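SwiGLU gates one linear projection with the SiLU of another, then projects back to the hidden size: `down(silu(gate(x)) * up(x))`. A minimal sketch, assuming Hugging Face Llama-style projection names (`gate_proj`, `up_proj`, `down_proj`) and a `bias` flag standing in for `mlp_bias`; the draft's actual parameter names and sizes may differ.

```python
import torch
import torch.nn.functional as F
from torch import nn


class SwiGLUMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int, bias: bool = False):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=bias)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=bias)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # SiLU-activated gate multiplies the linear "up" branch elementwise.
        return self.down_proj(F.silu(self.gate_proj(hidden_states)) * self.up_proj(hidden_states))
```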

```python
class nemo_automodel.components.speculative.eagle.draft_llama_v12.LlamaEagleDraftModel(
    config: transformers.PretrainedConfig
)
```

**Bases:** `PreTrainedModel`

Llama-style dense draft that predicts next-step hidden states.

Works with Llama, Phi-3, and Qwen3 dense configs. The class name is
retained for backward compatibility with already-trained checkpoints.
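The EAGLE-1 paper describes the draft input as a token embedding concatenated with the target model's hidden state, projected back to the hidden size by a linear layer, with the token sequence advanced one step relative to the features. A minimal sketch of that fusion step, assuming this draft follows the paper; the class `EagleFusionSketch` and the attribute `fc` are hypothetical, and the real `forward` also applies decoder layers with `attention_mask`.

```python
import torch
from torch import nn


class EagleFusionSketch(nn.Module):
    """Hypothetical EAGLE-1-style input fusion: [token embedding ; target hidden state] -> hidden size."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        # Projects the 2 * hidden_size concatenation back to hidden_size.
        self.fc = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, input_ids: torch.Tensor, target_hidden_states: torch.Tensor) -> torch.Tensor:
        embeds = self.embed_tokens(input_ids)  # (batch, seq, hidden)
        fused = torch.cat((embeds, target_hidden_states), dim=-1)  # (batch, seq, 2 * hidden)
        # A full draft would pass this through its decoder layer(s) and a final norm.
        return self.fc(fused)


# Usage: pair the target feature at position t with the token at position t + 1.
model = EagleFusionSketch(vocab_size=32, hidden_size=8)
input_ids = torch.randint(0, 32, (2, 6))
target_hidden = torch.randn(2, 6, 8)
fused = model(input_ids[:, 1:], target_hidden[:, :-1])  # (2, 5, 8)
```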

```python
nemo_automodel.components.speculative.eagle.draft_llama_v12.LlamaEagleDraftModel.copy_embeddings_from_target(
    target_embeddings: torch.nn.Embedding
) -> None
```

Copy the target model's token embeddings into the draft embeddings.

When the target is wrapped with FSDP2, its `embed_tokens.weight` is
a `DTensor` sharded across ranks. The weight is gathered into a full local
tensor before it is copied into the (unsharded) draft parameter; otherwise
`aten.copy_` raises a mixed Tensor/DTensor error.
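A minimal sketch of the gather-then-copy pattern described above, written as a standalone function (the name `copy_embeddings` is hypothetical). `DTensor.full_tensor()` all-gathers the shards into a regular tensor; because it is a collective, every rank must call it. The import fallback for older PyTorch releases is an assumption about which versions you run.

```python
import torch
from torch import nn

try:  # public location in recent PyTorch releases
    from torch.distributed.tensor import DTensor
except ImportError:  # older releases kept it under a private module
    from torch.distributed._tensor import DTensor


@torch.no_grad()
def copy_embeddings(draft_embed: nn.Embedding, target_embed: nn.Embedding) -> None:
    weight = target_embed.weight
    if isinstance(weight, DTensor):
        # All-gather the shards into a plain, full torch.Tensor on this rank.
        weight = weight.full_tensor()
    # Both sides are now plain tensors, so copy_ no longer mixes Tensor and DTensor.
    draft_embed.weight.copy_(weight)
```

Without FSDP2 the `isinstance` check is simply false and the function reduces to an in-place copy.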

```python
nemo_automodel.components.speculative.eagle.draft_llama_v12.LlamaEagleDraftModel.forward(
    input_ids: torch.Tensor,
    target_hidden_states: torch.Tensor,
    attention_mask: torch.Tensor
) -> torch.Tensor
```

```python
nemo_automodel.components.speculative.eagle.draft_llama_v12.LlamaEagleDraftModel.freeze_embeddings() -> None
```

Freeze draft token embeddings.
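Freezing a parameter in PyTorch usually means disabling `requires_grad`, so autograd computes no gradient for it and an optimizer built from trainable parameters leaves it unchanged. A minimal sketch under that assumption; the function name and the optimizer setup are illustrative, not the module's code.

```python
import torch
from torch import nn


def freeze_embeddings(embed_tokens: nn.Embedding) -> None:
    # Turning off requires_grad stops autograd from computing gradients for the table.
    embed_tokens.weight.requires_grad_(False)


embed = nn.Embedding(100, 16)
freeze_embeddings(embed)
head = nn.Linear(16, 16)
# Only parameters that still require grad are handed to the optimizer.
trainable = [p for p in [*embed.parameters(), *head.parameters()] if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```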

```python
nemo_automodel.components.speculative.eagle.draft_llama_v12._build_causal_mask(
    attention_mask: torch.Tensor,
    dtype: torch.dtype
) -> torch.Tensor
```

Build a standard causal + padding mask for eager attention.
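A minimal sketch of such a mask, assuming `attention_mask` has shape `(batch, seq_len)` with 1 for real tokens and 0 for padding, and that the result is an additive `(batch, 1, seq_len, seq_len)` mask holding 0 where attention is allowed and `torch.finfo(dtype).min` elsewhere (the convention Hugging Face eager attention uses). The function name is hypothetical. Causal and padding constraints are combined as booleans first, so masked entries never sum two large negatives into `-inf`.

```python
import torch


def build_causal_mask(attention_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    bsz, seq_len = attention_mask.shape
    device = attention_mask.device
    # Query i may attend to keys 0..i.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=device))
    # Keys that are real tokens (not padding), broadcast over queries.
    keys_valid = attention_mask.bool()[:, None, None, :]  # (batch, 1, 1, seq)
    allowed = causal[None, None, :, :] & keys_valid  # (batch, 1, seq, seq)
    mask = torch.zeros(bsz, 1, seq_len, seq_len, dtype=dtype, device=device)
    return mask.masked_fill(~allowed, torch.finfo(dtype).min)
```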