nemo_automodel.components.speculative.eagle.draft_llama_v12
Llama-style dense LLM draft model for EAGLE-1 / EAGLE-2 training.
Config-driven; supports Llama, Phi-3, and Qwen3 dense via standard HF config
fields (attention_bias, mlp_bias, rope_theta/rope_scaling,
rms_norm_eps). Class names are retained for compatibility with the
architectures field of existing checkpoints.
Module Contents
Classes
Functions
API
Bases: Module
Standard Llama-style self attention for the EAGLE-1/2 draft.
Bases: Module
Single decoder layer for the minimal EAGLE-1/2 draft model.
Bases: Module
Standard SwiGLU MLP used by the EAGLE-1/2 draft.
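For orientation, a SwiGLU MLP in the usual Llama layout can be sketched as follows. This is a minimal illustration, not the module's actual code: the class name SwiGLUMLPSketch, the projection names, and the constructor arguments are assumptions that mirror the common HF Llama convention, with the bias flag standing in for the mlp_bias config field.

```python
import torch
import torch.nn.functional as F
from torch import nn


class SwiGLUMLPSketch(nn.Module):
    """Hypothetical SwiGLU MLP: down(silu(gate(x)) * up(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int, bias: bool = False):
        super().__init__()
        # Names follow the common Llama convention (assumed, not confirmed by the source).
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=bias)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=bias)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The SiLU-activated gate branch multiplies the linear up branch elementwise,
        # then the result is projected back to the hidden size.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```

The output keeps the input's shape, so the block can sit directly on the residual stream of a decoder layer.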
Bases: PreTrainedModel
Llama-style dense draft that predicts next-step hidden states.
Works with Llama, Phi-3, and Qwen3 dense configs. The class name is retained for backward compatibility with already-trained checkpoints.
Copy the target model token embeddings into the draft embeddings.
When the target is wrapped with FSDP2, its embed_tokens.weight is
a DTensor sharded across ranks. Gather to a local full tensor
before copying into the (unsharded) draft parameter — otherwise
aten.copy_ raises a mixed Tensor/DTensor error.
Freeze draft token embeddings.
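The copy and freeze steps described above can be sketched roughly as below. The function names and signatures are hypothetical and chosen for illustration only. The sketch assumes the target weight is either a plain tensor or a DTensor; it detects the latter by the presence of full_tensor(), which avoids depending on a particular import path for DTensor.

```python
import torch
from torch import nn


def copy_target_embeddings(draft_embed: nn.Embedding, target_weight: torch.Tensor) -> None:
    """Hypothetical helper: copy target embedding weights into the draft embedding."""
    # A DTensor exposes full_tensor(), which all-gathers the shards into a local
    # full tensor. It is a collective, so every rank must reach this call.
    if hasattr(target_weight, "full_tensor"):
        target_weight = target_weight.full_tensor()
    if target_weight.shape != draft_embed.weight.shape:
        raise ValueError(
            f"shape mismatch: target {tuple(target_weight.shape)} "
            f"vs draft {tuple(draft_embed.weight.shape)}"
        )
    with torch.no_grad():
        # copy_ converts dtype and device as needed.
        draft_embed.weight.copy_(target_weight)


def freeze_embeddings(draft_embed: nn.Embedding) -> None:
    """Hypothetical helper: exclude the draft embeddings from gradient updates."""
    draft_embed.weight.requires_grad_(False)
```

Gathering before the copy means both operands of copy_ are ordinary tensors, which is what sidesteps the mixed Tensor/DTensor error mentioned above.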
Build a standard causal + padding mask for eager attention.
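One common way to build such a mask is sketched below. This is an illustration under stated assumptions, not the method's actual implementation: the function name is hypothetical, the input is assumed to be an HF-style attention_mask of shape (batch, seq_len) with 1 for real tokens and 0 for padding, and the result is assumed to be an additive mask of shape (batch, 1, seq_len, seq_len) that is added to the attention scores before softmax.

```python
import torch


def build_causal_padding_mask(
    attention_mask: torch.Tensor, dtype: torch.dtype = torch.float32
) -> torch.Tensor:
    """Hypothetical helper: additive mask, 0 where attention is allowed, dtype-min elsewhere."""
    batch, seq_len = attention_mask.shape
    device = attention_mask.device
    # causal[i, j] is True when query position i may see key position j (j <= i).
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=device))
    # keep[b, 0, 0, j] is True when key position j of sequence b is a real token.
    keep = attention_mask.bool()[:, None, None, :]
    allowed = causal[None, None, :, :] & keep
    mask = torch.zeros(batch, 1, seq_len, seq_len, dtype=dtype, device=device)
    # Blocked positions receive the most negative finite value of the dtype.
    return mask.masked_fill(~allowed, torch.finfo(dtype).min)
```

The singleton second dimension lets the mask broadcast across attention heads.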