nemo_automodel.components.models.common.mtp
Shared scaffolding for Multi-Token Prediction (MTP) auxiliary heads.
MTP follows the DeepSeek-V3 design (Liu et al., 2024). Each MTP “depth” predicts one additional future token. At each depth, the input is rolled left by one position, fused with the previous depth’s hidden state, and passed through an inner block; logits are then produced by the shared LM head.
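The per-depth roll is easiest to see as an index alignment. The sketch below is illustrative only and uses plain Python lists rather than tensors; `roll_left` and `mtp_alignment` are hypothetical names that are not part of this package. It assumes the DeepSeek-V3 convention that depth k at position i consumes token i+k and predicts token i+k+1, and it pads the tail with `None` instead of wrapping around, so positions with no valid input or target are visible.

```python
def roll_left(seq, fill=None):
    """Shift a sequence left by one position, padding the tail with `fill`."""
    return list(seq[1:]) + [fill]


def mtp_alignment(tokens, num_depths):
    """Return (depth, inputs, targets) for each MTP depth.

    Hypothetical helper: it only tracks which token each position would
    consume and predict, not the hidden-state fusion or the inner block.
    """
    inputs = list(tokens)
    plan = []
    for depth in range(1, num_depths + 1):
        # Each depth rolls the previous depth's input left by one more step,
        # so position i at depth k sees token i + k.
        inputs = roll_left(inputs)
        # The target is one step further ahead: token i + k + 1.
        targets = roll_left(inputs)
        plan.append((depth, inputs, targets))
    return plan


if __name__ == "__main__":
    for depth, inputs, targets in mtp_alignment(["A", "B", "C", "D", "E"], 2):
        print(depth, inputs, targets)
```

Positions holding `None` have no valid input or target. A real implementation would have to exclude them from the loss; how this package does so is not shown here.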
Components in this package are model-agnostic. Model-specific glue (building the inner block out of the model’s own decoder layers, wiring HF state-dict keys) lives in the model’s own package.