nemo_automodel.components.models.common.mtp


Shared scaffolding for Multi-Token Prediction (MTP) auxiliary heads.

MTP follows the DeepSeek-V3 design (Liu et al., 2024). Each MTP “depth” predicts one additional future token. At each depth, the input is rolled left by one position, fused with the previous depth’s hidden state, and passed through an inner block; logits are then produced by the shared LM head.
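The per-depth left roll can be illustrated with a small pure-Python sketch. The names `roll_left` and `mtp_inputs` are hypothetical and not part of this package, and filling the vacated last position with `0` is an assumption made for illustration; the package's own `roll_tensor` operates on tensors and may treat the boundary differently.

```python
def roll_left(tokens, pad=0):
    """Shift a sequence left by one position.

    The vacated last slot is filled with `pad` (an illustrative assumption).
    """
    if not tokens:
        return []
    return tokens[1:] + [pad]


def mtp_inputs(tokens, num_depths, pad=0):
    """Return the rolled input seen at each MTP depth.

    Depth k (1-indexed) sees the original sequence rolled left k times,
    so position i lines up with the token at position i + k.
    """
    rolled = list(tokens)
    per_depth = []
    for _ in range(num_depths):
        rolled = roll_left(rolled, pad)
        per_depth.append(rolled)
    return per_depth


tokens = [10, 11, 12, 13, 14]
depth_inputs = mtp_inputs(tokens, num_depths=2)
# depth_inputs[0] == [11, 12, 13, 14, 0]
# depth_inputs[1] == [12, 13, 14, 0, 0]
```

The sketch covers only the roll. Fusing with the previous depth's hidden state, the inner block, and the shared LM head are omitted.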

Components in this package are model-agnostic. Model-specific glue (building the inner block out of the model’s own decoder layers, wiring HF state-dict keys) lives in the model’s own package.

Submodules

Package Contents

Data

__all__

API

nemo_automodel.components.models.common.mtp.__all__ = ['MTPConfig', 'MTPModule', 'get_mtp_loss_scaling_factor', 'roll_tensor']