nemo_automodel.components.models.common.mtp

Shared scaffolding for Multi-Token Prediction (MTP) auxiliary heads.

MTP follows the DeepSeek-V3 design (Liu et al., 2024). Each MTP “depth” predicts one additional future token: at each depth, the input is rolled left by one position, fused with the previous-depth hidden state, and passed through an inner block before producing logits via the shared LM head.
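The per-depth left roll can be illustrated with a minimal, framework-free sketch. The helper names `roll_left_sketch` and `mtp_inputs_per_depth` are hypothetical and are not this package's API; in particular, the signature and padding behavior of the exported `roll_tensor` are not documented here. The sketch assumes the slot vacated at the end of the sequence is filled with a placeholder value, whereas a real implementation might wrap around or mask that position instead.

```python
def roll_left_sketch(tokens, fill=0):
    """Shift a sequence left by one position.

    Hypothetical helper: the vacated last slot is padded with `fill`
    (an assumption; a real implementation may wrap or mask instead).
    """
    return tokens[1:] + [fill] if tokens else []


def mtp_inputs_per_depth(tokens, num_depths, fill=0):
    """Return the token sequence seen by each MTP depth, depth 1 first.

    Each depth rolls the previous depth's input left by one more
    position, so depth k sees tokens shifted by k positions in total.
    """
    rolled = list(tokens)
    per_depth = []
    for _ in range(num_depths):
        rolled = roll_left_sketch(rolled, fill)
        per_depth.append(rolled)
    return per_depth


tokens = [11, 12, 13, 14, 15]
inputs = mtp_inputs_per_depth(tokens, num_depths=2)
for depth, seq in enumerate(inputs, start=1):
    print(f"depth {depth}: {seq}")
```

In this sketch, the depth-1 input at position `i` is the token originally at position `i + 1`, and the depth-2 input is the token originally at `i + 2`. The fusion with the previous-depth hidden state and the inner block are omitted, since they depend on the model-specific glue.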

Components in this package are model-agnostic. Model-specific glue (building the inner block out of the model’s own decoder layers, wiring HF state-dict keys) lives in the model’s own package.

Submodules

Package Contents

Data

API

nemo_automodel.components.models.common.mtp.__all__

['MTPConfig', 'MTPModule', 'get_mtp_loss_scaling_factor', 'roll_tensor']