nemo_automodel.components.models.common.mtp#
Shared scaffolding for Multi-Token Prediction (MTP) auxiliary heads.
MTP follows the DeepSeek-V3 design (Liu et al., 2024). Each MTP "depth" predicts one additional future token. At each depth, the input is rolled left by one position, fused with the previous-depth hidden state, and passed through an inner block; logits are then produced via the shared LM head.
Components in this package are model-agnostic. Model-specific glue (building the inner block out of the model's own decoder layers, wiring HF state-dict keys) lives in the model's own package.
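The per-depth left roll described above can be illustrated with a minimal pure-Python sketch. The helper names `roll_left` and `mtp_inputs` are hypothetical and are not part of this package; the package's own `roll_tensor` operates on tensors, and its signature and fill behavior are not shown here. Filling the vacated last position with a pad value is an assumption made for illustration.

```python
from typing import List


def roll_left(tokens: List[int], fill: int = 0) -> List[int]:
    """Shift a sequence left by one position.

    Hypothetical helper: the vacated last slot is filled with `fill`
    (an assumption; the real roll_tensor may handle the tail differently).
    """
    if not tokens:
        return []
    return tokens[1:] + [fill]


def mtp_inputs(tokens: List[int], num_depths: int, fill: int = 0) -> List[List[int]]:
    """Build one rolled input sequence per MTP depth.

    Depth 1 sees the tokens rolled left once, depth 2 sees them rolled
    left twice, and so on: each depth rolls the previous depth's input.
    """
    rolled = list(tokens)
    per_depth = []
    for _ in range(num_depths):
        rolled = roll_left(rolled, fill)
        per_depth.append(rolled)
    return per_depth


if __name__ == "__main__":
    for depth, seq in enumerate(mtp_inputs([11, 12, 13, 14], num_depths=2), start=1):
        print(f"depth {depth}: {seq}")
```

In this sketch each depth's input is aligned one token further into the future than the previous depth's, which is what lets each depth predict one additional future token. The fusion with the previous-depth hidden state and the inner block are omitted.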
Submodules#
Package Contents#
Data#
API#
- nemo_automodel.components.models.common.mtp.__all__#
['MTPConfig', 'MTPModule', 'get_mtp_loss_scaling_factor', 'roll_tensor']