nemo_automodel.components.models.deepseek_v4.mtp
DeepSeek V4 Multi-Token Prediction (MTP) blocks.
The released DSV4-Flash checkpoint stores MTP under mtp.{depth}.*. Each
MTP depth mirrors the reference MTPBlock:
- fuse the future-token embedding and the backbone HC stream with e_proj(embed) + h_proj(hidden);
- run one HC-enabled DSV4 attention + MoE block;
- collapse the HC stream with an MTP-local hc_head and norm before the shared LM head computes the auxiliary CE loss.
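The mtp.{depth}.* checkpoint layout described above can be illustrated with a small, standard-library-only sketch that groups key names by depth. The helper name group_mtp_keys and the example key names (including the .weight suffixes) are hypothetical and chosen for illustration; they are not taken from the released checkpoint or from this module's API.

```python
import re
from collections import defaultdict

# Matches keys of the form "mtp.<depth>.<rest>", following the mtp.{depth}.* layout.
_MTP_KEY = re.compile(r"^mtp\.(\d+)\.(.+)$")


def group_mtp_keys(keys):
    """Group checkpoint keys by MTP depth, dropping the 'mtp.<depth>.' prefix.

    Hypothetical helper for illustration; not part of nemo_automodel.
    """
    by_depth = defaultdict(list)
    for key in keys:
        match = _MTP_KEY.match(key)
        if match is None:
            continue  # not an MTP parameter (for example, a backbone weight)
        by_depth[int(match.group(1))].append(match.group(2))
    return dict(by_depth)


# Hypothetical key names; the real checkpoint may name parameters differently.
example_keys = [
    "mtp.0.e_proj.weight",
    "mtp.0.h_proj.weight",
    "mtp.1.e_proj.weight",
    "embed.weight",
]
grouped = group_mtp_keys(example_keys)
```

Grouping by the integer depth rather than by the raw string prefix keeps depths in a form that can be compared against the number of blocks in the MTP stack.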
Module Contents
Classes
Functions
API
Bases: Module
One DSV4 MTP depth.
Parameters:
Main DSV4 config.
Global layer index used by the attention implementation.
Shared MoE config.
BackendConfig for kernels/modules.
Model dtype.
Shared main rotary embedding module.
Shared compressor rotary embedding module.
Run one MTP depth.
Parameters:
HC stream [B, S, hc_mult, H].
Future-token embeddings [B, S, H].
Returns: torch.Tensor
Tuple of (next_hc_stream, prediction_hidden) where
Bases: Module
DSV4 MTP stack, one DeepseekV4MTPBlock per prediction depth.
Construct DSV4 MTP blocks.
Build an MTPConfig from a DeepseekV4Config.