bridge.diffusion.models.nemotron_labs_diffusion.nemotron_labs_diffusion_provider
NemotronLabsDiffusion model provider: text-only GPTModel + NemotronLabsDiffusionAttention for sbd_block_diff.
Module Contents
Classes
NemotronLabsDiffusionModelProvider: Text-only diffusion LM with NemotronLabsDiffusionAttention (sbd_block_diff) for dLLM training.
API
- class bridge.diffusion.models.nemotron_labs_diffusion.nemotron_labs_diffusion_provider.NemotronLabsDiffusionModelProvider
Bases: megatron.bridge.models.Ministral3ModelProvider
Text-only diffusion LM with NemotronLabsDiffusionAttention (sbd_block_diff) for dLLM training.
- mask_token_id: int = 100
- dlm_paradigm: str = 'sbd_block_diff'
- block_size: int = 64
- different_seed_per_dp: bool = True
- apply_llama4_style_query_key_layer_scaling: bool = True
- dlm_loss_weight: float = 0.3
- ar_loss_weight: float = 1.0
- position_embedding_type: str = 'none'
- provide(pre_process=None, post_process=None, vp_stage=None)
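
The defaults listed above can be illustrated with a small stand-in. This is a minimal sketch and does not import the real provider: `DiffusionProviderDefaults` and `combined_loss` are hypothetical names introduced here for illustration only. It assumes the provider's fields behave like dataclass fields, and that the two loss weights are combined as a plain weighted sum, which this page does not confirm.

```python
from dataclasses import dataclass, replace


@dataclass
class DiffusionProviderDefaults:
    """Hypothetical stand-in mirroring the documented defaults.

    This is NOT NemotronLabsDiffusionModelProvider; it only copies the
    attribute names and default values listed in this reference page.
    """

    mask_token_id: int = 100
    dlm_paradigm: str = "sbd_block_diff"
    block_size: int = 64
    different_seed_per_dp: bool = True
    apply_llama4_style_query_key_layer_scaling: bool = True
    dlm_loss_weight: float = 0.3
    ar_loss_weight: float = 1.0
    position_embedding_type: str = "none"


def combined_loss(cfg: DiffusionProviderDefaults, ar_loss: float, dlm_loss: float) -> float:
    # Assumption: the autoregressive and diffusion losses are mixed as a
    # weighted sum. The real training code may combine them differently.
    return cfg.ar_loss_weight * ar_loss + cfg.dlm_loss_weight * dlm_loss


# Start from the documented defaults, then override selected fields.
defaults = DiffusionProviderDefaults()
custom = replace(defaults, block_size=32, dlm_loss_weight=0.5)

print(defaults.block_size, custom.block_size)
print(combined_loss(defaults, ar_loss=2.0, dlm_loss=1.0))
```

Using `dataclasses.replace` leaves the default instance unchanged, which makes it easy to compare a modified configuration against the documented values.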