bridge.diffusion.models.nemotron_labs_diffusion.nemotron_labs_diffusion_provider

Model provider for NemotronLabsDiffusion: a text-only GPTModel that uses NemotronLabsDiffusionAttention for the sbd_block_diff paradigm.

Module Contents

Classes

NemotronLabsDiffusionModelProvider

Text-only diffusion LM with NemotronLabsDiffusionAttention (sbd_block_diff) for dLLM training.

API

class bridge.diffusion.models.nemotron_labs_diffusion.nemotron_labs_diffusion_provider.NemotronLabsDiffusionModelProvider

Bases: megatron.bridge.models.Ministral3ModelProvider

Text-only diffusion LM with NemotronLabsDiffusionAttention (sbd_block_diff) for dLLM training.

mask_token_id: int

100
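
As a rough illustration of how a mask token ID is commonly used in masked-diffusion training, the sketch below replaces a random subset of token IDs with the mask ID. The helper `mask_tokens`, its `mask_prob` parameter, and the sampling scheme are hypothetical and are not taken from this module; only the default value 100 comes from the reference above.

```python
import random

MASK_TOKEN_ID = 100  # default value documented for mask_token_id


def mask_tokens(token_ids, mask_prob, seed=0):
    """Hypothetical helper: independently replace each token with
    MASK_TOKEN_ID with probability mask_prob.

    Returns the corrupted sequence and a parallel list of booleans
    marking which positions were masked.
    """
    rng = random.Random(seed)
    masked, is_masked = [], []
    for tok in token_ids:
        hit = rng.random() < mask_prob
        masked.append(MASK_TOKEN_ID if hit else tok)
        is_masked.append(hit)
    return masked, is_masked
```

In a setup like this, a diffusion loss would typically be computed only at the positions flagged in `is_masked`; whether this provider does so is not stated here.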

dlm_paradigm: str

'sbd_block_diff'

block_size: int

64
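
To make the role of a block size concrete, here is a minimal sketch of one common block-diffusion attention pattern: tokens attend freely within their own block and causally to earlier blocks. This is an assumption about what a block-wise paradigm usually implies, not a description of what NemotronLabsDiffusionAttention actually computes; `block_index` and `block_causal_mask` are hypothetical names.

```python
def block_index(position, block_size=64):
    """Return the index of the block that a token position falls into."""
    return position // block_size


def block_causal_mask(seq_len, block_size=64):
    """Hypothetical mask: mask[i][j] is True when query position i may
    attend to key position j, i.e. when j lies in the same block as i
    or in an earlier block.
    """
    return [
        [block_index(j, block_size) <= block_index(i, block_size)
         for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

With the default of 64, positions 0 through 63 fall into block 0 and position 64 starts block 1.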

different_seed_per_dp: bool

True
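
The flag name suggests that each data-parallel rank draws from a differently seeded random stream. The sketch below shows one way such a seed could be derived; the function `seed_for_rank`, the `base_seed` and `dp_rank` parameters, and the simple additive offset are all assumptions for illustration, not this provider's behavior.

```python
def seed_for_rank(base_seed, dp_rank, different_seed_per_dp=True):
    """Hypothetical seed derivation: offset the base seed by the
    data-parallel rank when per-rank seeds are requested, otherwise
    share one seed across all ranks.
    """
    if different_seed_per_dp:
        return base_seed + dp_rank
    return base_seed
```

Under this scheme, ranks would sample different noise or mask patterns for their shards when the flag is on, and identical ones when it is off.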

apply_llama4_style_query_key_layer_scaling: bool

True

dlm_loss_weight: float

0.3

ar_loss_weight: float

1.0
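
The two loss weights presumably scale a diffusion loss term and an autoregressive loss term before they are summed. The following sketch assumes a plain weighted sum; the function `combine_losses` is hypothetical, and the actual reduction used during training may differ.

```python
def combine_losses(dlm_loss, ar_loss, dlm_loss_weight=0.3, ar_loss_weight=1.0):
    """Hypothetical combination: weighted sum of the diffusion (dlm)
    loss and the autoregressive (ar) loss, using the documented
    defaults as weights.
    """
    return dlm_loss_weight * dlm_loss + ar_loss_weight * ar_loss
```

With the defaults, the autoregressive term dominates: a call such as `combine_losses(2.0, 1.0)` evaluates `0.3 * 2.0 + 1.0 * 1.0`.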

position_embedding_type: str

'none'

provide(pre_process=None, post_process=None, vp_stage=None)