bridge.diffusion.conversion.nemotron_labs_diffusion.nemotron_labs_diffusion_bridge#
Megatron Bridge for NemotronLabsDiffusion diffusion language models.
Converts between the HuggingFace and Megatron-Core GPTModel formats. Conversion uses NemotronLabsDiffusionModelProvider, which replaces core attention with NemotronLabsDiffusionAttention for sbd_block_diff.
Supports two HF checkpoint formats (auto-detected from config):
- Text-only (NemotronLabsDiffusion): encoder.*, diffusion_head.weight
- VLM source (Ministral CPT): language_model.model.*, language_model.lm_head.weight (vision_tower and multi_modal_projector weights are ignored)
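The two layouts above can be told apart by their weight-name prefixes. The following is a minimal sketch of that idea only, not the bridge's code: the bridge itself detects the format from the HF config, and the helper names `guess_format` and `drop_vision_keys`, along with the example key names in the usage note, are hypothetical.

```python
from typing import Iterable, List

# Prefixes taken from the format descriptions above.
TEXT_ONLY_PREFIXES = ("encoder.", "diffusion_head.")
VLM_TEXT_PREFIXES = ("language_model.",)
VISION_PREFIXES = ("vision_tower.", "multi_modal_projector.")


def guess_format(keys: Iterable[str]) -> str:
    """Hypothetical helper: classify a checkpoint by its weight names.

    Assumption: the real bridge inspects the config instead of the keys.
    """
    keys = list(keys)
    if any(k.startswith(VLM_TEXT_PREFIXES) for k in keys):
        return "vlm"
    if any(k.startswith(TEXT_ONLY_PREFIXES) for k in keys):
        return "text_only"
    raise ValueError("unrecognized checkpoint layout")


def drop_vision_keys(keys: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep only keys that a text-only GPTModel can use."""
    return [k for k in keys if not k.startswith(VISION_PREFIXES)]
```

For example, a key list containing `language_model.lm_head.weight` and `vision_tower.proj.weight` (the latter an invented name) would be classified as `"vlm"`, and `drop_vision_keys` would keep only the first key.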
Module Contents#
Classes#
NemotronLabsDiffusionBridge | HF <-> Megatron bridge for NemotronLabsDiffusion diffusion language models.
API#
- class bridge.diffusion.conversion.nemotron_labs_diffusion.nemotron_labs_diffusion_bridge.NemotronLabsDiffusionBridge#
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
HF <-> Megatron bridge for NemotronLabsDiffusion diffusion language models.
Handles both text-only (encoder.*) and VLM (language_model.model.*) HF formats. The format is auto-detected in provider_bridge() and then used by mapping_registry() to select the mappings.
The Megatron target is a bare GPTModel (not wrapped in Ministral3Model), so Megatron-side keys use embedding.*, decoder.*, and output_layer.* with no language_model. prefix.
- _is_text_only: bool#
True
- provider_bridge(
- hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM,
- _text_only_mappings() list#
Mappings for text-only NemotronLabsDiffusion checkpoints (encoder.*, diffusion_head.weight).
- _vlm_mappings() list#
Mappings for VLM Ministral CPT source checkpoints (language_model.model.*).
Vision keys (vision_tower.*, multi_modal_projector.*) have no counterpart on the Megatron GPTModel side, so they are ignored without any explicit filtering.
- mapping_registry() megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
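The relationship between `_is_text_only`, the two mapping helpers, and `mapping_registry()` can be illustrated with a toy stand-in. This is a sketch under stated assumptions, not the bridge's implementation: `ToyBridge` and `KeyPair` are hypothetical, the real method returns a `MegatronMappingRegistry` rather than a plain list, and pairing `output_layer.weight` with each HF head weight is an illustrative guess based on the key names listed above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-in for a mapping entry: (megatron_key, hf_key).
KeyPair = Tuple[str, str]


@dataclass
class ToyBridge:
    # Mirrors the documented attribute and its default value.
    _is_text_only: bool = True

    def _text_only_mappings(self) -> List[KeyPair]:
        # Assumed pairing, for illustration only.
        return [("output_layer.weight", "diffusion_head.weight")]

    def _vlm_mappings(self) -> List[KeyPair]:
        # Assumed pairing, for illustration only.
        return [("output_layer.weight", "language_model.lm_head.weight")]

    def mapping_registry(self) -> List[KeyPair]:
        # Select the mapping set from the flag set during format detection.
        if self._is_text_only:
            return self._text_only_mappings()
        return self._vlm_mappings()
```

In this sketch, `ToyBridge().mapping_registry()` returns the text-only pairs, while `ToyBridge(_is_text_only=False).mapping_registry()` returns the VLM pairs. Either way the Megatron-side key is the same, since the target is a bare GPTModel in both cases.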