bridge.models.llama_nemotron.llama_nemotron_bridge#

Module Contents#

Classes#

LlamaNemotronBridge

Megatron Bridge for Heterogeneous Llama-Nemotron models (Super/Ultra).

API#

class bridge.models.llama_nemotron.llama_nemotron_bridge.LlamaNemotronBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Heterogeneous Llama-Nemotron models (Super/Ultra).

This bridge handles heterogeneous Llama-Nemotron models that use the DeciLMForCausalLM architecture with block_configs for heterogeneous layer specifications. These models require special handling because:

  1. They use custom modeling code (DeciLMForCausalLM) loaded via auto_map

  2. They have heterogeneous block configurations (different layers have different specs)

  3. They require trust_remote_code=True to load from HuggingFace
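To make point 2 concrete, a heterogeneous model carries one configuration per layer rather than a single shared one. The sketch below is purely illustrative: the class and field names (`BlockConfig`, `attention`, `ffn_mult`) are hypothetical and do not reflect the actual DeciLM config schema, but they show the shape of the idea.

```python
# Illustrative sketch of per-layer ("heterogeneous") block configs.
# Names are hypothetical, not the real DeciLM schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockConfig:
    attention: bool   # some layers may skip attention entirely
    ffn_mult: float   # per-layer FFN width multiplier

block_configs = [
    BlockConfig(attention=True, ffn_mult=4.0),   # standard transformer block
    BlockConfig(attention=False, ffn_mult=2.0),  # attention skipped, slimmer FFN
    BlockConfig(attention=True, ffn_mult=4.0),
]

# A homogeneous model is the special case where every entry is identical,
# which is why Nano/70B can be handled by the plain LlamaBridge instead.
homogeneous = len(set(block_configs)) == 1
print(homogeneous)  # False for this heterogeneous example
```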

Supported models (examples):

  • nvidia/Llama-3_3-Nemotron-Super-49B-v1 (80 layers, 8192 hidden)

  • nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 (updated v1.5 release)

  • nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (162 layers, 16384 hidden)

Homogeneous Llama-Nemotron models (Nano/70B) use standard LlamaForCausalLM architecture and are handled by the regular LlamaBridge.

.. rubric:: Example

```python
from megatron.bridge import AutoBridge

# DeciLMForCausalLM models will automatically use this bridge
bridge = AutoBridge.from_hf_pretrained(
    "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
    trust_remote_code=True,
)
provider = bridge.to_megatron_provider()
```

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM,
) → megatron.bridge.models.llama.llama_provider.Llama31ModelProvider#

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
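The registry returned by `mapping_registry()` pairs HuggingFace parameter names with their Megatron-side counterparts so weights can be converted in either direction. A minimal schematic, assuming nothing about the real `MegatronMappingRegistry` API (the dict-based lookup and the parameter names below are illustrative only):

```python
# Schematic weight-name mapping, illustrative only; the actual
# MegatronMappingRegistry API and parameter names differ.
hf_to_megatron = {
    "model.embed_tokens.weight": "embedding.word_embeddings.weight",
    "lm_head.weight": "output_layer.weight",
}

# Conversion walks the HF checkpoint and renames each tensor.
megatron_name = hf_to_megatron["lm_head.weight"]
print(megatron_name)  # output_layer.weight
```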