bridge.models.llama_nemotron.llama_nemotron_bridge

Module Contents

Classes

LlamaNemotronBridge — Megatron Bridge for Heterogeneous Llama-Nemotron models (Super/Ultra).

API
- class bridge.models.llama_nemotron.llama_nemotron_bridge.LlamaNemotronBridge

  Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

  Megatron Bridge for Heterogeneous Llama-Nemotron models (Super/Ultra).
This bridge handles heterogeneous Llama-Nemotron models that use the DeciLMForCausalLM architecture with block_configs for heterogeneous layer specifications. These models require special handling because:

- They use custom modeling code (DeciLMForCausalLM) loaded via auto_map
- They have heterogeneous block configurations (different layers have different specs)
- They require trust_remote_code=True to load from HuggingFace
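To make the "heterogeneous block configurations" point concrete, the sketch below models a per-layer list of settings. It is illustrative only: the field names (`no_op_attention`, `ffn_mult`) are hypothetical and do not reproduce the actual DeciLM block_configs schema, which lives in the model's custom modeling code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockConfig:
    # Hypothetical fields for illustration; the real DeciLM block_configs
    # schema is defined in the checkpoint's custom modeling code.
    no_op_attention: bool  # True if this layer replaces attention with a no-op
    ffn_mult: float        # per-layer FFN width multiplier


# A toy heterogeneous stack: different layers carry different specs.
block_configs: List[BlockConfig] = [
    BlockConfig(no_op_attention=False, ffn_mult=4.0),
    BlockConfig(no_op_attention=True, ffn_mult=2.5),   # attention pruned away
    BlockConfig(no_op_attention=False, ffn_mult=3.0),
]

# A homogeneous model would have identical entries for every layer;
# heterogeneity is what forces the bridge to map each layer individually.
heterogeneous = len({(b.no_op_attention, b.ffn_mult) for b in block_configs}) > 1
```

This per-layer variation is why a single, uniform layer spec (as used by the regular Llama bridge) is not sufficient here.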
Supported models (examples):

- nvidia/Llama-3_3-Nemotron-Super-49B-v1 (80 layers, 8192 hidden)
- nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 (updated v1.5 release)
- nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (162 layers, 16384 hidden)
Homogeneous Llama-Nemotron models (Nano/70B) use standard LlamaForCausalLM architecture and are handled by the regular LlamaBridge.
Example:

```python
from megatron.bridge import AutoBridge

# DeciLMForCausalLM models will automatically use this bridge
bridge = AutoBridge.from_hf_pretrained(
    "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
    trust_remote_code=True,
)
provider = bridge.to_megatron_provider()
```
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM)

- mapping_registry() -> megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry
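The mapping_registry() method supplies the parameter-name mappings used when converting weights between the HuggingFace and Megatron formats. The following is a minimal, self-contained sketch of that idea using a plain dict-based registry; it is not the real MegatronMappingRegistry API, and the parameter names registered below are hypothetical examples.

```python
class SimpleMappingRegistry:
    """Toy registry mapping HuggingFace parameter names to Megatron names.

    Illustrative only -- the real MegatronMappingRegistry supports richer
    mappings (e.g. per-layer patterns for heterogeneous blocks).
    """

    def __init__(self) -> None:
        self._hf_to_megatron: dict = {}

    def register(self, hf_name: str, megatron_name: str) -> None:
        self._hf_to_megatron[hf_name] = megatron_name

    def resolve(self, hf_name: str) -> str:
        # Look up the Megatron-side name for a HuggingFace parameter.
        return self._hf_to_megatron[hf_name]


registry = SimpleMappingRegistry()
# Hypothetical parameter names, for illustration only.
registry.register(
    "model.embed_tokens.weight",
    "embedding.word_embeddings.weight",
)
```

For a heterogeneous model, such a registry would need per-layer entries rather than one shared pattern, since each layer's block config can change which parameters exist.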