bridge.models.llama_nemotron.llama_nemotron_provider#

Module Contents#

Classes#

Llama31NemotronNano8BProvider

Configuration class for the Llama3.1-Nemotron-Nano-8B model. Maps to: nvidia/Llama-3.1-Nemotron-Nano-8B-v1. Based on Llama31Config8B with kv_channels=128.

Llama31Nemotron70BProvider

Configuration class for the Llama3.1-Nemotron-70B model. Maps to: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF. Based on Llama31Config70B with kv_channels=128.

Llama33NemotronSuper49BProvider

Configuration class for the Llama3.3-Nemotron-Super-49B model. Maps to: nvidia/Llama-3_3-Nemotron-Super-49B-v1. Based on Llama31Config70B with heterogeneous architecture and kv_channels=128.

Llama31NemotronUltra253BProvider

Configuration class for the Llama3.1-Nemotron-Ultra-253B model. Maps to: nvidia/Llama-3_1-Nemotron-Ultra-253B-v1. Based on Llama31Config405B with heterogeneous architecture and kv_channels=128.

LlamaNemotronHeterogeneousProvider

Generic provider for heterogeneous (NAS) Llama-Nemotron models using DeciLMForCausalLM.

Functions#

heterogeneous_layer_spec

Determine the most appropriate layer specification based on availability.

Data#

API#

bridge.models.llama_nemotron.llama_nemotron_provider.logger#

‘getLogger(…)’

bridge.models.llama_nemotron.llama_nemotron_provider.heterogeneous_layer_spec(config) → megatron.core.transformer.spec_utils.ModuleSpec#

Determine the most appropriate layer specification based on availability.

Uses Transformer Engine specs since TE is a required dependency.

Parameters:

config – GPT configuration object

Returns:

The selected module specification

Return type:

ModuleSpec
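A minimal usage sketch, assuming the importable path is megatron.bridge.models.llama_nemotron.llama_nemotron_provider (matching the base-class paths listed below) and that a provider dataclass instance can be passed as the GPT configuration object; neither assumption is confirmed by this page:

```python
# Sketch only: obtain the Transformer Engine-based layer spec for a
# heterogeneous provider. The import path and the use of a provider instance
# as the config argument are assumptions, not verified against the package.
from megatron.bridge.models.llama_nemotron.llama_nemotron_provider import (
    Llama33NemotronSuper49BProvider,
    heterogeneous_layer_spec,
)

config = Llama33NemotronSuper49BProvider()
layer_spec = heterogeneous_layer_spec(config)
print(type(layer_spec))  # expected: megatron.core.transformer.spec_utils.ModuleSpec
```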

class bridge.models.llama_nemotron.llama_nemotron_provider.Llama31NemotronNano8BProvider#

Bases: megatron.bridge.models.llama.llama_provider.Llama31ModelProvider8B

Configuration class for the Llama3.1-Nemotron-Nano-8B model. Maps to: nvidia/Llama-3.1-Nemotron-Nano-8B-v1. Based on Llama31Config8B with kv_channels=128.

kv_channels: int#

128

bf16: bool#

True

fp16: bool#

False

params_dtype: torch.dtype#

None

autocast_dtype: torch.dtype#

None
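As a rough illustration of how this provider might be used (field names are those documented above; the commented provide() call is an assumption about the inherited provider interface, not something this page documents):

```python
# Hedged sketch: construct the Nano-8B provider and override the dtype-related
# fields. Documented defaults on this page: bf16=True, fp16=False, kv_channels=128.
import torch

from megatron.bridge.models.llama_nemotron.llama_nemotron_provider import (
    Llama31NemotronNano8BProvider,
)

provider = Llama31NemotronNano8BProvider(
    params_dtype=torch.bfloat16,    # override the None default documented above
    autocast_dtype=torch.bfloat16,  # likewise an explicit override
)
assert provider.kv_channels == 128 and provider.bf16
# model = provider.provide()  # assumed entry point inherited from the base GPT provider
```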

class bridge.models.llama_nemotron.llama_nemotron_provider.Llama31Nemotron70BProvider#

Bases: megatron.bridge.models.llama.llama_provider.Llama31ModelProvider70B

Configuration class for the Llama3.1-Nemotron-70B model. Maps to: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF. Based on Llama31Config70B with kv_channels=128.

kv_channels: int#

128

bf16: bool#

True

fp16: bool#

False

params_dtype: torch.dtype#

None

autocast_dtype: torch.dtype#

None

class bridge.models.llama_nemotron.llama_nemotron_provider.Llama33NemotronSuper49BProvider#

Bases: megatron.bridge.models.llama.llama_provider.Llama31ModelProvider70B, megatron.bridge.models.transformer_config.HeterogeneousTransformerConfig

Configuration class for the Llama3.3-Nemotron-Super-49B model. Maps to: nvidia/Llama-3_3-Nemotron-Super-49B-v1. Based on Llama31Config70B with heterogeneous architecture and kv_channels=128.

Developer Note: In the MRO, Llama31ModelProvider70B must come first so that provider functionality resolves correctly, followed by HeterogeneousTransformerConfig for heterogeneous support.

hidden_size: int#

8192

num_attention_heads: int#

64

num_layers: int#

80

kv_channels: int#

128

bf16: bool#

True

fp16: bool#

False

params_dtype: torch.dtype#

None

autocast_dtype: torch.dtype#

None

heterogeneous_layers_config_path: str | None#

None

heterogeneous_layers_config_encoded_json: str = <Multiline-String>#

transformer_layer_spec: megatron.core.transformer.spec_utils.ModuleSpec | Callable#

None
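A hedged configuration sketch for the heterogeneous (NAS) fields this class inherits from HeterogeneousTransformerConfig; the JSON file name below is hypothetical and its schema is not documented on this page:

```python
# Sketch only: supply the per-layer block structure either as a file path or as
# an encoded JSON string (heterogeneous_layers_config_encoded_json).
from megatron.bridge.models.llama_nemotron.llama_nemotron_provider import (
    Llama33NemotronSuper49BProvider,
)

provider = Llama33NemotronSuper49BProvider(
    heterogeneous_layers_config_path="nemotron_super_49b_block_config.json",  # hypothetical path
)
print(provider.num_layers, provider.hidden_size, provider.kv_channels)  # 80 8192 128
```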

class bridge.models.llama_nemotron.llama_nemotron_provider.Llama31NemotronUltra253BProvider#

Bases: megatron.bridge.models.llama.llama_provider.Llama31ModelProvider405B, megatron.bridge.models.transformer_config.HeterogeneousTransformerConfig

Configuration class for the Llama3.1-Nemotron-Ultra-253B model. Maps to: nvidia/Llama-3_1-Nemotron-Ultra-253B-v1. Based on Llama31Config405B with heterogeneous architecture and kv_channels=128.

Developer Note: In the MRO, Llama31ModelProvider405B must come first so that provider functionality resolves correctly, followed by HeterogeneousTransformerConfig for heterogeneous support.

num_layers: int#

162

hidden_size: int#

16384

num_attention_heads: int#

128

kv_channels: int#

128

bf16: bool#

True

fp16: bool#

False

params_dtype: torch.dtype#

None

autocast_dtype: torch.dtype#

None

heterogeneous_layers_config_path: str | None#

None

heterogeneous_layers_config_encoded_json: str = <Multiline-String>#

transformer_layer_spec: megatron.core.transformer.spec_utils.ModuleSpec | Callable#

None

class bridge.models.llama_nemotron.llama_nemotron_provider.LlamaNemotronHeterogeneousProvider#

Bases: megatron.bridge.models.llama.llama_provider.Llama31ModelProvider, megatron.bridge.models.transformer_config.HeterogeneousTransformerConfig

Generic provider for heterogeneous (NAS) Llama-Nemotron models using DeciLMForCausalLM.

All sizes and architectural details are taken directly from the HF config supplied at runtime via kwargs (num_layers, hidden_size, heads, kv_channels, etc.).

bf16: bool#

True

fp16: bool#

False

params_dtype: torch.dtype#

None

autocast_dtype: torch.dtype#

None

heterogeneous_layers_config_path: str | None#

None

heterogeneous_layers_config_encoded_json: str = <Multiline-String>#

transformer_layer_spec: megatron.core.transformer.spec_utils.ModuleSpec | Callable#

None
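Since every architectural detail arrives through kwargs, a usage sketch might look like the following; all numeric values are placeholders and the JSON source is hypothetical, as this page does not document the expected block-config schema:

```python
# Placeholder sketch: drive the generic heterogeneous provider from values
# mirrored out of an HF (DeciLMForCausalLM) config. None of these numbers
# corresponds to a released checkpoint.
from megatron.bridge.models.llama_nemotron.llama_nemotron_provider import (
    LlamaNemotronHeterogeneousProvider,
)

with open("hf_block_config.json") as f:  # hypothetical export of the HF config's block structure
    block_config_json = f.read()

provider = LlamaNemotronHeterogeneousProvider(
    num_layers=32,            # placeholder
    hidden_size=4096,         # placeholder
    num_attention_heads=32,   # placeholder
    kv_channels=128,          # placeholder
    heterogeneous_layers_config_encoded_json=block_config_json,
)
```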