bridge.models.llama_nemotron.llama_nemotron_provider#
Module Contents#
Classes#
LlamaNemotronHeterogeneousProvider: Generic provider for heterogeneous (NAS) Llama-Nemotron models using DeciLMForCausalLM.
Functions#
heterogeneous_layer_spec: Determine the most appropriate layer specification based on availability.
Data#
API#
- bridge.models.llama_nemotron.llama_nemotron_provider.logger#
‘getLogger(…)’
- bridge.models.llama_nemotron.llama_nemotron_provider.heterogeneous_layer_spec(config)#
Determine the most appropriate layer specification based on availability.
Uses Transformer Engine specs since TE is a required dependency.
- Parameters:
config – GPT configuration object
- Returns:
The selected module specification
- Return type:
ModuleSpec
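The selection behavior described above can be sketched with a small self-contained analogue. Note that `HAVE_TE`, `LayerSpec`, and `select_layer_spec` are hypothetical stand-ins for illustration only, not the real Megatron Bridge or Megatron-Core API:

```python
from dataclasses import dataclass

# Hypothetical flag: in the real module, Transformer Engine (TE) is a
# required dependency, so the TE branch is always taken.
HAVE_TE = True


@dataclass
class LayerSpec:
    """Stand-in for megatron.core.transformer.spec_utils.ModuleSpec."""
    backend: str


def select_layer_spec(config) -> LayerSpec:
    """Pick the most appropriate layer spec based on backend availability."""
    if HAVE_TE:
        return LayerSpec(backend="transformer_engine")
    # Fallback path shown only to illustrate the availability check.
    return LayerSpec(backend="local")
```

Because TE is required, callers can rely on the returned spec using Transformer Engine modules.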
- class bridge.models.llama_nemotron.llama_nemotron_provider.LlamaNemotronHeterogeneousProvider#
Bases: megatron.bridge.models.gpt_provider.GPTModelProvider, megatron.bridge.models.transformer_config.HeterogeneousTransformerConfig

Generic provider for heterogeneous (NAS) Llama-Nemotron models using DeciLMForCausalLM.

Sizes and all architectural details are driven directly from the HF config provided at runtime via kwargs (num_layers, hidden_size, heads, kv_channels, etc.).
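Since the provider is driven by runtime kwargs taken from the HF config, the shape of that mapping can be sketched as below. The HF field names (`num_hidden_layers`, `hidden_size`, `num_attention_heads`) follow common Hugging Face config conventions, but the helper function itself is hypothetical, not part of Megatron Bridge:

```python
def provider_kwargs_from_hf_config(hf_cfg: dict) -> dict:
    """Hypothetical sketch: derive provider kwargs from a HF config dict.

    The real provider receives these values directly as kwargs at runtime;
    this only illustrates which architectural details come from the HF side.
    """
    return {
        "num_layers": hf_cfg["num_hidden_layers"],
        "hidden_size": hf_cfg["hidden_size"],
        "num_attention_heads": hf_cfg["num_attention_heads"],
        # Head dimension, assuming it is not set explicitly in the HF config.
        "kv_channels": hf_cfg["hidden_size"] // hf_cfg["num_attention_heads"],
    }


# Example with made-up sizes:
example_cfg = {"num_hidden_layers": 32, "hidden_size": 4096, "num_attention_heads": 32}
kwargs = provider_kwargs_from_hf_config(example_cfg)
```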
- normalization: str#
‘RMSNorm’
- activation_func: Callable#
None
- gated_linear_unit: bool#
True
- position_embedding_type: str#
‘rope’
- add_bias_linear: bool#
False
- hidden_dropout: float#
0.0
- attention_dropout: float#
0.0
- share_embeddings_and_output_weights: bool#
False
- bias_activation_fusion: bool#
True
- masked_softmax_fusion: bool#
True
- persist_layer_norm: bool#
True
- bias_dropout_fusion: bool#
True
- apply_rope_fusion: bool#
True
- rotary_percent: float#
1.0
- num_query_groups: int#
8
- init_method_std: float#
0.02
- bf16: bool#
True
- fp16: bool#
False
- params_dtype: torch.dtype#
None
- autocast_dtype: torch.dtype#
None
- heterogeneous_layers_config_path: str | None#
None
- heterogeneous_layers_config_encoded_json: str = <Multiline-String>#
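The per-layer NAS configuration can apparently be supplied either as a file path or inline as an encoded JSON string. A minimal sketch, assuming the encoded form is simply the JSON text of the block configuration; the `block_configs` schema shown here is an assumption modeled on published Llama-Nemotron NAS configs, so consult the actual model's config for the real fields:

```python
import json

# Assumed schema: a list of per-block attention/FFN settings, where no_op
# marks a block whose attention or FFN was removed by NAS.
block_configs = {
    "block_configs": [
        {"attention": {"no_op": False, "n_heads_in_group": 8},
         "ffn": {"no_op": False, "ffn_mult": 2.625}},
        {"attention": {"no_op": True},
         "ffn": {"no_op": False, "ffn_mult": 1.3125}},
    ]
}

# A string like this could populate heterogeneous_layers_config_encoded_json
# instead of pointing heterogeneous_layers_config_path at a file on disk.
encoded = json.dumps(block_configs)
decoded = json.loads(encoded)  # round-trips back to the original dict
```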
- transformer_layer_spec: megatron.core.transformer.spec_utils.ModuleSpec | Callable#
None