bridge.models.mamba.mamba_provider#
Module Contents#
Classes#
- MambaProvider – Configuration and provider for Megatron Core Mamba models.
- MambaProvider130M – Configuration for a 130M parameter Mamba model.
- MambaProvider370M – Configuration for a 370M parameter Mamba model.
- MambaProvider780M – Configuration for a 780M parameter Mamba model.
- MambaProvider1_3B – Configuration for a 1.3B parameter Mamba model.
- MambaProvider2_7B – Configuration for a 2.7B parameter Mamba model.
- NVIDIAMambaProvider8B – Configuration for an 8B parameter Mamba model used in NVIDIA research.
- NVIDIAMambaHybridProvider8B – Configuration for an 8B parameter hybrid Mamba model used in NVIDIA research.
Data#
API#
- bridge.models.mamba.mamba_provider.logger#
'getLogger(…)'
- class bridge.models.mamba.mamba_provider.MambaProvider#
Bases: megatron.core.transformer.transformer_config.TransformerConfig, megatron.bridge.models.model_provider.ModelProviderMixin[megatron.core.models.mamba.MambaModel]
Configuration and provider for Megatron Core Mamba models.
This class extends TransformerConfig with Mamba-specific parameters and provides a method to instantiate configured Mamba models.
- fp16_lm_cross_entropy: bool#
False
- parallel_output: bool#
True
- share_embeddings_and_output_weights: bool#
False
- params_dtype: torch.dtype#
None
- fp16: bool#
False
- bf16: bool#
True
- num_layers: int#
2
- mamba_num_groups: int#
8
- num_attention_heads: int#
1
- hybrid_attention_ratio: float#
0.0
- hybrid_mlp_ratio: float#
0.0
- hybrid_override_pattern: Optional[str]#
None
- seq_length: int#
8192
- position_embedding_type: Literal['learned_absolute', 'rope', 'none']#
'none'
- rotary_percent: float#
1.0
- rotary_base: int#
10000
- seq_len_interpolation_factor: Optional[float]#
None
- apply_rope_fusion: bool#
True
- make_vocab_size_divisible_by: int#
128
- gated_linear_unit: bool#
False
- normalization: str#
'RMSNorm'
- add_bias_linear: bool#
False
- hidden_dropout: float#
0.0
- attention_dropout: float#
0.0
- layernorm_epsilon: float#
1e-05
- attention_backend: megatron.core.transformer.enums.AttnBackend#
None
- deallocate_pipeline_outputs: bool#
True
- bias_dropout_fusion: bool#
True
- cross_entropy_loss_fusion: bool#
True
- mamba_stack_spec: Union[megatron.core.transformer.ModuleSpec, Callable[[], megatron.core.transformer.ModuleSpec]]#
'field(…)'
- vocab_size: Optional[int]#
None
- provide(pre_process=None, post_process=None, vp_stage=None, tokenizer=None)#
Configure and instantiate a Megatron Core Mamba model based on this configuration.
- Parameters:
pre_process – Whether to include pre-processing in the model; defaults to True on the first pipeline stage
post_process – Whether to include post-processing in the model; defaults to True on the last pipeline stage
vp_stage – Virtual pipeline stage
tokenizer – Tokenizer used with the model
- Returns:
Configured Megatron Core Mamba model instance
- Return type:
MCoreMambaModel
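A minimal sketch of how this provider might be used, assuming the package is importable as megatron.bridge (per the base-class paths above) and that torch.distributed plus Megatron Core model-parallel state are already initialized; the hidden_size and vocab_size values are illustrative, not defaults of this module:

```python
import torch

from megatron.bridge.models.mamba.mamba_provider import MambaProvider

# Assumption: torch.distributed and Megatron Core model-parallel state
# (megatron.core.parallel_state) have already been initialized; provide()
# then builds an MCoreMambaModel for the current pipeline stage.
provider = MambaProvider(
    num_layers=2,                  # module default, restated for clarity
    num_attention_heads=1,         # module default
    hidden_size=768,               # illustrative; inherited from TransformerConfig
    vocab_size=50304,              # illustrative padded vocabulary size
    params_dtype=torch.bfloat16,
)

# pre_process / post_process are resolved from the first / last pipeline
# stage when left as None, so a single-stage run can rely on the defaults.
model = provider.provide()
```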
- class bridge.models.mamba.mamba_provider.MambaProvider130M#
Bases: bridge.models.mamba.mamba_provider.MambaProvider
Configuration for a 130M parameter Mamba model.
- hybrid_override_pattern: str#
None
- num_layers: int#
24
- seq_length: int#
2048
- hidden_size: int#
768
- mamba_num_groups: int#
1
- ffn_hidden_size: int#
768
- make_vocab_size_divisible_by: int#
16
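Because presets such as MambaProvider130M are plain dataclasses that only change defaults, they can be instantiated directly and individual fields can be overridden at construction time. A small sketch (the seq_length override is illustrative):

```python
from megatron.bridge.models.mamba.mamba_provider import MambaProvider130M

# Instantiate the 130M preset; any inherited field can still be overridden.
cfg = MambaProvider130M(seq_length=4096)  # illustrative override of the 2048 default

print(cfg.num_layers)                     # 24 (preset default)
print(cfg.mamba_num_groups)               # 1  (preset default)
print(cfg.make_vocab_size_divisible_by)   # 16 (preset default)
```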
- class bridge.models.mamba.mamba_provider.MambaProvider370M#
Bases: bridge.models.mamba.mamba_provider.MambaProvider
Configuration for a 370M parameter Mamba model.
- hybrid_override_pattern: str#
None
- num_layers: int#
48
- seq_length: int#
2048
- hidden_size: int#
1024
- mamba_num_groups: int#
1
- ffn_hidden_size: int#
1024
- make_vocab_size_divisible_by: int#
16
- class bridge.models.mamba.mamba_provider.MambaProvider780M#
Bases: bridge.models.mamba.mamba_provider.MambaProvider
Configuration for a 780M parameter Mamba model.
- hybrid_override_pattern: str#
None
- num_layers: int#
48
- seq_length: int#
2048
- hidden_size: int#
1536
- mamba_num_groups: int#
1
- ffn_hidden_size: int#
1536
- make_vocab_size_divisible_by: int#
16
- class bridge.models.mamba.mamba_provider.MambaProvider1_3B#
Bases: bridge.models.mamba.mamba_provider.MambaProvider
Configuration for a 1.3B parameter Mamba model.
- hybrid_override_pattern: str#
None
- num_layers: int#
48
- seq_length: int#
2048
- hidden_size: int#
2048
- mamba_num_groups: int#
1
- ffn_hidden_size: int#
2048
- make_vocab_size_divisible_by: int#
16
- class bridge.models.mamba.mamba_provider.MambaProvider2_7B#
Bases: bridge.models.mamba.mamba_provider.MambaProvider
Configuration for a 2.7B parameter Mamba model.
- hybrid_override_pattern: str#
None
- num_layers: int#
64
- seq_length: int#
2048
- hidden_size: int#
2560
- mamba_num_groups: int#
1
- ffn_hidden_size: int#
2560
- make_vocab_size_divisible_by: int#
16
- class bridge.models.mamba.mamba_provider.NVIDIAMambaProvider8B#
Bases: bridge.models.mamba.mamba_provider.MambaProvider
Configuration for an 8B parameter Mamba model used in NVIDIA research.
- hybrid_override_pattern: str#
None
- num_attention_heads: int#
32
- num_layers: int#
56
- seq_length: int#
4096
- hidden_size: int#
4096
- mamba_num_groups: int#
8
- ffn_hidden_size: int#
4096
- make_vocab_size_divisible_by: int#
128
- class bridge.models.mamba.mamba_provider.NVIDIAMambaHybridProvider8B#
Bases: bridge.models.mamba.mamba_provider.MambaProvider
Configuration for an 8B parameter hybrid Mamba model used in NVIDIA research.
- hybrid_override_pattern: str#
'M-M-M--M-M*-M-M-M-M--M*-M-M-M-M-M*--M-M-M-M-M*-M--M-M-M-'
- num_layers: int#
56
- seq_length: int#
4096
- hidden_size: int#
4096
- mamba_num_groups: int#
8
- ffn_hidden_size: int#
16384
- num_attention_heads: int#
32
- num_query_groups: int#
8
- make_vocab_size_divisible_by: int#
128
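In Megatron Core's hybrid layer allocation, each character of hybrid_override_pattern describes one layer: 'M' is a Mamba (SSM) layer, '*' a self-attention layer, and '-' an MLP layer. A small sketch that tallies the layer types in the 8B hybrid pattern above (the symbol meanings come from Megatron Core, not from this module):

```python
from collections import Counter

# hybrid_override_pattern of NVIDIAMambaHybridProvider8B, one character per layer.
PATTERN = "M-M-M--M-M*-M-M-M-M--M*-M-M-M-M-M*--M-M-M-M-M*-M--M-M-M-"

assert len(PATTERN) == 56  # matches num_layers above

counts = Counter(PATTERN)
print(f"Mamba (SSM) layers:    {counts['M']}")
print(f"Self-attention layers: {counts['*']}")
print(f"MLP layers:            {counts['-']}")
```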