bridge.models.hybrid.hybrid_provider#
Module Contents#
Classes#
HybridModelProvider | Configuration and provider for Megatron Core Hybrid models.
Functions#
modelopt_hybrid_stack_spec | Hybrid stack specification for quantization with ModelOpt.
transformer_engine_hybrid_stack_spec | Return the default Hybrid stack spec with Transformer Engine layers.
get_default_hybrid_stack_spec | Determine the most appropriate Hybrid stack specification based on configuration.
Data#
logger
API#
- bridge.models.hybrid.hybrid_provider.logger#
‘getLogger(…)’
- bridge.models.hybrid.hybrid_provider.modelopt_hybrid_stack_spec(config: HybridModelProvider | None = None)
Hybrid stack specification for quantization with ModelOpt.
Uses Norm instead of TENorm and ColumnParallelLinear/RowParallelLinear instead of TE layers to enable proper quantizer insertion by ModelOpt.
- Parameters:
config – Optional Hybrid configuration object.
- Returns:
Module specification for quantization-ready Hybrid stack.
- bridge.models.hybrid.hybrid_provider.transformer_engine_hybrid_stack_spec() megatron.core.transformer.ModuleSpec#
Return the default Hybrid stack spec with Transformer Engine layers.
This is a named function (not a lambda) to allow proper serialization and reconstruction from checkpoints. Named functions can be imported via their module path, unlike lambdas.
- Returns:
Default Hybrid stack specification from megatron.core.
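The difference between named functions and lambdas can be illustrated with a small, generic sketch. It does not reproduce the checkpoint code of this module: the helpers `to_import_path` and `from_import_path` are hypothetical, and `json.dumps` merely stands in for any named, module-level function.

```python
import importlib
import json


def to_import_path(fn) -> str:
    """Build a dotted path such as 'json.dumps' from a function object."""
    return f"{fn.__module__}.{fn.__qualname__}"


def from_import_path(path: str):
    """Re-import a function from its dotted path (hypothetical helper)."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


# A named, module-level function can be stored as a string and found again.
named_path = to_import_path(json.dumps)
restored = from_import_path(named_path)

# A lambda has no importable name: its qualified name is the placeholder
# "<lambda>", which cannot be looked up as a module attribute.
anonymous = lambda: None
lambda_name = anonymous.__qualname__
```

Here `restored` is the very same object as `json.dumps`, whereas `lambda_name` gives a loader nothing it could use to locate the function again.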
- bridge.models.hybrid.hybrid_provider.get_default_hybrid_stack_spec(config: HybridModelProvider)
Determine the most appropriate Hybrid stack specification based on configuration.
- Parameters:
config – Hybrid configuration object.
- Returns:
Appropriate module specification based on config.
- class bridge.models.hybrid.hybrid_provider.HybridModelProvider#
Bases: megatron.bridge.models.transformer_config.TransformerConfig, megatron.bridge.models.model_provider.ModelProviderMixin[megatron.core.models.hybrid.hybrid_model.HybridModel]
Configuration and provider for Megatron Core Hybrid models.
This class extends TransformerConfig with Hybrid-specific parameters and provides a method to instantiate configured Hybrid models.
- fp16_lm_cross_entropy: bool#
False
- parallel_output: bool#
True
False
- params_dtype: torch.dtype#
None
- fp16: bool#
False
- bf16: bool#
True
- num_layers: int | None#
None
- mamba_num_groups: int#
8
- num_attention_heads: int#
1
- hybrid_attention_ratio: float#
0.0
- hybrid_mlp_ratio: float#
0.0
- hybrid_override_pattern: str | None#
None
- hybrid_layer_pattern: str | None#
None
- seq_length: int#
8192
- position_embedding_type: Literal[learned_absolute, rope, none]#
‘none’
- rotary_percent: float#
1.0
- rotary_base: int#
10000
- seq_len_interpolation_factor: float | None#
None
- apply_rope_fusion: bool#
True
- make_vocab_size_divisible_by: int#
128
- gated_linear_unit: bool#
False
- normalization: str#
‘RMSNorm’
- add_bias_linear: bool#
False
0.0
- attention_dropout: float#
0.0
- layernorm_epsilon: float#
1e-05
- attention_backend: megatron.core.transformer.enums.AttnBackend#
None
- deallocate_pipeline_outputs: bool#
True
- bias_dropout_fusion: bool#
True
- cross_entropy_loss_fusion: bool#
True
- gradient_accumulation_fusion: bool#
‘field(…)’
- hybrid_stack_spec: megatron.core.transformer.ModuleSpec | Callable[[], megatron.core.transformer.ModuleSpec] | Callable[[bridge.models.hybrid.hybrid_provider.HybridModelProvider], megatron.core.transformer.ModuleSpec] | None#
None
- vocab_size: int | None#
None
- should_pad_vocab: bool#
False
- hf_model_id: str | None#
None
Optional HuggingFace model identifier associated with this provider.
- _pg_collection: megatron.core.process_groups_config.ProcessGroupCollection | None#
None
- mtp_num_layers: int | None#
0
- mtp_hybrid_override_pattern: str | None#
None
- keep_mtp_spec_in_bf16: bool#
False
- restore_modelopt_state: bool#
False
- finalize() None#
Finalize the Hybrid model provider.
Calculates the number of layers from hybrid_layer_pattern and executes the deferred MCore post-init logic.
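As a rough illustration of deriving a layer count from a pattern string, the sketch below assumes that each character denotes one layer and that the symbols are `M` (Mamba), `*` (attention), and `-` (MLP). Both the helper `count_layers` and this symbol set are assumptions made for the example; the actual `finalize()` logic may accept additional symbols or separators.

```python
from collections import Counter

# Assumed symbol table: one character per layer.
LAYER_SYMBOLS = {"M": "mamba", "*": "attention", "-": "mlp"}


def count_layers(pattern: str) -> dict:
    """Count layers per type in a pattern string (hypothetical helper)."""
    unknown = set(pattern) - set(LAYER_SYMBOLS)
    if unknown:
        raise ValueError(f"Unsupported layer symbols: {sorted(unknown)}")
    counts = Counter(pattern)
    per_type = {name: counts[symbol] for symbol, name in LAYER_SYMBOLS.items()}
    # Under the one-character-per-layer assumption, the total number of
    # layers is simply the length of the pattern.
    return {"num_layers": len(pattern), **per_type}


summary = count_layers("M-M*-")
```

For the example pattern, `summary["num_layers"]` is 5, made up of two Mamba layers, one attention layer, and two MLP layers.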
- _resolve_hybrid_stack_spec() megatron.core.transformer.ModuleSpec#
Resolve the configured Hybrid stack spec.
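The type of `hybrid_stack_spec` allows a ready-made spec, a zero-argument callable, a callable that takes the provider, or `None`. One way such a union could be resolved is sketched below. `FakeSpec` and `FakeProvider` are hypothetical stand-ins for `ModuleSpec` and `HybridModelProvider`, and the rule that `restore_modelopt_state` selects the ModelOpt spec is an assumption, not something this page confirms.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class FakeSpec:
    """Hypothetical stand-in for megatron.core.transformer.ModuleSpec."""
    name: str


@dataclass
class FakeProvider:
    """Hypothetical stand-in for HybridModelProvider."""
    restore_modelopt_state: bool = False
    hybrid_stack_spec: Union[FakeSpec, Callable[..., FakeSpec], None] = None


def default_spec(config: FakeProvider) -> FakeSpec:
    # Assumed selection rule: prefer the ModelOpt spec when restoring
    # ModelOpt state, otherwise use the Transformer Engine spec.
    if config.restore_modelopt_state:
        return FakeSpec("modelopt")
    return FakeSpec("transformer_engine")


def resolve_spec(provider: FakeProvider) -> FakeSpec:
    spec = provider.hybrid_stack_spec
    if spec is None:
        return default_spec(provider)
    if isinstance(spec, FakeSpec):
        return spec  # already a concrete spec
    if callable(spec):
        # Zero-argument factories are called bare; all others get the provider.
        if len(inspect.signature(spec).parameters) == 0:
            return spec()
        return spec(provider)
    raise TypeError(f"Unsupported hybrid_stack_spec: {spec!r}")
```

Checking for a concrete spec before checking `callable` keeps the branches unambiguous even if the spec type were itself callable.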
- provide(pre_process=None, post_process=None, vp_stage=None)
Configure and instantiate a Megatron Core Hybrid model based on this configuration.
- Parameters:
pre_process – Whether to include pre-processing in the model, defaults to first pipeline stage.
post_process – Whether to include post-processing in the model, defaults to last pipeline stage.
vp_stage – Virtual pipeline stage.
- Returns:
Configured Megatron Core Hybrid model instance.
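The defaults described for `pre_process` and `post_process` can be pictured with the following minimal sketch. The helper `resolve_stage_flags` and its `pp_rank` / `pp_size` arguments are hypothetical; the real method obtains pipeline-stage information from the parallel state, and virtual pipeline stages (`vp_stage`) are ignored here.

```python
from typing import Optional, Tuple


def resolve_stage_flags(
    pp_rank: int,
    pp_size: int,
    pre_process: Optional[bool] = None,
    post_process: Optional[bool] = None,
) -> Tuple[bool, bool]:
    """Fill in unset flags from the pipeline position (hypothetical helper)."""
    if pre_process is None:
        # Pre-processing (e.g. embeddings) defaults to the first stage.
        pre_process = pp_rank == 0
    if post_process is None:
        # Post-processing (e.g. the output layer) defaults to the last stage.
        post_process = pp_rank == pp_size - 1
    return pre_process, post_process


first = resolve_stage_flags(pp_rank=0, pp_size=4)
middle = resolve_stage_flags(pp_rank=2, pp_size=4)
last = resolve_stage_flags(pp_rank=3, pp_size=4)
```

Explicitly passed values always win, so a caller can still force either flag on any stage.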