bridge.models.hybrid.hybrid_provider#

Module Contents#

Classes#

HybridModelProvider

Configuration and provider for Megatron Core Hybrid models.

Functions#

modelopt_hybrid_stack_spec

Hybrid stack specification for quantization with ModelOpt.

transformer_engine_hybrid_stack_spec

Return the default Hybrid stack spec with Transformer Engine layers.

get_default_hybrid_stack_spec

Determine the most appropriate Hybrid stack specification based on configuration.

Data#

API#

bridge.models.hybrid.hybrid_provider.logger#

‘getLogger(…)’

bridge.models.hybrid.hybrid_provider.modelopt_hybrid_stack_spec(
config: HybridModelProvider | None = None,
) -> megatron.core.transformer.ModuleSpec#

Hybrid stack specification for quantization with ModelOpt.

Uses Norm instead of TENorm and ColumnParallelLinear/RowParallelLinear instead of TE layers to enable proper quantizer insertion by ModelOpt.

Parameters:

config – Optional Hybrid configuration object.

Returns:

Module specification for quantization-ready Hybrid stack.

bridge.models.hybrid.hybrid_provider.transformer_engine_hybrid_stack_spec() -> megatron.core.transformer.ModuleSpec#

Return the default Hybrid stack spec with Transformer Engine layers.

This is a named function (not a lambda) to allow proper serialization and reconstruction from checkpoints. Named functions can be imported via their module path, unlike lambdas.

Returns:

Default Hybrid stack specification from megatron.core.
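
The point about named functions versus lambdas can be illustrated with a generic sketch. It is not the checkpoint serialization code used by this module. The helper names import_path_of and resolve_import_path are hypothetical, and json.dumps is used only as a convenient module-level function from the standard library.

```python
import importlib
import json


def import_path_of(fn):
    """Return the dotted path under which a named function can be re-imported."""
    return f"{fn.__module__}.{fn.__qualname__}"


def resolve_import_path(path):
    """Import the module part of a dotted path and fetch the named attribute."""
    module_name, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# A named, module-level function round-trips through its dotted path.
path = import_path_of(json.dumps)
restored = resolve_import_path(path)

# A lambda carries only the placeholder name "<lambda>", which is not an
# attribute of any module, so no importable path can be recorded for it.
anonymous = lambda: None
lambda_name = anonymous.__name__
```

A stored string such as the value of path is enough to recover the original function later, which is what makes a named spec factory reconstructible where a lambda is not.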

bridge.models.hybrid.hybrid_provider.get_default_hybrid_stack_spec(
config: HybridModelProvider,
) -> megatron.core.transformer.ModuleSpec#

Determine the most appropriate Hybrid stack specification based on configuration.

Parameters:

config – Hybrid configuration object.

Returns:

Appropriate module specification based on config.
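
This page does not say which configuration field drives the choice. The sketch below assumes it is restore_modelopt_state, so that the ModelOpt spec is selected when ModelOpt state is to be restored and the Transformer Engine spec otherwise. FakeConfig, fake_modelopt_spec, fake_te_spec and pick_default_spec are hypothetical stand-ins, not the library's code.

```python
from dataclasses import dataclass


@dataclass
class FakeConfig:
    """Hypothetical stand-in for HybridModelProvider; models one flag only."""

    restore_modelopt_state: bool = False


def fake_modelopt_spec(config=None):
    # Stand-in for modelopt_hybrid_stack_spec(config).
    return "modelopt-spec"


def fake_te_spec():
    # Stand-in for transformer_engine_hybrid_stack_spec().
    return "te-spec"


def pick_default_spec(config):
    # ASSUMPTION: the selection hinges on restore_modelopt_state.
    if config.restore_modelopt_state:
        return fake_modelopt_spec(config)
    return fake_te_spec()


default_choice = pick_default_spec(FakeConfig())
modelopt_choice = pick_default_spec(FakeConfig(restore_modelopt_state=True))
```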

class bridge.models.hybrid.hybrid_provider.HybridModelProvider#

Bases: megatron.bridge.models.transformer_config.TransformerConfig, megatron.bridge.models.model_provider.ModelProviderMixin[megatron.core.models.hybrid.hybrid_model.HybridModel]

Configuration and provider for Megatron Core Hybrid models.

This class extends TransformerConfig with Hybrid-specific parameters and provides a method to instantiate configured Hybrid models.

fp16_lm_cross_entropy: bool#

False

parallel_output: bool#

True

share_embeddings_and_output_weights: bool#

False

params_dtype: torch.dtype#

None

fp16: bool#

False

bf16: bool#

True

num_layers: int | None#

None

mamba_num_groups: int#

8

num_attention_heads: int#

1

hybrid_attention_ratio: float#

0.0

hybrid_mlp_ratio: float#

0.0

hybrid_override_pattern: str | None#

None

hybrid_layer_pattern: str | None#

None

seq_length: int#

8192

position_embedding_type: Literal['learned_absolute', 'rope', 'none']#

‘none’

rotary_percent: float#

1.0

rotary_base: int#

10000

seq_len_interpolation_factor: float | None#

None

apply_rope_fusion: bool#

True

make_vocab_size_divisible_by: int#

128

gated_linear_unit: bool#

False

normalization: str#

‘RMSNorm’

add_bias_linear: bool#

False

hidden_dropout: float#

0.0

attention_dropout: float#

0.0

layernorm_epsilon: float#

1e-05

attention_backend: megatron.core.transformer.enums.AttnBackend#

None

deallocate_pipeline_outputs: bool#

True

bias_dropout_fusion: bool#

True

cross_entropy_loss_fusion: bool#

True

gradient_accumulation_fusion: bool#

‘field(…)’

hybrid_stack_spec: megatron.core.transformer.ModuleSpec | Callable[[], megatron.core.transformer.ModuleSpec] | Callable[[bridge.models.hybrid.hybrid_provider.HybridModelProvider], megatron.core.transformer.ModuleSpec] | None#

None

vocab_size: int | None#

None

should_pad_vocab: bool#

False

hf_model_id: str | None#

None

Optional HuggingFace model identifier associated with this provider.

_pg_collection: megatron.core.process_groups_config.ProcessGroupCollection | None#

None

mtp_num_layers: int | None#

0

mtp_hybrid_override_pattern: str | None#

None

keep_mtp_spec_in_bf16: bool#

False

restore_modelopt_state: bool#

False

finalize() -> None#

Finalize the Hybrid model provider.

Calculates the number of layers from hybrid_layer_pattern and executes the deferred MCore post-init logic.
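
A minimal sketch of deriving a layer count from a pattern string follows. It assumes one character per layer and an alphabet of M (Mamba), * (attention) and - (MLP). Both are assumptions made for illustration; the actual symbol set and any separators are defined by Megatron Core and may differ. LAYER_SYMBOLS, count_layers and layer_kind_counts are hypothetical names.

```python
from collections import Counter

# ASSUMED alphabet: one character per layer.
LAYER_SYMBOLS = {"M": "mamba", "*": "attention", "-": "mlp"}


def count_layers(pattern):
    """Return the number of layers described by a pattern string."""
    unknown = sorted(set(pattern) - set(LAYER_SYMBOLS))
    if unknown:
        raise ValueError(f"unknown layer symbols: {unknown}")
    return len(pattern)


def layer_kind_counts(pattern):
    """Return how many layers of each kind the pattern contains."""
    return Counter(LAYER_SYMBOLS[symbol] for symbol in pattern)


num_layers = count_layers("M*M-")
kinds = layer_kind_counts("M*M-")
```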

_resolve_hybrid_stack_spec() -> megatron.core.transformer.ModuleSpec#

Resolve the configured Hybrid stack spec.
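
The hybrid_stack_spec field accepts a ready-made spec, a zero-argument factory, a factory that takes the provider, or None. One way such a union could be resolved is sketched below, under two assumptions: None falls back to a default factory, and the two callable forms are distinguished by parameter count. FakeSpec, resolve_stack_spec and default_factory are hypothetical stand-ins, not the method's actual body.

```python
import inspect
from dataclasses import dataclass


@dataclass
class FakeSpec:
    """Hypothetical stand-in for megatron.core.transformer.ModuleSpec."""

    name: str


def resolve_stack_spec(spec, config, default_factory):
    """Turn any accepted form of the field into a concrete spec object."""
    if spec is None:
        # ASSUMPTION: an unset field falls back to a default chosen from config.
        return default_factory(config)
    if not callable(spec):
        # Already a concrete spec object.
        return spec
    # Distinguish the two factory forms by how many parameters they declare.
    if len(inspect.signature(spec).parameters) == 0:
        return spec()
    return spec(config)


def default_factory(config):
    return FakeSpec("default")


config = {"name": "from-config"}  # placeholder for a provider instance
direct = resolve_stack_spec(FakeSpec("direct"), config, default_factory)
zero_arg = resolve_stack_spec(lambda: FakeSpec("zero-arg"), config, default_factory)
one_arg = resolve_stack_spec(lambda c: FakeSpec(c["name"]), config, default_factory)
fallback = resolve_stack_spec(None, config, default_factory)
```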

provide(
pre_process=None,
post_process=None,
vp_stage=None,
) -> megatron.core.models.hybrid.hybrid_model.HybridModel#

Configure and instantiate a Megatron Core Hybrid model based on this configuration.

Parameters:
  • pre_process – Whether to include pre-processing in the model. If None, defaults to True only on the first pipeline stage.

  • post_process – Whether to include post-processing in the model. If None, defaults to True only on the last pipeline stage.

  • vp_stage – Virtual pipeline stage.

Returns:

Configured Megatron Core Hybrid model instance.
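
The defaulting of pre_process and post_process can be sketched as follows. The function resolve_stage_flags and its pp_rank, pp_size and vp_size parameters are hypothetical. The real method reads pipeline position from the process groups rather than taking it as arguments, and its handling of virtual pipeline stages may differ from what is assumed here.

```python
def resolve_stage_flags(pre_process, post_process, pp_rank, pp_size,
                        vp_stage=None, vp_size=None):
    """Fill in unset flags from the pipeline position; keep explicit values."""
    # ASSUMPTION: with virtual pipelining, only the first virtual stage on the
    # first rank counts as "first", and likewise for "last".
    first_vp = vp_stage is None or vp_stage == 0
    last_vp = vp_stage is None or vp_size is None or vp_stage == vp_size - 1
    if pre_process is None:
        pre_process = pp_rank == 0 and first_vp
    if post_process is None:
        post_process = pp_rank == pp_size - 1 and last_vp
    return pre_process, post_process


first_rank = resolve_stage_flags(None, None, pp_rank=0, pp_size=4)
middle_rank = resolve_stage_flags(None, None, pp_rank=1, pp_size=4)
last_rank = resolve_stage_flags(None, None, pp_rank=3, pp_size=4)
explicit = resolve_stage_flags(True, True, pp_rank=1, pp_size=4)
```

Explicitly passed values always win; only None is replaced by a position-based default.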