core.ssm.mamba_hybrid_layer_allocation#
Module Contents#
Classes#
Symbols – Symbols for different layer types and pattern separators.
ParsedHybridPattern – Result of parsing a unified hybrid pattern string.
Functions#
pattern_from_ratios – Convert deprecated ratio arguments to a layer pattern string.
get_hybrid_total_layer_count – Returns the total number of main decoder layers in a hybrid layer pattern.
get_hybrid_total_pipeline_segment_count – Returns the number of pipeline segments in a hybrid layer pattern.
get_hybrid_layer_counts – Count layers by type across the full hybrid pattern (main + MTP).
parse_hybrid_pattern – Parse a unified hybrid pattern string into main and MTP components.
_validate_pattern – Validate that a pattern contains only valid layer symbols.
validate_segment_layers – Validate and convert a single pipeline segment pattern to a layer type list.
select_pipeline_segment – Select and validate the pipeline segment for the given PP rank and VP stage.
get_layer_maps_from_layer_type_list – Returns maps from global layer index to the corresponding layer index for each layer type in [Attention, Mamba, MLP, MoE] given a layer type list.
Data#
API#
- core.ssm.mamba_hybrid_layer_allocation.logger#
‘getLogger(…)’
- class core.ssm.mamba_hybrid_layer_allocation.Symbols#
Symbols for different layer types and pattern separators.
- MAMBA#
‘M’
- ATTENTION#
‘*’
- MLP#
‘-’
- MOE#
‘E’
- PIPE#
‘|’
- MTP_SEPARATOR#
‘/’
- VALID_LAYERS#
None
- class core.ssm.mamba_hybrid_layer_allocation.ParsedHybridPattern#
Result of parsing a unified hybrid pattern string.
A unified pattern encodes both the main decoder pattern and the MTP pattern in a single string using “/” as a separator. The main pattern may also contain “|” pipe symbols to define pipeline stage boundaries for flexible virtual pipeline parallelism (fVPP).
Format: “<main_pattern>/<mtp_pattern>/<mtp_pattern>/…”
Examples:
"MM" -> main="MM", mtp=None, depths=0 (no MTP)
"MM/MM/MM" -> main="MM", mtp="MM", depths=2
"MMMM/*M/*M/*M" -> main="MMMM", mtp="*M", depths=3
"M-M-|M-M*-/MM/MM" -> main="M-M-|M-M*-" (2 PP stages), mtp="MM", depths=2
The “/” symbol introduces MTP patterns. Each repeated pattern after the main decoder represents one MTP prediction depth.
The “|” symbol in the main pattern defines pipeline stage boundaries.
Attributes:
main_pattern – The main decoder layer pattern (e.g., “MM” or “M-M-|M-M*-”).
mtp_pattern – The MTP layer pattern per depth (e.g., “MM”), or None if no MTP.
mtp_num_depths – Number of MTP prediction depths (0 if no MTP).
- main_pattern: Optional[str]#
None
- mtp_pattern: Optional[str]#
None
- mtp_num_depths: int#
None
- core.ssm.mamba_hybrid_layer_allocation.pattern_from_ratios(num_layers: int, attention_ratio: float = 0.0, mlp_ratio: float = 0.0)#
Convert deprecated ratio arguments to a layer pattern string.
Generates an evenly-spaced hybrid layer pattern from target attention and MLP ratios. This exists for backward compatibility with code that uses the deprecated hybrid_attention_ratio and hybrid_mlp_ratio parameters.
- Parameters:
num_layers – Total number of layers.
attention_ratio – Target ratio of attention layers to total layers.
mlp_ratio – Target ratio of MLP layers to total layers.
- Returns:
A layer pattern string (e.g., “MMMMMMMM”).
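The page does not spell out how the even spacing is computed. As a rough illustration only, the sketch below shows one possible scheme: round each ratio to a layer count, then overwrite evenly spaced Mamba slots. The function name sketch_pattern_from_ratios, the rounding rule, and the placement formula are all assumptions and may differ from what the library does.

```python
def sketch_pattern_from_ratios(num_layers, attention_ratio=0.0, mlp_ratio=0.0):
    """Hypothetical even-spacing scheme; not the library's algorithm."""
    # Assumption: target counts are obtained by rounding ratio * num_layers.
    n_attention = round(num_layers * attention_ratio)
    n_mlp = round(num_layers * mlp_ratio)
    if n_attention + n_mlp > num_layers:
        raise ValueError("attention and MLP layers exceed num_layers")

    layers = ["M"] * num_layers  # start from an all-Mamba stack

    def spread(count, symbol):
        # Overwrite `count` of the remaining Mamba slots at evenly spaced positions.
        free = [i for i, s in enumerate(layers) if s == "M"]
        for k in range(count):
            layers[free[(2 * k + 1) * len(free) // (2 * count)]] = symbol

    spread(n_attention, "*")
    spread(n_mlp, "-")
    return "".join(layers)
```

With both ratios left at 0.0, the sketch returns an all-Mamba pattern of length num_layers.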
- core.ssm.mamba_hybrid_layer_allocation.get_hybrid_total_layer_count(pattern: str) → int#
Returns the total number of main decoder layers in a hybrid layer pattern.
Extracts the main pattern (before the first MTP separator ‘/’), strips pipeline stage separators ‘|’, and returns the character count.
- Parameters:
pattern – Full hybrid layer pattern, possibly including MTP and pipe separators.
- Returns:
Total number of layers in the main decoder pattern.
- core.ssm.mamba_hybrid_layer_allocation.get_hybrid_total_pipeline_segment_count(pattern: str) → int#
Returns the number of pipeline segments in a hybrid layer pattern.
Extracts the main pattern (before the first MTP separator ‘/’) and counts the number of segments delimited by ‘|’.
- Parameters:
pattern – Full hybrid layer pattern, possibly including MTP and pipe separators.
- Returns:
Number of pipeline segments (pipe count + 1).
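The two counting helpers described above reduce to simple string operations on the main pattern. The following is a minimal sketch of that logic under the descriptions given here; the sketch_* names are hypothetical stand-ins, not the module's functions.

```python
def sketch_main_pattern(pattern: str) -> str:
    # Everything before the first MTP separator '/' is the main decoder pattern.
    return pattern.split("/", 1)[0]


def sketch_total_layer_count(pattern: str) -> int:
    # Pipe separators mark stage boundaries and are not layers.
    return len(sketch_main_pattern(pattern).replace("|", ""))


def sketch_pipeline_segment_count(pattern: str) -> int:
    # N pipes delimit N + 1 segments.
    return sketch_main_pattern(pattern).count("|") + 1


example = "M-M-|M-M*-/MM/MM"
layer_count = sketch_total_layer_count(example)         # 9 main decoder layers
segment_count = sketch_pipeline_segment_count(example)  # 2 pipeline segments
```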
- core.ssm.mamba_hybrid_layer_allocation.get_hybrid_layer_counts(pattern: str) → Dict[str, int]#
Count layers by type across the full hybrid pattern (main + MTP).
Parses the pattern to extract main and MTP components, then counts each layer type. Main pattern ‘|’ separators are skipped. MTP layers are counted once per MTP depth.
- Parameters:
pattern – Full hybrid layer pattern string.
- Returns:
Dictionary mapping layer symbol to count. Keys are Symbols.ATTENTION, Symbols.MAMBA, Symbols.MLP, and Symbols.MOE.
Examples:
get_hybrid_layer_counts("M*M*") -> {'*': 2, 'M': 2, '-': 0, 'E': 0}
get_hybrid_layer_counts("M-M-|M-M*-/MM/MM") -> {'*': 1, 'M': 8, '-': 4, 'E': 0}
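To make the counting rule concrete, here is a small sketch that follows the description above: pipes are skipped, and each MTP repetition after a '/' contributes its layers once. The name sketch_layer_counts is hypothetical, and the sketch assumes the pattern has already been validated (an unknown symbol would raise KeyError here).

```python
from typing import Dict


def sketch_layer_counts(pattern: str) -> Dict[str, int]:
    # First component is the main pattern; the rest are one MTP pattern per depth.
    main, *mtp_parts = pattern.split("/")
    counts = {"*": 0, "M": 0, "-": 0, "E": 0}
    # Drop stage separators, then count main layers plus every MTP repetition.
    for symbol in main.replace("|", "") + "".join(mtp_parts):
        counts[symbol] += 1
    return counts
```

For "M-M-|M-M*-/MM/MM" the main pattern contributes four Mamba layers and the two MTP depths contribute four more, which gives the count of 8 shown in the example.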
- core.ssm.mamba_hybrid_layer_allocation.parse_hybrid_pattern(pattern: Optional[str])#
Parse a unified hybrid pattern string into main and MTP components.
The pattern uses “/” as a separator between the main decoder pattern and MTP patterns. Each MTP pattern after the separator represents one prediction depth. The main pattern may contain “|” pipe symbols for pipeline stage boundaries.
Format: “<main_pattern>/<mtp_pattern>/<mtp_pattern>/…”
- Parameters:
pattern – Unified pattern string, e.g., “MM/MM/MM” or just “MM”
- Returns:
ParsedHybridPattern with main_pattern, mtp_pattern, and mtp_num_depths
- Raises:
ValueError – If MTP patterns are inconsistent (all must be identical)
ValueError – If pattern contains invalid layer symbols
Examples:
parse_hybrid_pattern("MM") -> ParsedHybridPattern(main_pattern="MM", mtp_pattern=None, mtp_num_depths=0)
parse_hybrid_pattern("MM/MM/MM") -> ParsedHybridPattern(main_pattern="MM", mtp_pattern="MM", mtp_num_depths=2)
parse_hybrid_pattern("MMMM/*M/*M/*M") -> ParsedHybridPattern(main_pattern="MMMM", mtp_pattern="*M", mtp_num_depths=3)
parse_hybrid_pattern("M-M-|M-M*-/MM/MM") -> ParsedHybridPattern(main_pattern="M-M-|M-M*-", mtp_pattern="MM", mtp_num_depths=2)
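The parsing rule can be sketched in a few lines: split on '/', treat the first component as the main pattern, and require all remaining components to be identical. The names SketchParsedPattern and sketch_parse_hybrid_pattern are hypothetical, symbol validation is omitted for brevity, and returning empty fields for a None input is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchParsedPattern:
    main_pattern: Optional[str]
    mtp_pattern: Optional[str]
    mtp_num_depths: int


def sketch_parse_hybrid_pattern(pattern: Optional[str]) -> SketchParsedPattern:
    if pattern is None:
        # Assumption: no pattern means no main pattern and no MTP.
        return SketchParsedPattern(None, None, 0)
    main, *mtp_parts = pattern.split("/")
    if not mtp_parts:
        return SketchParsedPattern(main, None, 0)
    # Every MTP depth must use the same layer pattern.
    if any(part != mtp_parts[0] for part in mtp_parts):
        raise ValueError(f"inconsistent MTP patterns: {mtp_parts}")
    return SketchParsedPattern(main, mtp_parts[0], len(mtp_parts))
```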
- core.ssm.mamba_hybrid_layer_allocation._validate_pattern(pattern: str, pattern_name: str, allow_pipe: bool = False)#
Validate that a pattern contains only valid layer symbols.
- Parameters:
pattern – Layer pattern string to validate
pattern_name – Name of pattern for error messages (e.g., “main” or “MTP”)
allow_pipe – Whether to allow the pipe ‘|’ separator (for main patterns)
- Raises:
ValueError – If pattern contains invalid symbols
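A validator of this shape is essentially a set-membership check. The sketch below assumes the valid layer symbols are exactly the four listed under Symbols (M, *, -, E), since the rendered value of VALID_LAYERS is not shown on this page; the function name and error message are likewise hypothetical.

```python
# Assumption: these four symbols make up Symbols.VALID_LAYERS.
SKETCH_VALID_LAYERS = {"M", "*", "-", "E"}


def sketch_validate_pattern(pattern: str, pattern_name: str, allow_pipe: bool = False) -> None:
    # The pipe is only a legal character where stage boundaries are allowed.
    allowed = (SKETCH_VALID_LAYERS | {"|"}) if allow_pipe else SKETCH_VALID_LAYERS
    invalid = sorted(set(pattern) - allowed)
    if invalid:
        raise ValueError(f"{pattern_name} pattern contains invalid symbols: {invalid}")
```

A main pattern would be checked with allow_pipe=True, while an MTP pattern would keep the default so that a stray '|' is rejected.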
- core.ssm.mamba_hybrid_layer_allocation.validate_segment_layers(segment: str) → List[str]#
Validate and convert a single pipeline segment pattern to a layer type list.
This is used after the main pattern has been split by ‘|’ into segments. Each segment should contain only valid layer symbols (no ‘|’).
- Parameters:
segment – A single pipeline segment pattern string (e.g., “M-M*-”)
- Returns:
List of layer type characters.
- Raises:
ValueError – If segment contains invalid layer symbols.
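As an illustration of where this step sits, the sketch below splits a main pattern on '|' and converts each resulting segment to a list of layer characters. The helper name sketch_segment_to_layers is hypothetical, and the set of valid symbols is assumed to be the four listed under Symbols.

```python
from typing import List


def sketch_segment_to_layers(segment: str) -> List[str]:
    valid = {"M", "*", "-", "E"}  # assumed valid layer symbols
    layers = list(segment)
    bad = [symbol for symbol in layers if symbol not in valid]
    if bad:
        # A leftover '|' would land here too, since segments must be pipe-free.
        raise ValueError(f"segment {segment!r} contains invalid symbols: {bad}")
    return layers


main_pattern = "M-M-|M-M*-"
# One layer type list per pipeline segment.
segments = [sketch_segment_to_layers(part) for part in main_pattern.split("|")]
```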
- core.ssm.mamba_hybrid_layer_allocation.select_pipeline_segment(main_pattern: str, pp_group: Optional[torch.distributed.ProcessGroup], vp_stage: Optional[int], first_stage_layers: Optional[int] = None, last_stage_layers: Optional[int] = None)#
Select and validate the pipeline segment for the given PP rank and VP stage.
When the main pattern contains ‘|’ pipe separators, splits by ‘|’ into pipeline segments and selects the segment for the current PP rank / VP stage.
When the pattern has no pipes but pp_size > 1, falls back to runtime layer slicing (for backwards compatibility), supporting both even and uneven PP splits via first_stage_layers / last_stage_layers.
- Parameters:
main_pattern – Main decoder pattern (may contain ‘|’ separators). Empty string is allowed (produces one empty segment).
pp_group – Pipeline parallel process group, or None if not using PP.
vp_stage – Virtual pipeline stage, or None if not using VPP.
first_stage_layers – Number of layers on the first pipeline stage for uneven PP. Only valid when the pattern has no pipe separators.
last_stage_layers – Number of layers on the last pipeline stage for uneven PP. Only valid when the pattern has no pipe separators.
- Returns:
Tuple of (layer_type_list, layer_offset) where layer_type_list is the list of layer type characters for this segment, and layer_offset is the sum of layer counts from all preceding segments.
- Raises:
ValueError – If the segment contains invalid layer symbols, if first/last_stage_layers are used with pipe separators, if VPP is requested without pipe separators, or if layer counts are not evenly divisible across pipeline stages.
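The selection logic described above can be sketched without torch by passing the PP rank and size as plain integers. Everything here is an assumption made for illustration: the function name, the pp_rank and pp_size parameters (the real function takes a process group), the interleaved index vp_stage * pp_size + pp_rank, and the way the fallback divides layers among stages. Symbol validation is left out.

```python
from typing import List, Optional, Tuple


def sketch_select_segment(
    main_pattern: str,
    pp_rank: int = 0,
    pp_size: int = 1,
    vp_stage: Optional[int] = None,
    first_stage_layers: Optional[int] = None,
    last_stage_layers: Optional[int] = None,
) -> Tuple[List[str], int]:
    uneven = first_stage_layers is not None or last_stage_layers is not None
    if "|" in main_pattern:
        if uneven:
            raise ValueError("first/last_stage_layers cannot be combined with '|'")
        segments = main_pattern.split("|")
        # Assumed interleaved ordering: each VP stage holds one segment per PP rank.
        index = pp_rank if vp_stage is None else vp_stage * pp_size + pp_rank
        if index >= len(segments):
            raise ValueError("not enough segments for this rank/stage")
        return list(segments[index]), sum(len(s) for s in segments[:index])

    if pp_size == 1:
        return list(main_pattern), 0  # a single stage owns the whole pattern
    if vp_stage is not None:
        raise ValueError("VPP requires '|' separators in the pattern")

    # Fallback: slice the pipe-free pattern at runtime.
    sizes: List[Optional[int]] = [None] * pp_size
    if first_stage_layers is not None:
        sizes[0] = first_stage_layers
    if last_stage_layers is not None:
        sizes[-1] = last_stage_layers
    unset = sizes.count(None)
    remaining = len(main_pattern) - sum(s for s in sizes if s is not None)
    if remaining < 0 or (unset == 0 and remaining != 0) or (unset and remaining % unset):
        raise ValueError("layers are not evenly divisible across pipeline stages")
    # Stages without an explicit size share the remaining layers equally.
    sizes = [remaining // unset if s is None else s for s in sizes]
    offset = sum(sizes[:pp_rank])
    return list(main_pattern[offset:offset + sizes[pp_rank]]), offset
```

In this sketch the returned offset is the number of layers owned by all preceding segments, which matches the layer_offset described in the Returns entry.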
- core.ssm.mamba_hybrid_layer_allocation.get_layer_maps_from_layer_type_list(layer_type_list: List[str])#
Given a layer type list, returns one map per layer type in [Attention, Mamba, MLP, MoE]. Each map goes from a layer's global index to its index among the layers of that same type.
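One way to read this is that every layer gets two indices: its position in the full stack and its position among layers of its own type. The sketch below builds those maps; the name sketch_layer_maps and the choice to return a dict keyed by symbol are assumptions, since the return container is not shown on this page.

```python
from typing import Dict, List


def sketch_layer_maps(layer_type_list: List[str]) -> Dict[str, Dict[int, int]]:
    # One map per layer type: Attention '*', Mamba 'M', MLP '-', MoE 'E'.
    maps: Dict[str, Dict[int, int]] = {symbol: {} for symbol in ("*", "M", "-", "E")}
    for global_index, symbol in enumerate(layer_type_list):
        per_type = maps[symbol]
        # The next per-type index is the number of layers of this type seen so far.
        per_type[global_index] = len(per_type)
    return maps


layer_maps = sketch_layer_maps(list("M*M-"))
```

For the pattern "M*M-", the Mamba layers at global indices 0 and 2 map to per-type indices 0 and 1, while the attention layer at global index 1 maps to 0.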