core.ssm.mamba_hybrid_layer_allocation

Module Contents

Classes

Symbols

Symbols for different layer types and pattern separators.

ParsedHybridPattern

Result of parsing a unified hybrid pattern string.

Functions

pattern_from_ratios

Convert deprecated ratio arguments to a layer pattern string.

get_hybrid_total_layer_count

Returns the total number of main decoder layers in a hybrid layer pattern.

get_hybrid_total_pipeline_segment_count

Returns the number of pipeline segments in a hybrid layer pattern.

get_hybrid_layer_counts

Count layers by type across the full hybrid pattern (main + MTP).

parse_hybrid_pattern

Parse a unified hybrid pattern string into main and MTP components.

_validate_pattern

Validate that a pattern contains only valid layer symbols.

validate_segment_layers

Validate and convert a single pipeline segment pattern to a layer type list.

select_pipeline_segment

Select and validate the pipeline segment for the given PP rank and VP stage.

get_layer_maps_from_layer_type_list

Returns maps from global layer index to the corresponding layer index for each layer type in [Attention, Mamba, MLP, MoE] given a layer type list.

Data

API

core.ssm.mamba_hybrid_layer_allocation.logger

‘getLogger(…)’

class core.ssm.mamba_hybrid_layer_allocation.Symbols

Symbols for different layer types and pattern separators.

MAMBA

‘M’

ATTENTION

‘*’

MLP

‘-’

MOE

‘E’

PIPE

‘|’

MTP_SEPARATOR

‘/’

VALID_LAYERS

None

class core.ssm.mamba_hybrid_layer_allocation.ParsedHybridPattern

Result of parsing a unified hybrid pattern string.

A unified pattern encodes both the main decoder pattern and the MTP pattern in a single string using “/” as a separator. The main pattern may also contain “|” pipe symbols to define pipeline stage boundaries for flexible virtual pipeline parallelism (fVPP).

Format: “<main_pattern>/<mtp_pattern>/<mtp_pattern>/…”

Examples:

  • “MM” -> main=“MM”, mtp=None, depths=0 (no MTP)

  • “MM/MM/MM” -> main=“MM”, mtp=“MM”, depths=2

  • “MMMM/*M/*M/*M” -> main=“MMMM”, mtp=“*M”, depths=3

  • “M-M-|M-M*-/MM/MM” -> main=“M-M-|M-M*-” (2 PP stages), mtp=“MM”, depths=2

The “/” symbol introduces MTP patterns. Each repeated pattern after the main decoder represents one MTP prediction depth.

The “|” symbol in the main pattern defines pipeline stage boundaries.

Attributes:
  • main_pattern – The main decoder layer pattern (e.g., “MM” or “M-M-|M-M*-”).

  • mtp_pattern – The MTP layer pattern per depth (e.g., “MM”), or None if no MTP.

  • mtp_num_depths – Number of MTP prediction depths (0 if no MTP).

main_pattern: Optional[str]

None

mtp_pattern: Optional[str]

None

mtp_num_depths: int

None

core.ssm.mamba_hybrid_layer_allocation.pattern_from_ratios(
num_layers: int,
attention_ratio: float = 0.0,
mlp_ratio: float = 0.0,
) -> str

Convert deprecated ratio arguments to a layer pattern string.

Generates an evenly-spaced hybrid layer pattern from target attention and MLP ratios. This exists for backward compatibility with code that uses the deprecated hybrid_attention_ratio and hybrid_mlp_ratio parameters.

Parameters:
  • num_layers – Total number of layers.

  • attention_ratio – Target ratio of attention layers to total layers.

  • mlp_ratio – Target ratio of MLP layers to total layers.

Returns:

A layer pattern string (e.g., “MMMMMMMM”).
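To make "evenly spaced" concrete, here is a minimal sketch of one way to turn ratios into a pattern. It is not the library's algorithm: the helper names `evenly_spaced` and `sketch_pattern_from_ratios`, the rounding rule, and the choice to place attention layers before MLP layers are all assumptions made for illustration, so the real function may produce a different pattern for the same inputs.

```python
from typing import List


def evenly_spaced(total: int, count: int) -> List[int]:
    """Pick `count` distinct, roughly evenly spaced indices in range(total)."""
    if count <= 0:
        return []
    # Centre each pick inside its own equal-width bucket.
    return [int((i + 0.5) * total / count) for i in range(count)]


def sketch_pattern_from_ratios(
    num_layers: int, attention_ratio: float = 0.0, mlp_ratio: float = 0.0
) -> str:
    """Hypothetical stand-in for pattern_from_ratios (illustration only)."""
    layers = ["M"] * num_layers  # start with all-Mamba

    # Assumption: the attention count is the rounded share of all layers.
    n_attention = min(round(num_layers * attention_ratio), num_layers)
    for index in evenly_spaced(num_layers, n_attention):
        layers[index] = "*"

    # Assumption: MLP layers are spread over the slots that are still Mamba.
    free = [i for i, symbol in enumerate(layers) if symbol == "M"]
    n_mlp = min(round(num_layers * mlp_ratio), len(free))
    for slot in evenly_spaced(len(free), n_mlp):
        layers[free[slot]] = "-"

    return "".join(layers)


print(sketch_pattern_from_ratios(8))
print(sketch_pattern_from_ratios(8, attention_ratio=0.25))
```

With both ratios left at 0.0 the sketch returns an all-Mamba string, which matches the shape of the example in the Returns description.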

core.ssm.mamba_hybrid_layer_allocation.get_hybrid_total_layer_count(pattern: str) -> int

Returns the total number of main decoder layers in a hybrid layer pattern.

Extracts the main pattern (before the first MTP separator ‘/’), strips pipeline stage separators ‘|’, and returns the character count.

Parameters:

pattern – Full hybrid layer pattern, possibly including MTP and pipe separators.

Returns:

Total number of layers in the main decoder pattern.

core.ssm.mamba_hybrid_layer_allocation.get_hybrid_total_pipeline_segment_count(pattern: str) -> int

Returns the number of pipeline segments in a hybrid layer pattern.

Extracts the main pattern (before the first MTP separator ‘/’) and counts the number of segments delimited by ‘|’.

Parameters:

pattern – Full hybrid layer pattern, possibly including MTP and pipe separators.

Returns:

Number of pipeline segments (pipe count + 1).
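Both counting functions share the same first step: take everything before the first ‘/’. The sketch below follows the descriptions above using plain string operations. The names `main_part`, `sketch_total_layer_count`, and `sketch_segment_count` are hypothetical, and the sketch performs no symbol validation.

```python
def main_part(pattern: str) -> str:
    """Return the main decoder pattern (text before the first '/')."""
    return pattern.split("/", 1)[0]


def sketch_total_layer_count(pattern: str) -> int:
    """Count main decoder layers: drop '|' separators, then count characters."""
    return len(main_part(pattern).replace("|", ""))


def sketch_segment_count(pattern: str) -> int:
    """Count pipeline segments: number of '|' separators plus one."""
    return main_part(pattern).count("|") + 1


pattern = "M-M-|M-M*-/MM/MM"
print(sketch_total_layer_count(pattern))  # 9 main layers; MTP layers are ignored
print(sketch_segment_count(pattern))      # 2 pipeline segments
```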

core.ssm.mamba_hybrid_layer_allocation.get_hybrid_layer_counts(pattern: str) -> Dict[str, int]

Count layers by type across the full hybrid pattern (main + MTP).

Parses the pattern to extract main and MTP components, then counts each layer type. Main pattern ‘|’ separators are skipped. MTP layers are counted once per MTP depth.

Parameters:

pattern – Full hybrid layer pattern string.

Returns:

Dictionary mapping layer symbol to count. Keys are Symbols.ATTENTION, Symbols.MAMBA, Symbols.MLP, and Symbols.MOE.

Examples:

get_hybrid_layer_counts(“M*M*”) → {‘*’: 2, ‘M’: 2, ‘-’: 0, ‘E’: 0}

get_hybrid_layer_counts(“M-M-|M-M*-/MM/MM”) → {‘*’: 1, ‘M’: 8, ‘-’: 4, ‘E’: 0}
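The counting rule can be sketched in a few lines. Because every MTP depth repeats the same pattern after a ‘/’, concatenating all ‘/’-separated MTP parts counts the MTP layers once per depth. The function name `sketch_layer_counts` is hypothetical, and unlike the documented function this sketch does not validate its input (an unknown symbol raises KeyError).

```python
from typing import Dict


def sketch_layer_counts(pattern: str) -> Dict[str, int]:
    """Count layer symbols across the main pattern and every MTP depth."""
    parts = pattern.split("/")
    main = parts[0].replace("|", "")  # pipe separators are not layers
    mtp = "".join(parts[1:])          # one copy of the MTP pattern per depth

    counts = {"*": 0, "M": 0, "-": 0, "E": 0}
    for symbol in main + mtp:
        counts[symbol] += 1
    return counts


print(sketch_layer_counts("M-M-|M-M*-/MM/MM"))
```

For “M-M-|M-M*-/MM/MM” the main pattern contributes 4 Mamba, 4 MLP, and 1 attention layer, and the two MTP depths add 4 more Mamba layers, giving 8 in total.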

core.ssm.mamba_hybrid_layer_allocation.parse_hybrid_pattern(
pattern: Optional[str],
) -> core.ssm.mamba_hybrid_layer_allocation.ParsedHybridPattern

Parse a unified hybrid pattern string into main and MTP components.

The pattern uses “/” as a separator between the main decoder pattern and MTP patterns. Each MTP pattern after the separator represents one prediction depth. The main pattern may contain “|” pipe symbols for pipeline stage boundaries.

Format: “<main_pattern>/<mtp_pattern>/<mtp_pattern>/…”

Parameters:

pattern – Unified pattern string, e.g., “MM/MM/MM” or just “MM”.

Returns:

ParsedHybridPattern with main_pattern, mtp_pattern, and mtp_num_depths

Raises:
  • ValueError – If MTP patterns are inconsistent (all must be identical)

  • ValueError – If pattern contains invalid layer symbols

Examples:

parse_hybrid_pattern(“MM”) → ParsedHybridPattern(main_pattern=“MM”, mtp_pattern=None, mtp_num_depths=0)

parse_hybrid_pattern(“MM/MM/MM”) → ParsedHybridPattern(main_pattern=“MM”, mtp_pattern=“MM”, mtp_num_depths=2)

parse_hybrid_pattern(“MMMM/*M/*M/*M”) → ParsedHybridPattern(main_pattern=“MMMM”, mtp_pattern=“*M”, mtp_num_depths=3)

parse_hybrid_pattern(“M-M-|M-M*-/MM/MM”) → ParsedHybridPattern(main_pattern=“M-M-|M-M*-”, mtp_pattern=“MM”, mtp_num_depths=2)
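The parsing rules above can be sketched as follows. `ParsedSketch` and `sketch_parse` are hypothetical stand-ins, not the library's classes. The handling of a None input (returning an empty result) and the wording of the error messages are assumptions, since the documentation does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

LAYER_SYMBOLS = set("M*-E")  # Mamba, attention, MLP, MoE


@dataclass
class ParsedSketch:
    """Hypothetical mirror of ParsedHybridPattern."""
    main_pattern: Optional[str]
    mtp_pattern: Optional[str]
    mtp_num_depths: int


def sketch_parse(pattern: Optional[str]) -> ParsedSketch:
    if pattern is None:
        # Assumption: a missing pattern parses to an empty result.
        return ParsedSketch(None, None, 0)

    main, *mtp_parts = pattern.split("/")

    # The main pattern may also contain '|' pipeline boundaries.
    bad = set(main) - LAYER_SYMBOLS - {"|"}
    if bad:
        raise ValueError(f"Invalid symbols in main pattern: {sorted(bad)}")
    if not mtp_parts:
        return ParsedSketch(main, None, 0)

    # Every MTP depth must repeat the same pattern.
    mtp = mtp_parts[0]
    if any(part != mtp for part in mtp_parts):
        raise ValueError(f"Inconsistent MTP patterns: {mtp_parts}")
    bad = set(mtp) - LAYER_SYMBOLS
    if bad:
        raise ValueError(f"Invalid symbols in MTP pattern: {sorted(bad)}")

    return ParsedSketch(main, mtp, len(mtp_parts))


print(sketch_parse("M-M-|M-M*-/MM/MM"))
```

Calling `sketch_parse("MM/MM/M*")` raises ValueError because the two MTP depths differ.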

core.ssm.mamba_hybrid_layer_allocation._validate_pattern(
pattern: str,
pattern_name: str,
allow_pipe: bool = False,
) -> None

Validate that a pattern contains only valid layer symbols.

Parameters:
  • pattern – Layer pattern string to validate

  • pattern_name – Name of pattern for error messages (e.g., “main” or “MTP”)

  • allow_pipe – Whether to allow the pipe ‘|’ separator (for main patterns)

Raises:

ValueError – If pattern contains invalid symbols
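A minimal sketch of this kind of check is shown below. The constant `VALID_LAYER_SYMBOLS` is an assumed stand-in for Symbols.VALID_LAYERS (whose value is not shown on this page), and the error message format is invented for illustration.

```python
# Assumption: the valid layer symbols are the four layer types listed in Symbols.
VALID_LAYER_SYMBOLS = frozenset({"M", "*", "-", "E"})


def sketch_validate_pattern(
    pattern: str, pattern_name: str, allow_pipe: bool = False
) -> None:
    """Raise ValueError on the first symbol that is not allowed."""
    allowed = (VALID_LAYER_SYMBOLS | {"|"}) if allow_pipe else VALID_LAYER_SYMBOLS
    for position, symbol in enumerate(pattern):
        if symbol not in allowed:
            raise ValueError(
                f"Invalid symbol {symbol!r} at position {position} "
                f"in {pattern_name} pattern {pattern!r}"
            )


sketch_validate_pattern("M-M-|M-M*-", "main", allow_pipe=True)  # passes silently
sketch_validate_pattern("MM", "MTP")                            # passes silently
```

The same string “M-M-|M-M*-” fails when `allow_pipe` is left at False, which is how a pipe inside an MTP pattern would be rejected.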

core.ssm.mamba_hybrid_layer_allocation.validate_segment_layers(segment: str) -> List[str]

Validate and convert a single pipeline segment pattern to a layer type list.

This is used after the main pattern has been split by ‘|’ into segments. Each segment should contain only valid layer symbols (no ‘|’).

Parameters:

segment – A single pipeline segment pattern string (e.g., “M-M*-”).

Returns:

List of layer type characters.

Raises:

ValueError – If segment contains invalid layer symbols.
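As an illustration of the split-then-validate flow, the sketch below splits a main pattern on ‘|’ and converts each segment to a list. `sketch_validate_segment_layers` is a hypothetical stand-in; the set of valid symbols is assumed to be the four layer types from Symbols.

```python
from typing import List

SEGMENT_SYMBOLS = {"M", "*", "-", "E"}  # assumed valid layer symbols


def sketch_validate_segment_layers(segment: str) -> List[str]:
    """Return the segment as a list of layer symbols, rejecting anything else."""
    layer_types = list(segment)
    for symbol in layer_types:
        if symbol not in SEGMENT_SYMBOLS:
            raise ValueError(f"Invalid layer symbol {symbol!r} in segment {segment!r}")
    return layer_types


# Split the main pattern first, then validate each segment on its own.
segments = [sketch_validate_segment_layers(s) for s in "M-M-|M-M*-".split("|")]
print(segments)
```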

core.ssm.mamba_hybrid_layer_allocation.select_pipeline_segment(
main_pattern: str,
pp_group: Optional[torch.distributed.ProcessGroup],
vp_stage: Optional[int],
first_stage_layers: Optional[int] = None,
last_stage_layers: Optional[int] = None,
) -> Tuple[List[str], int]

Select and validate the pipeline segment for the given PP rank and VP stage.

When the main pattern contains ‘|’ pipe separators, splits by ‘|’ into pipeline segments and selects the segment for the current PP rank / VP stage.

When the pattern has no pipes but the pipeline-parallel size is greater than 1, the function falls back to runtime layer slicing for backward compatibility. This fallback supports both even and uneven PP splits via first_stage_layers / last_stage_layers.

Parameters:
  • main_pattern – Main decoder pattern (may contain ‘|’ separators). Empty string is allowed (produces one empty segment).

  • pp_group – Pipeline parallel process group, or None if not using PP.

  • vp_stage – Virtual pipeline stage, or None if not using VPP.

  • first_stage_layers – Number of layers on the first pipeline stage for uneven PP. Only valid when the pattern has no pipe separators.

  • last_stage_layers – Number of layers on the last pipeline stage for uneven PP. Only valid when the pattern has no pipe separators.

Returns:

Tuple of (layer_type_list, layer_offset) where layer_type_list is the list of layer type characters for this segment, and layer_offset is the sum of layer counts from all preceding segments.

Raises:

ValueError – If the segment contains invalid layer symbols, if first/last_stage_layers are used with pipe separators, if VPP is requested without pipe separators, or if layer counts are not evenly divisible across pipeline stages.
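The selection logic can be sketched without torch by passing the rank and size as plain integers. Everything here is illustrative: `sketch_select_segment`, `pp_rank`, and `pp_size` are hypothetical names replacing the `pp_group` argument, and the mapping from (PP rank, VP stage) to segment index `vp_stage * pp_size + pp_rank` is an assumed interleaved ordering that this page does not confirm. The uneven split via first_stage_layers / last_stage_layers is omitted.

```python
from typing import List, Optional, Tuple


def sketch_select_segment(
    main_pattern: str, pp_rank: int, pp_size: int, vp_stage: Optional[int] = None
) -> Tuple[List[str], int]:
    """Return (layer_type_list, layer_offset) for one PP rank / VP stage."""
    if "|" in main_pattern:
        segments = main_pattern.split("|")
        # Assumption: VP stage v on PP rank r owns segment v * pp_size + r.
        index = (vp_stage or 0) * pp_size + pp_rank
        if index >= len(segments):
            raise ValueError(f"No segment {index} in pattern {main_pattern!r}")
        # The offset is the number of layers in all preceding segments.
        offset = sum(len(segment) for segment in segments[:index])
        return list(segments[index]), offset

    # Fallback: no pipes, so slice the pattern evenly at run time.
    if vp_stage is not None:
        raise ValueError("VPP requires '|' separators in the pattern")
    if len(main_pattern) % pp_size != 0:
        raise ValueError("Layer count is not divisible by the number of PP stages")
    per_stage = len(main_pattern) // pp_size
    offset = pp_rank * per_stage
    return list(main_pattern[offset:offset + per_stage]), offset


print(sketch_select_segment("M-M-|M-M*-", pp_rank=1, pp_size=2))
print(sketch_select_segment("MMMM", pp_rank=1, pp_size=2))
```

In the first call the second rank receives the segment “M-M*-” with an offset of 4, because the first segment holds four layers.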

core.ssm.mamba_hybrid_layer_allocation.get_layer_maps_from_layer_type_list(
layer_type_list: List[str],
) -> Tuple[Dict[int, int], Dict[int, int], Dict[int, int]]

Returns maps from global layer index to the corresponding layer index for each layer type in [Attention, Mamba, MLP, MoE] given a layer type list.
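To show what such maps look like, the sketch below numbers each layer within its own type while walking the list in global order. `sketch_layer_maps` is a hypothetical name. The description names four layer types while the annotated return type lists three dictionaries, so the sketch assumes one map per named type, returned in the order [Attention, Mamba, MLP, MoE].

```python
from typing import Dict, List, Tuple

TYPE_ORDER = ["*", "M", "-", "E"]  # attention, Mamba, MLP, MoE


def sketch_layer_maps(layer_type_list: List[str]) -> Tuple[Dict[int, int], ...]:
    """Map each global layer index to its index among layers of the same type."""
    maps: Dict[str, Dict[int, int]] = {symbol: {} for symbol in TYPE_ORDER}
    for global_index, symbol in enumerate(layer_type_list):
        per_type = maps[symbol]
        # The next local index equals the number of layers of this type seen so far.
        per_type[global_index] = len(per_type)
    return tuple(maps[symbol] for symbol in TYPE_ORDER)


attention_map, mamba_map, mlp_map, moe_map = sketch_layer_maps(list("M*M-"))
print(attention_map)  # {1: 0}
print(mamba_map)      # {0: 0, 2: 1}
print(mlp_map)        # {3: 0}
print(moe_map)        # {}
```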