bridge.models.conversion.quant_mapping

Module Contents

Classes

AmaxMapping

Amax mapping for quantization.

AmaxFanoutMapping

Replicated amax mapping that fans out one Megatron amax to multiple HF targets.

MoeAmaxFanoutMapping

Shared MoE amax mapping that fans out to per-expert HF quantizers.

Functions

_convert_hf_weight_names

convert_to_amax_map

Convert weight mappings to amax mappings for quantization.

API

class bridge.models.conversion.quant_mapping.AmaxMapping(megatron_param: str, hf_param: str | dict[str, str])

Bases: megatron.bridge.models.conversion.param_mapping.ReplicatedMapping

Amax mapping for quantization.

Initialization

Initialize the Amax mapping.

class bridge.models.conversion.quant_mapping.AmaxFanoutMapping(megatron_param: str, hf_params: list[str])

Bases: bridge.models.conversion.quant_mapping.AmaxMapping

Replicated amax mapping that fans out one Megatron amax to multiple HF targets.

Used for QKV and gate/up where the amax values are shared but need to be written/read under multiple HF parameter names.
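
The fanout described above can be sketched in a few lines. This is an illustration only, not the library's implementation: `fan_out_amax` and the HF parameter names are hypothetical, and a plain float stands in for the amax tensor.

```python
def fan_out_amax(amax, hf_params):
    """Return one entry per HF name, all carrying the same amax value."""
    return {name: amax for name in hf_params}


# Hypothetical HF names for a fused QKV projection (assumed for illustration).
qkv_targets = [
    "model.layers.0.self_attn.q_proj.weight_quantizer._amax",
    "model.layers.0.self_attn.k_proj.weight_quantizer._amax",
    "model.layers.0.self_attn.v_proj.weight_quantizer._amax",
]

# A float stands in for the shared Megatron amax tensor.
exported = fan_out_amax(0.75, qkv_targets)
```

Every HF target receives the same value, which matches the description of a shared amax written under multiple names.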

Initialization

Initialize the Amax mapping.

megatron_to_hf(megatron_weights, megatron_module)

resolve(captures: tuple[str, ...])

Resolve wildcards for both megatron_param and all HF targets.

class bridge.models.conversion.quant_mapping.MoeAmaxFanoutMapping(
    megatron_param: str,
    hf_patterns: list[str],
    num_experts: int | None = None,
)

Bases: bridge.models.conversion.quant_mapping.AmaxMapping

Shared MoE amax mapping that fans out to per-expert HF quantizers.

Megatron grouped-MoE layers use one quantizer for each rank’s local expert block, while HF names carry an expert wildcard. This mapping gathers those per-EP-rank amax values and expands the HF expert wildcard during export.
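
A rough sketch of the export-side expansion follows. It is not the library's code: `fan_out_moe_amax` is a hypothetical helper, floats stand in for amax tensors, the HF names are invented, and it assumes that experts are split into equal contiguous blocks across EP ranks.

```python
def fan_out_moe_amax(amax_by_ep_rank, hf_patterns, num_experts):
    """Expand the expert wildcard, giving each expert its owning rank's amax."""
    ep_size = len(amax_by_ep_rank)
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by the EP world size")
    experts_per_rank = num_experts // ep_size

    result = {}
    for expert_idx in range(num_experts):
        # Assumption: rank r owns the contiguous block of experts
        # [r * experts_per_rank, (r + 1) * experts_per_rank).
        amax = amax_by_ep_rank[expert_idx // experts_per_rank]
        for pattern in hf_patterns:
            # Only the expert wildcard remains; layer wildcards are already resolved.
            result[pattern.replace("*", str(expert_idx), 1)] = amax
    return result


# Hypothetical HF patterns with one remaining expert wildcard.
patterns = [
    "model.layers.0.mlp.experts.*.gate_proj.weight_quantizer._amax",
    "model.layers.0.mlp.experts.*.up_proj.weight_quantizer._amax",
]

# Two EP ranks, four experts: experts 0-1 take rank 0's amax, experts 2-3 rank 1's.
moe_exported = fan_out_moe_amax([0.5, 2.0], patterns, num_experts=4)
```

With two ranks and four experts, each rank's amax is written under two expert indices per pattern, for eight HF entries in total.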

Initialization

Initialize the Amax mapping.

_EXPERT_WILDCARD_RE

'compile(…)'

_validate_patterns() -> None

Allow one extra HF wildcard for the expert index.

property is_expert: bool

Use normal tensor-parallel (TP) handling; expert-parallel (EP) fanout is handled explicitly by this mapping.

hf_to_megatron(hf_weights, megatron_module)

Grouped-MoE amax fanout is export-only.

_get_num_experts(megatron_module: object | None) -> int | None

classmethod _resolve_pattern(
    pattern: str,
    captures: tuple[str, ...],
    max_captures: int,
) -> str

_get_num_experts_for_rank(
    megatron_module: object | None,
) -> int | None

_gather_amax_by_ep_rank(weight: torch.Tensor) -> list[torch.Tensor]

megatron_to_hf(
    megatron_weights: torch.Tensor | None,
    megatron_module: object | None,
) -> dict[str, torch.Tensor]

resolve(
    captures: tuple[str, ...],
) -> bridge.models.conversion.quant_mapping.MoeAmaxFanoutMapping

Resolve layer wildcards while preserving the HF expert wildcard.
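
One way to picture this partial resolution is a left-to-right substitution that stops when the captures run out. The sketch below is an assumption about the behavior, not the actual `resolve` or `_resolve_pattern` code; `resolve_layer_wildcards` and the example pattern are hypothetical.

```python
import re


def resolve_layer_wildcards(pattern, captures):
    """Substitute captures into the leading '*' wildcards, left to right."""
    remaining = iter(captures)

    def substitute(match):
        # Once the captures are exhausted, leave the wildcard untouched.
        return next(remaining, match.group(0))

    return re.sub(r"\*", substitute, pattern)


# Hypothetical HF pattern: the first '*' is the layer index, the second the expert index.
hf_pattern = "model.layers.*.mlp.experts.*.down_proj.weight_quantizer._amax"
resolved = resolve_layer_wildcards(hf_pattern, ("7",))
```

Here only the layer index is filled in, so the expert wildcard survives for later expansion during export.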

bridge.models.conversion.quant_mapping._convert_hf_weight_names(
    hf_param: str | dict[str, str],
    mapped_name: str,
) -> list[str]

bridge.models.conversion.quant_mapping.convert_to_amax_map(
    mappings: list[megatron.bridge.models.conversion.param_mapping.MegatronParamMapping],
    mapped_name: str = '.weight_quantizer._amax',
) -> list[megatron.bridge.models.conversion.param_mapping.MegatronParamMapping]

Convert weight mappings to amax mappings for quantization.

This function converts parameter mappings for weights to their corresponding amax (absolute maximum) parameter mappings used in quantization. For example:

  • "layer.weight" -> "layer.weight_quantizer._amax"

Parameters:

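mappings – List of MegatronParamMapping objects for weight parameters

mapped_name – Suffix that replaces the trailing ".weight" in each mapped name. Defaults to '.weight_quantizer._amax'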

Returns:

List of new MegatronParamMapping objects for amax parameters

Note

Mappings ending in '.weight' become regular amax mappings. MoE expert mappings ending in '.weight*' become fanout mappings because Megatron stores a shared expert amax while HF stores per-expert amax names.
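
The naming rule and the two cases in the note can be sketched on plain strings. This is a simplified illustration, not the function's real logic: `to_amax_name` and the kind labels are hypothetical, and the result for the '.weight*' case assumes that the shared Megatron amax name drops the expert wildcard.

```python
def to_amax_name(weight_name, mapped_name=".weight_quantizer._amax"):
    """Return (amax_name, kind) for a weight name, or None if it is not a weight."""
    if weight_name.endswith(".weight"):
        # Regular case, as in "layer.weight" -> "layer.weight_quantizer._amax".
        return weight_name[: -len(".weight")] + mapped_name, "regular"
    if weight_name.endswith(".weight*"):
        # Assumption: the shared expert amax has no per-expert suffix.
        return weight_name[: -len(".weight*")] + mapped_name, "moe_fanout"
    return None


regular = to_amax_name("layer.weight")
moe = to_amax_name("experts.linear_fc1.weight*")
skipped = to_amax_name("layer.bias")
```

Names that match neither suffix return None in this sketch; the source does not say how the real function treats them.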