bridge.models.conversion.quant_mapping
Module Contents
Classes
| AmaxMapping | Amax mapping for quantization. |
| AmaxFanoutMapping | Replicated amax mapping that fans out one Megatron amax to multiple HF targets. |
| MoeAmaxFanoutMapping | Shared MoE amax mapping that fans out to per-expert HF quantizers. |
Functions
| convert_to_amax_map | Convert weight mappings to amax mappings for quantization. |
API
- class bridge.models.conversion.quant_mapping.AmaxMapping(megatron_param: str, hf_param: str | dict[str, str])
Bases: megatron.bridge.models.conversion.param_mapping.ReplicatedMapping
Amax mapping for quantization.
Initialization
Initialize the Amax mapping.
- class bridge.models.conversion.quant_mapping.AmaxFanoutMapping(megatron_param: str, hf_params: list[str])
Bases: bridge.models.conversion.quant_mapping.AmaxMapping
Replicated amax mapping that fans out one Megatron amax to multiple HF targets.
Used for QKV and gate/up where the amax values are shared but need to be written/read under multiple HF parameter names.
Initialization
Initialize the Amax mapping.
- megatron_to_hf(megatron_weights, megatron_module)
- resolve(captures: tuple[str, ...])
Resolve wildcards for both megatron_param and all HF targets.
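As a rough illustration of the fan-out described for AmaxFanoutMapping, the sketch below resolves `*` wildcards and writes one shared amax value under several HF names. It is a standalone approximation, not the library's implementation: `resolve_pattern`, `fan_out`, and the parameter names in the usage lines are hypothetical, and it assumes each `*` is filled left to right by exactly one capture.

```python
def resolve_pattern(pattern: str, captures: tuple[str, ...]) -> str:
    """Fill every '*' in `pattern`, left to right, with the matching capture."""
    parts = pattern.split("*")
    if len(parts) - 1 != len(captures):
        raise ValueError(
            f"{pattern!r} has {len(parts) - 1} wildcards, got {len(captures)} captures"
        )
    resolved = parts[0]
    for capture, tail in zip(captures, parts[1:]):
        resolved += capture + tail
    return resolved


def fan_out(amax, hf_params: list[str], captures: tuple[str, ...]) -> dict:
    """Return the same amax value under every resolved HF target name."""
    return {resolve_pattern(name, captures): amax for name in hf_params}


if __name__ == "__main__":
    # Hypothetical HF targets that share one fused-QKV amax.
    targets = [
        "model.layers.*.self_attn.q_proj.weight_quantizer._amax",
        "model.layers.*.self_attn.k_proj.weight_quantizer._amax",
        "model.layers.*.self_attn.v_proj.weight_quantizer._amax",
    ]
    for name, value in fan_out(0.5, targets, ("3",)).items():
        print(name, value)
```

Because the amax is shared rather than split, every target receives the same value; only the parameter name changes.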
- class bridge.models.conversion.quant_mapping.MoeAmaxFanoutMapping(
- megatron_param: str,
- hf_patterns: list[str],
- num_experts: int | None = None,
Bases: bridge.models.conversion.quant_mapping.AmaxMapping
Shared MoE amax mapping that fans out to per-expert HF quantizers.
Megatron grouped-MoE layers use one quantizer for each rank’s local expert block, while HF names carry an expert wildcard. This mapping gathers those per-EP-rank amax values and expands the HF expert wildcard during export.
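The expansion step can be pictured with the following sketch. It is illustrative only: `expand_expert_amax` is a hypothetical helper, plain floats stand in for amax tensors, and it assumes experts are split evenly across EP ranks in contiguous blocks, which the source does not state.

```python
def expand_expert_amax(
    hf_pattern: str,
    amax_by_ep_rank: list[float],
    num_experts: int,
) -> dict[str, float]:
    """Map each EP rank's shared amax onto the HF names of its local experts."""
    ep_size = len(amax_by_ep_rank)
    if ep_size == 0 or num_experts % ep_size != 0:
        raise ValueError("num_experts must divide evenly across EP ranks")
    experts_per_rank = num_experts // ep_size  # assumption: even, contiguous split

    expanded = {}
    for rank, amax in enumerate(amax_by_ep_rank):
        for local_idx in range(experts_per_rank):
            expert_idx = rank * experts_per_rank + local_idx
            # Replace only the expert wildcard with the global expert index.
            name = hf_pattern.replace("*", str(expert_idx), 1)
            expanded[name] = amax
    return expanded


if __name__ == "__main__":
    # Hypothetical HF pattern with the layer index already resolved.
    pattern = "model.layers.0.mlp.experts.*.down_proj.weight_quantizer._amax"
    for name, value in expand_expert_amax(pattern, [1.0, 2.0], 4).items():
        print(name, value)
```

With two EP ranks and four experts, experts 0 and 1 take the first rank's amax and experts 2 and 3 take the second rank's.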
Initialization
Initialize the Amax mapping.
- _EXPERT_WILDCARD_RE
'compile(…)'
- _validate_patterns() -> None
Allow one extra HF wildcard for the expert index.
- property is_expert: bool
Use normal TP handling; EP fanout is handled explicitly here.
- hf_to_megatron(hf_weights, megatron_module)
Grouped-MoE amax fanout is export-only.
- _get_num_experts(megatron_module: object | None) -> int | None
- classmethod _resolve_pattern(
- pattern: str,
- captures: tuple[str, ...],
- max_captures: int,
- _get_num_experts_for_rank(
- megatron_module: object | None,
- _gather_amax_by_ep_rank(weight: torch.Tensor) -> list[torch.Tensor]
- megatron_to_hf(
- megatron_weights: torch.Tensor | None,
- megatron_module: object | None,
- resolve(
- captures: tuple[str, ...],
Resolve layer wildcards while preserving the HF expert wildcard.
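One way to picture resolving layer wildcards while preserving the expert wildcard is sketched below. This is an assumption-laden illustration rather than the actual `resolve` or `_resolve_pattern` logic: `resolve_layer_wildcards` is a hypothetical name, and it assumes the expert wildcard is the last `*` in the HF pattern, after all layer wildcards.

```python
def resolve_layer_wildcards(pattern: str, captures: tuple[str, ...]) -> str:
    """Fill the first len(captures) wildcards and leave any later '*' untouched."""
    # Split at most len(captures) times so a trailing expert wildcard survives.
    parts = pattern.split("*", len(captures))
    if len(parts) != len(captures) + 1:
        raise ValueError(f"{pattern!r} has fewer wildcards than captures")
    resolved = parts[0]
    for capture, tail in zip(captures, parts[1:]):
        resolved += capture + tail
    return resolved


if __name__ == "__main__":
    # Hypothetical HF pattern: first '*' is the layer, second '*' is the expert.
    hf_pattern = "model.layers.*.mlp.experts.*.up_proj.weight_quantizer._amax"
    print(resolve_layer_wildcards(hf_pattern, ("7",)))
```

The remaining `*` can then be expanded to concrete expert indices at export time.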
- bridge.models.conversion.quant_mapping._convert_hf_weight_names(
- hf_param: str | dict[str, str],
- mapped_name: str,
- bridge.models.conversion.quant_mapping.convert_to_amax_map(
- mappings: list[megatron.bridge.models.conversion.param_mapping.MegatronParamMapping],
- mapped_name: str = '.weight_quantizer._amax',
Convert weight mappings to amax mappings for quantization.
This function converts parameter mappings for weights to their corresponding amax (absolute maximum) parameter mappings used in quantization. For example:
"layer.weight" -> "layer.weight_quantizer._amax"
- Parameters:
mappings – List of MegatronParamMapping objects for weight parameters
- Returns:
List of new MegatronParamMapping objects for amax parameters
Note:
Mappings ending in '.weight' become regular amax mappings. MoE expert mappings ending in '.weight*' become fanout mappings because Megatron stores a shared expert amax while HF stores per-expert amax names.
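The name rewrite for the plain '.weight' case can be sketched as follows. This is a minimal approximation, not the library code: `to_amax_name` and `convert_hf_names` are hypothetical helpers, the dict keys in the usage lines are made up, and the MoE '.weight*' case is deliberately left out.

```python
DEFAULT_SUFFIX = ".weight_quantizer._amax"


def to_amax_name(weight_name: str, mapped_name: str = DEFAULT_SUFFIX) -> str:
    """Turn 'layer.weight' into 'layer' + mapped_name."""
    suffix = ".weight"
    if not weight_name.endswith(suffix):
        raise ValueError(f"{weight_name!r} does not end in {suffix!r}")
    return weight_name[: -len(suffix)] + mapped_name


def convert_hf_names(hf_param, mapped_name: str = DEFAULT_SUFFIX):
    """Accept either a single name or a dict of names, mirroring str | dict[str, str]."""
    if isinstance(hf_param, str):
        return to_amax_name(hf_param, mapped_name)
    return {key: to_amax_name(name, mapped_name) for key, name in hf_param.items()}


if __name__ == "__main__":
    print(convert_hf_names("layer.weight"))
    print(convert_hf_names({"q": "attn.q_proj.weight", "k": "attn.k_proj.weight"}))
```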