`bridge.models.conversion.quant_mapping`#

Module Contents#

Classes#

`AmaxMapping`	Amax mapping for quantization.
`AmaxFanoutMapping`	Replicated amax mapping that fans out one Megatron amax to multiple HF targets.
`_DerivedAmaxMapping`	Resolve amax names through the original wildcard weight mapping.
`MoeAmaxFanoutMapping`	Shared MoE amax mapping that fans out to per-expert HF quantizers.

Functions#

`_convert_hf_weight_names`
`_has_skipped_qkv_path_segment`
`_derive_qkv_megatron_parent`
`_derive_qkv_hf_parent`
`derive_kv_bmm_amax_map`	Derive K/V BMM quantizer amax mappings from eligible fused-QKV mappings.
`convert_to_amax_map`	Convert weight mappings to amax mappings for quantization.

Data#

`_QKV_PROJECTION_NAMES`
`_SKIPPED_QKV_PATH_SEGMENTS`

API#

class bridge.models.conversion.quant_mapping.AmaxMapping(megatron_param: str, hf_param: str | dict[str, str])#

Bases: megatron.bridge.models.conversion.param_mapping.ReplicatedMapping

Amax mapping for quantization.

Initialization

Initialize the Amax mapping.

class bridge.models.conversion.quant_mapping.AmaxFanoutMapping(megatron_param: str, hf_params: list[str])#

Bases: bridge.models.conversion.quant_mapping.AmaxMapping

Replicated amax mapping that fans out one Megatron amax to multiple HF targets.

Used for QKV and gate/up where the amax values are shared but need to be written/read under multiple HF parameter names.

Initialization

Initialize the Amax mapping.

megatron_to_hf(megatron_weights, megatron_module)#

resolve(captures: tuple[str, ...])#: Resolve wildcards for both megatron_param and all HF targets.
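The fan-out behaviour described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `resolve_wildcards` and `fan_out_amax` are hypothetical helpers, the HF parameter names are made up for the example, a plain float stands in for the amax tensor, and only a single-`*` wildcard syntax is assumed.

```python
def resolve_wildcards(pattern: str, captures: tuple[str, ...]) -> str:
    """Replace each '*' in the pattern with the next capture, left to right."""
    resolved = pattern
    for capture in captures:
        resolved = resolved.replace("*", capture, 1)
    return resolved


def fan_out_amax(amax, hf_params: list[str], captures: tuple[str, ...]) -> dict:
    """Return the same amax value under every resolved HF target name."""
    return {resolve_wildcards(name, captures): amax for name in hf_params}


# Hypothetical HF targets for a fused-QKV quantizer (illustrative names only).
hf_targets = [
    "model.layers.*.self_attn.q_proj.weight_quantizer._amax",
    "model.layers.*.self_attn.k_proj.weight_quantizer._amax",
    "model.layers.*.self_attn.v_proj.weight_quantizer._amax",
]

# One shared amax value is written under all three names for layer 3.
result = fan_out_amax(0.5, hf_targets, ("3",))
```

Each key in `result` is a concrete HF name, and every key maps to the same shared value, which is the property the fan-out mapping relies on.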

class bridge.models.conversion.quant_mapping._DerivedAmaxMapping( source_mapping: megatron.bridge.models.conversion.param_mapping.MegatronParamMapping, mapped_name: str, )#

Bases: bridge.models.conversion.quant_mapping.AmaxMapping

Resolve amax names through the original wildcard weight mapping.

Some weight mappings transform wildcard captures instead of copying them positionally. Keep that transformation when deriving quantizer-buffer names by resolving the weight mapping first, then deriving a concrete amax mapping.

Initialization

Initialize the Amax mapping.

_validate_patterns() → None#: The source mapping owns wildcard validation and resolution.

resolve( captures: tuple[str, ...], ) → megatron.bridge.models.conversion.param_mapping.MegatronParamMapping#
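To illustrate why the weight mapping is resolved first, consider a toy weight mapping whose `resolve` shifts the layer index instead of copying it. Everything here is an assumption made for illustration: `ToyWeightMapping`, its `layer_offset` field, `derive_amax_names`, and the parameter names are hypothetical and do not come from the library.

```python
from dataclasses import dataclass


@dataclass
class ToyWeightMapping:
    """Hypothetical weight mapping that transforms its wildcard capture."""

    megatron_param: str
    hf_param: str
    layer_offset: int = 0

    def resolve(self, captures: tuple[str, ...]) -> "ToyWeightMapping":
        (layer,) = captures
        # The HF side uses a shifted layer index, not a positional copy.
        hf_layer = str(int(layer) + self.layer_offset)
        return ToyWeightMapping(
            self.megatron_param.replace("*", layer, 1),
            self.hf_param.replace("*", hf_layer, 1),
        )


def derive_amax_names(source, captures, mapped_name=".weight_quantizer._amax"):
    """Resolve the weight mapping first, then swap the '.weight' suffix."""
    resolved = source.resolve(captures)

    def swap(name: str) -> str:
        if not name.endswith(".weight"):
            raise ValueError(f"expected a '.weight' name, got {name!r}")
        return name[: -len(".weight")] + mapped_name

    return swap(resolved.megatron_param), swap(resolved.hf_param)


source = ToyWeightMapping(
    "decoder.layers.*.mlp.linear_fc2.weight",
    "model.layers.*.mlp.down_proj.weight",
    layer_offset=1,
)
megatron_name, hf_name = derive_amax_names(source, ("4",))
```

Because the source mapping performs the resolution, the derived HF amax name inherits the shifted index. Substituting captures positionally into an amax pattern would have lost that transformation.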

class bridge.models.conversion.quant_mapping.MoeAmaxFanoutMapping( megatron_param: str, hf_patterns: list[str], num_experts: int | None = None, )#

Bases: bridge.models.conversion.quant_mapping.AmaxMapping

Shared MoE amax mapping that fans out to per-expert HF quantizers.

Megatron grouped-MoE layers use one quantizer for each rank’s local expert block, while HF names carry an expert wildcard. This mapping gathers those per-EP-rank amax values and expands the HF expert wildcard during export.

Initialization

Initialize the Amax mapping.

_EXPERT_WILDCARD_RE#: ‘compile(…)’

_validate_patterns() → None#: Allow one extra HF wildcard for the expert index.

property is_expert: bool#: Use normal TP handling; EP fanout is handled explicitly here.

hf_to_megatron(hf_weights, megatron_module)#: Grouped-MoE amax fanout is export-only.

_get_num_experts(megatron_module: object | None) → int | None#

classmethod _resolve_pattern( pattern: str, captures: tuple[str, ...], max_captures: int, ) → str#

_get_num_experts_for_rank( megatron_module: object | None, ) → int | None#

_gather_amax_by_ep_rank(weight: torch.Tensor) → list[torch.Tensor]#

megatron_to_hf( megatron_weights: torch.Tensor | None, megatron_module: object | None, ) → dict[str, torch.Tensor]#

resolve( captures: tuple[str, ...], ) → bridge.models.conversion.quant_mapping.MoeAmaxFanoutMapping#: Resolve layer wildcards while preserving the HF expert wildcard.
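The export-side expansion can be sketched as follows. This is a simplified illustration under stated assumptions: the per-EP-rank amax values are taken as already gathered into a list (no distributed calls are shown), each rank is assumed to own a contiguous, equally sized block of experts, plain floats stand in for tensors, and `expand_expert_amax` and the HF names are hypothetical.

```python
def expand_expert_amax(
    amax_by_ep_rank: list,
    hf_patterns: list[str],
    num_experts: int,
) -> dict:
    """Map each expert's HF quantizer name to the amax of the rank that owns it."""
    ep_size = len(amax_by_ep_rank)
    if ep_size == 0 or num_experts % ep_size != 0:
        raise ValueError("num_experts must be a positive multiple of the EP size")
    experts_per_rank = num_experts // ep_size

    expanded = {}
    for expert in range(num_experts):
        # Assumption: experts are laid out in contiguous blocks per EP rank.
        amax = amax_by_ep_rank[expert // experts_per_rank]
        for pattern in hf_patterns:
            # The layer wildcard is already resolved; only the expert one remains.
            expanded[pattern.replace("*", str(expert), 1)] = amax
    return expanded


# Hypothetical HF patterns with the layer index resolved to 0.
patterns = [
    "model.layers.0.mlp.experts.*.gate_proj.weight_quantizer._amax",
    "model.layers.0.mlp.experts.*.up_proj.weight_quantizer._amax",
]

# Two EP ranks, four experts: experts 0-1 share rank 0's amax, 2-3 share rank 1's.
expanded = expand_expert_amax([1.0, 2.0], patterns, num_experts=4)
```

With two patterns and four experts the result has eight entries, and every expert in a rank's block receives that rank's shared amax value.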

bridge.models.conversion.quant_mapping._convert_hf_weight_names( hf_param: str | dict[str, str], mapped_name: str, ) → list[str]#

bridge.models.conversion.quant_mapping._QKV_PROJECTION_NAMES#: None

bridge.models.conversion.quant_mapping._SKIPPED_QKV_PATH_SEGMENTS#: ‘frozenset(…)’

bridge.models.conversion.quant_mapping._has_skipped_qkv_path_segment(path: str) → bool#

bridge.models.conversion.quant_mapping._derive_qkv_megatron_parent(megatron_param: str) → str | None#

bridge.models.conversion.quant_mapping._derive_qkv_hf_parent(hf_params: dict[str, str]) → str | None#

bridge.models.conversion.quant_mapping.derive_kv_bmm_amax_map( mappings: list[megatron.bridge.models.conversion.param_mapping.MegatronParamMapping], ) → list[megatron.bridge.models.conversion.param_mapping.MegatronParamMapping]#: Derive K/V BMM quantizer amax mappings from eligible fused-QKV mappings.

bridge.models.conversion.quant_mapping.convert_to_amax_map( mappings: list[megatron.bridge.models.conversion.param_mapping.MegatronParamMapping], mapped_name: str = '.weight_quantizer._amax', ) → list[megatron.bridge.models.conversion.param_mapping.MegatronParamMapping]#

Convert weight mappings to amax mappings for quantization.

This function converts parameter mappings for weights to their corresponding amax (absolute maximum) parameter mappings used in quantization. For example:

“layer.weight” -> “layer.weight_quantizer._amax”

Parameters: mappings – List of MegatronParamMapping objects for weight parameters
Returns: List of new MegatronParamMapping objects for amax parameters

Note: Mappings ending in ‘.weight’ become regular amax mappings. MoE expert mappings ending in ‘.weight*’ become fanout mappings when their HF names contain one additional expert wildcard. Other layouts cannot be represented by the shared-expert fanout mapping and are skipped.
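The name rewrite for the plain `.weight` case can be sketched on bare strings. This is only an illustration of the suffix substitution, assuming it is applied to names that end in `.weight`. `to_amax_name` is a hypothetical helper, the real function operates on MegatronParamMapping objects, and the `.weight*` MoE case is not modelled here.

```python
def to_amax_name(weight_name: str, mapped_name: str = ".weight_quantizer._amax"):
    """Swap a trailing '.weight' for the amax suffix; return None if it is absent."""
    suffix = ".weight"
    if not weight_name.endswith(suffix):
        return None  # names with other layouts are skipped in this sketch
    return weight_name[: -len(suffix)] + mapped_name


names = ["layer.weight", "layer.bias"]
# Keep only the names that could be converted.
amax_names = [amax for amax in map(to_amax_name, names) if amax is not None]
```

Here `layer.weight` becomes `layer.weight_quantizer._amax`, matching the example in the description, while `layer.bias` is dropped.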
