bridge.models.conversion.peft_bridge#
Module Contents#
Classes#
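_AbsentProjectionSentinel | Singleton sentinel returned by _split_qkv_linear_out_weight to declare that a projection key has no counterpart in the HF model and should be skipped during adapter export. |
AdapterWeightConversionTask | Task describing an adapter’s LoRA weights for conversion or merging. |
AdapterWeight | Materialized adapter weights ready for merge. |
MegatronPeftBridge | Mixin providing adapter-aware utilities for Megatron model bridges. |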
Functions#
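_select_hf_base_param_name | Return the HF base parameter name associated with this adapter. |
infer_target_modules_from_adapter_weights | Derive HF target_modules from the HF-format adapter weight names. |
_split_hf_lora_weight_name | Split an HF-format LoRA weight name into its base target path and LoRA suffix. |
_order_target_parameters | Return PEFT target parameters in stable per-parent order. |
_build_target_parameter_prefixes | Map PEFT target parameters to their on-disk ParamWrapper prefixes. |
_pack_target_parameter_adapter_weights | Convert exported target-parameter LoRA tensors into PEFT’s ParamWrapper layout. |
convert_adapter_weights_to_peft_state | Rewrite exported adapter weights into the PEFT on-disk state-dict layout. |
infer_rank_pattern_from_adapter_weights | Infer PEFT rank_pattern entries from exported adapter tensors. |
build_adapter_config_dict | Build an HF PEFT-compatible adapter_config.json dictionary. |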
Data#
API#
- class bridge.models.conversion.peft_bridge._AbsentProjectionSentinel#
Singleton sentinel returned by _split_qkv_linear_out_weight to declare that a projection key has no counterpart in the HF model and should be skipped during adapter export. Example: v_proj on Gemma4 global-attention layers that use K=V tying (no v_proj weight exists in HF).
Bridges that need this behaviour should return this sentinel for the absent key so that the generic export code can distinguish an intentional skip from a bug.
- __slots__#
()
- __repr__() str#
- bridge.models.conversion.peft_bridge.ABSENT_PROJECTION#
‘_AbsentProjectionSentinel(…)’
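The sentinel pattern described above can be sketched in a few lines. This is a minimal illustration, not the module's code: the names `_AbsentSketch`, `ABSENT_SKETCH`, and `export_projections` are hypothetical, and the choice to treat `None` as a bug is an assumption made to show why a dedicated sentinel is useful.

```python
from typing import Any, Dict


class _AbsentSketch:
    """Hypothetical stand-in for a singleton 'absent projection' marker."""

    __slots__ = ()  # the marker carries no per-instance state

    def __repr__(self) -> str:
        return "ABSENT_SKETCH"


# Module-level singleton; callers compare against it with `is`.
ABSENT_SKETCH = _AbsentSketch()


def export_projections(split: Dict[str, Any]) -> Dict[str, Any]:
    """Keep real tensors, skip intentional gaps, and fail on accidental ones."""
    exported: Dict[str, Any] = {}
    for name, tensor in split.items():
        if tensor is ABSENT_SKETCH:
            continue  # intentional skip: no HF counterpart for this key
        if tensor is None:
            # Assumption for this sketch: a bare None signals a bug upstream.
            raise ValueError(f"missing tensor for {name!r}")
        exported[name] = tensor
    return exported
```

Because the sentinel is a distinct object rather than `None`, an export loop can tell "this key is deliberately missing" apart from "something forgot to produce this key".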
- bridge.models.conversion.peft_bridge.MegatronModel#
‘TypeVar(…)’
- bridge.models.conversion.peft_bridge.ADAPTER_NAME_MAP#
None
- bridge.models.conversion.peft_bridge.ADAPTER_KEY_TO_SUFFIX#
None
- bridge.models.conversion.peft_bridge.MEGATRON_TO_HF_LORA_SUFFIX#
None
- bridge.models.conversion.peft_bridge.GDN_IN_PROJ_KEYS#
(‘in_proj_qkv’, ‘in_proj_z’, ‘in_proj_b’, ‘in_proj_a’)
- class bridge.models.conversion.peft_bridge.AdapterWeightConversionTask#
Task describing an adapter’s LoRA weights for conversion or merging.
- global_base_prefix: str#
None
- adapter_key: Optional[str]#
None
- alpha: int#
None
- dim: int#
None
- linear_in_task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask#
None
- linear_out_task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask#
None
- requires_expert_splits: bool#
False
- class bridge.models.conversion.peft_bridge.AdapterWeight#
Materialized adapter weights ready for merge.
- global_base_prefix: str#
None
- adapter_key: Optional[str]#
None
- alpha: int#
None
- dim: int#
None
- linear_in_weight: megatron.bridge.models.conversion.model_bridge.MegatronWeightTuple#
None
- linear_out_weight: megatron.bridge.models.conversion.model_bridge.MegatronWeightTuple#
None
- bridge.models.conversion.peft_bridge._select_hf_base_param_name(
- base_mapping,
- adapter_key: Optional[str],
- expected_suffix: str,
Return the HF base parameter name associated with this adapter.
- class bridge.models.conversion.peft_bridge.MegatronPeftBridge#
Mixin providing adapter-aware utilities for Megatron model bridges.
- _get_lora_unwrapped_name(megatron_param: str) str#
Remove .to_wrap from LoRA parameter names.
- _is_adapter_param_name(param_name: str) bool#
Return True if the parameter only belongs to a PEFT adapter.
- _get_adapter_wrap_module(
- local_base_prefix: str,
- megatron_model: Union[bridge.models.conversion.peft_bridge.MegatronModel, List[bridge.models.conversion.peft_bridge.MegatronModel]],
- vp_stage: int,
Locate the adapter wrapper and its underlying module.
- _resolve_hf_adapter_param_name(
- mapping_registry: megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry,
- global_base_prefix: str,
- megatron_adapter_suffix: str,
- base_suffix: str,
- adapter_key: Optional[str],
Resolve the HuggingFace adapter parameter name by translating the base Megatron name.
Note: LoRA adapters never register bias tensors for linear_in/linear_out, so callers only pass weight suffixes here. The bias fallback below is solely for robustness in case a future adapter type introduces biased projections.
- _get_base_hf_param_names_for_adapter(
- mapping_registry: megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry,
- global_base_prefix: str,
- adapter_key: Optional[str],
- base_suffix: str,
Return all HF base parameter names associated with this adapter.
- _make_lora_param_name(
- base_name: str,
- megatron_adapter_suffix: str,
Translate a base HF weight name into its LoRA-specific counterpart.
- _is_fused_qkv(hf_weight_names: Iterable[str]) bool#
Check whether the provided HF names correspond to a fused QKV weight.
- _is_gdn_in_proj_split(hf_weight_names: Iterable[str]) bool#
Check whether the provided HF names correspond to split GDN in_proj weights.
- _is_fused_fc1_gate_up(
- base_hf_weight_names: Iterable[str],
- linear_out_tensor: torch.Tensor,
- base_weight_shape: Optional[torch.Size] = None,
Detect fused FC1 adapters based on names and tensor shape.
- _infer_qkv_projection_from_name(hf_name: str) Optional[str]#
Return q_proj/k_proj/v_proj identifier based on the HF name.
- _infer_gdn_in_proj_projection_from_name(
- hf_name: str,
Return in_proj_qkv/z/b/a identifier based on the HF name.
- _is_fused_fc1_gate_proj(hf_name: str) bool#
Return whether the HF name maps to the gate half of fused FC1.
- _is_fused_fc1_up_proj(hf_name: str) bool#
Return whether the HF name maps to the up half of fused FC1.
- _infer_hf_expert_idx(hf_name: str) Optional[int]#
Return the expert index embedded in an HF MoE weight name.
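As a rough illustration of extracting an expert index from a name, the sketch below assumes the common HF MoE naming convention in which the index appears as `.experts.<idx>.`. Both the pattern and the helper name `infer_expert_idx` are assumptions, not the method's actual implementation.

```python
import re
from typing import Optional

# Assumed convention: "...mlp.experts.<idx>.<proj>.weight"
_EXPERT_RE = re.compile(r"\.experts\.(\d+)\.")


def infer_expert_idx(hf_name: str) -> Optional[int]:
    """Return the expert index found in the name, or None for non-expert weights."""
    match = _EXPERT_RE.search(hf_name)
    return int(match.group(1)) if match else None
```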
- _split_qkv_linear_out_weight(
- megatron_model: Union[bridge.models.conversion.peft_bridge.MegatronModel, List[bridge.models.conversion.peft_bridge.MegatronModel]],
- linear_out_weight: torch.Tensor,
Split a fused LoRA linear_out tensor for QKV adapters.
- _split_gdn_in_proj_linear_out_weight(
- megatron_model: Union[bridge.models.conversion.peft_bridge.MegatronModel, List[bridge.models.conversion.peft_bridge.MegatronModel]],
- linear_out_weight: torch.Tensor,
Split a fused LoRA linear_out tensor for GDN in_proj adapters.
- _build_lora_hf_names(
- base_hf_weight_names: List[str],
Build LoRA A/B names for a list of HF base parameter names.
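A minimal sketch of the name translation, assuming base names end in `.weight` and that LoRA names use the `.lora_A.weight` / `.lora_B.weight` suffixes listed in `_HF_LORA_SUFFIXES`. The helper name and the two-list return shape are hypothetical.

```python
from typing import List, Tuple


def build_lora_names(base_names: List[str]) -> Tuple[List[str], List[str]]:
    """Map each HF base weight name to its LoRA A and LoRA B names."""
    a_names: List[str] = []
    b_names: List[str] = []
    for name in base_names:
        # Drop a trailing ".weight" so the LoRA suffix attaches to the module path.
        stem = name[: -len(".weight")] if name.endswith(".weight") else name
        a_names.append(stem + ".lora_A.weight")
        b_names.append(stem + ".lora_B.weight")
    return a_names, b_names
```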
- _collect_packed_expert_adapter_tensors(
- linear_in_tensor: torch.Tensor,
- linear_out_tensor: torch.Tensor,
- expert_linear_in_gathered: Optional[List[torch.Tensor]],
- expert_linear_out_gathered: Optional[List[torch.Tensor]],
- num_moe_experts: int,
Collect one LoRA A/B tensor per expert for grouped expert exports.
- _build_packed_expert_linear_out_by_base(
- megatron_model: List[bridge.models.conversion.peft_bridge.MegatronModel],
- base_hf_weight_names: List[str],
- per_expert_linear_out: List[torch.Tensor],
- is_expert: bool,
Build per-base stacked LoRA-B tensors for packed grouped-expert export.
- _split_fused_fc1_linear_out_weight(
- linear_out_weight: torch.Tensor,
- *,
- is_expert: bool,
Split fused FC1 LoRA linear_out into gate/up with TP-aware ordering.
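To show what "TP-aware ordering" can mean, the sketch below assumes that the gathered linear_out rows are laid out per tensor-parallel rank as [gate shard, up shard], so a naive split down the middle would mix the two halves. The layout, the helper name `split_fused_fc1`, and the explicit `tp_size` argument are assumptions for illustration; rows are plain Python list items rather than tensors.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_fused_fc1(rows: List[T], tp_size: int) -> Tuple[List[T], List[T]]:
    """Separate gate and up rows from a per-rank [gate; up] interleaved layout."""
    if tp_size <= 0 or len(rows) % (2 * tp_size) != 0:
        raise ValueError("row count must be divisible by 2 * tp_size")
    shard = len(rows) // tp_size  # rows contributed by one TP rank
    half = shard // 2             # gate rows (and up rows) per rank
    gate: List[T] = []
    up: List[T] = []
    for rank in range(tp_size):
        start = rank * shard
        gate.extend(rows[start : start + half])
        up.extend(rows[start + half : start + shard])
    return gate, up
```

With `tp_size=1` this reduces to the plain split into two halves along dim 0.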
- _gather_expert_adapter_weight(
- weight: torch.Tensor,
Gather expert-sharded adapter weights across EP ranks when needed.
- _select_expert_adapter_weight(
- weight: torch.Tensor,
- gathered: List[torch.Tensor],
- expert_idx: int,
- num_experts: int,
Select the per-expert adapter weight slice if present.
- _megatron_global_adapters_info_all_pp_ranks(
- megatron_model: Union[bridge.models.conversion.peft_bridge.MegatronModel, List[bridge.models.conversion.peft_bridge.MegatronModel]],
Collect, across all pipeline-parallel ranks, one information tuple per adapter: (global_base_name, local_base_prefix, input_is_parallel, base_linear_is_parallel, requires_expert_splits, alpha, dim, pp_rank, vp_stage).
- _construct_adapters_names(
- prefix: str,
- adapter_key: Optional[str],
Build linear_in/linear_out parameter names for an adapter.
- Parameters:
prefix – Base module prefix without any adapter suffix (global or local, depending on caller).
adapter_key – Optional adapter identifier used by CanonicalLoRA (e.g. adapter_q). None for standard single-adapter LoRA modules.
- Returns:
Tuple (linear_in_name, linear_out_name) containing the parameter names for the adapter’s input and output projection weights.
- build_adapter_conversion_tasks(
- megatron_model: Union[bridge.models.conversion.peft_bridge.MegatronModel, List[bridge.models.conversion.peft_bridge.MegatronModel]],
Construct adapter merge tasks keyed by their base parameter.
The returned dict is keyed by the global LoRA-wrapped parameter name (e.g.,
decoder.layers.0.mlp.linear_fc1.to_wrap.weight). Each value contains the adapter tasks (canonical or regular) that should be merged into that base weight.
- materialize_adapter_weights(
- adapter_tasks: List[bridge.models.conversion.peft_bridge.AdapterWeightConversionTask],
Run adapter merge tasks to gather full adapter weights.
- _materialize_grouped_expert_adapter_tensor(
- task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask,
- *,
- tp_axis: int,
Broadcast and gather grouped-expert adapter weights on their real expert-TP axis.
- stream_adapter_weights_megatron_to_hf(
- megatron_model: Union[bridge.models.conversion.peft_bridge.MegatronModel, List[bridge.models.conversion.peft_bridge.MegatronModel]],
- cpu: bool = True,
- show_progress: bool = True,
Stream only adapter weights without merging them into base tensors.
- _get_fused_adapter_linear_out_slices(
- megatron_model: List[bridge.models.conversion.peft_bridge.MegatronModel],
- base_hf_weight_names: List[str],
- linear_out_tensor: torch.Tensor,
- is_expert: bool = False,
Return per-base-name linear_out slices for fused adapters, else None.
This supports fused QKV adapters (split into q/k/v) and fused FC1 adapters (split into gate/up along dim=0). The returned dict is keyed by the HF base weight name (e.g. ...q_proj.weight or ...gate_proj.weight).
- _merge_lora_adapter_weights(
- megatron_model: List[bridge.models.conversion.peft_bridge.MegatronModel],
- converted_weights_dict: Dict[str, torch.Tensor],
- adapter_weights: List[bridge.models.conversion.peft_bridge.AdapterWeight],
Merge LoRA adapter weights back into the base tensor for HF export.
- _merge_grouped_export_adapter_weights(
- task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask,
- converted_weights_dict: Dict[str, torch.Tensor],
- adapter_weights: List[bridge.models.conversion.peft_bridge.AdapterWeight],
- num_moe_experts: int,
Merge LoRA weights into a single grouped-expert export shard.
Grouped expert mappings bypass the standard export path and therefore never reach
_merge_lora_adapter_weights. Merge the current expert’s adapter slice into its per-expert tensor before the grouped export code stacks all experts back together.
- _merge_single_adapter_weight(
- base_weight: torch.Tensor,
- alpha: int,
- dim: int,
- linear_in_weight: torch.Tensor,
- linear_out_weight: torch.Tensor,
Merge a single adapter’s weights with base weight.
The merge is performed in float32 to avoid precision loss from bfloat16 matmul (adapter weights are often stored in bf16). The result is cast back to the original base weight dtype.
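The arithmetic behind such a merge can be sketched with plain Python lists. This assumes the standard LoRA update W + (alpha / dim) * B @ A, with linear_in playing the role of A and linear_out the role of B; the scaling factor is an assumption, since the reference does not spell it out. Dtype handling (upcasting to float32 and casting back) is left out because Python floats have no bf16 representation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(left: Matrix, right: Matrix) -> Matrix:
    """Naive (rows x inner) @ (inner x cols) product."""
    inner, cols = len(right), len(right[0])
    return [
        [sum(row[k] * right[k][j] for k in range(inner)) for j in range(cols)]
        for row in left
    ]


def merge_lora(base: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: int, dim: int) -> Matrix:
    """Return base + (alpha / dim) * lora_b @ lora_a (assumed standard LoRA scaling)."""
    scale = alpha / dim
    delta = matmul(lora_b, lora_a)  # (out, dim) @ (dim, in) -> (out, in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]
```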
- _merge_canonical_adapter_from_weights(
- megatron_model: List[bridge.models.conversion.peft_bridge.MegatronModel],
- converted_weights_dict: Dict[str, torch.Tensor],
- adapter_weights: List[bridge.models.conversion.peft_bridge.AdapterWeight],
Merge CanonicalLoRA adapters using pre-materialized adapter weights.
- bridge.models.conversion.peft_bridge._HF_LORA_SUFFIXES#
(‘.lora_A.weight’, ‘.lora_B.weight’)
- bridge.models.conversion.peft_bridge.infer_target_modules_from_adapter_weights(
- adapter_weight_names: Iterable[str],
Derive HF target_modules from the HF-format adapter weight names.
Given names like model.layers.0.self_attn.q_proj.lora_A.weight, this extracts the unique module identifiers (q_proj, gate_proj, …) that the peft library expects in adapter_config.json.
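The extraction described above can be approximated as follows. This is a sketch under the assumption that every adapter weight name ends in one of the two LoRA suffixes and that the module identifier is the last path component before that suffix; the helper name and the sorted return order are choices made for this example.

```python
from typing import Iterable, List, Set

_LORA_SUFFIXES = (".lora_A.weight", ".lora_B.weight")


def infer_target_modules(adapter_weight_names: Iterable[str]) -> List[str]:
    """Collect the unique module identifiers that precede the LoRA suffix."""
    modules: Set[str] = set()
    for name in adapter_weight_names:
        for suffix in _LORA_SUFFIXES:
            if name.endswith(suffix):
                base_path = name[: -len(suffix)]
                modules.add(base_path.rsplit(".", 1)[-1])
                break
    return sorted(modules)
```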
- bridge.models.conversion.peft_bridge._split_hf_lora_weight_name(name: str) tuple[str, str]#
Split an HF-format LoRA weight name into its base target path and LoRA suffix.
- bridge.models.conversion.peft_bridge._order_target_parameters(
- target_parameters: List[str],
Return PEFT target parameters in stable per-parent order.
- bridge.models.conversion.peft_bridge._build_target_parameter_prefixes(
- target_parameters: List[str],
Map PEFT target parameters to their on-disk ParamWrapper prefixes.
- bridge.models.conversion.peft_bridge._pack_target_parameter_adapter_weights(
- lora_a: torch.Tensor,
- lora_b: torch.Tensor,
Convert exported target-parameter LoRA tensors into PEFT’s ParamWrapper layout.
- bridge.models.conversion.peft_bridge.convert_adapter_weights_to_peft_state(
- adapter_weights: Iterable[megatron.bridge.models.conversion.model_bridge.HFWeightTuple],
Rewrite exported adapter weights into the PEFT on-disk state-dict layout.
This follows the original adapter export flow as closely as possible: trust the exported HF names, write normal 2D LoRA tensors directly under base_model.model.*, and only special-case 3D tensors because PEFT stores packed parameter targets through ParamWrapper.
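The 2D path can be illustrated with a small key-rewriting sketch. It works on (name, shape) pairs instead of real tensors, the helper name is hypothetical, and the 3D ParamWrapper case is deliberately not implemented because its layout is not described here.

```python
from typing import Dict, Iterable, Tuple

Shape = Tuple[int, ...]
PEFT_PREFIX = "base_model.model."


def to_peft_state_keys(weights: Iterable[Tuple[str, Shape]]) -> Dict[str, Shape]:
    """Prefix exported HF LoRA names the way PEFT checkpoints store them."""
    state: Dict[str, Shape] = {}
    for name, shape in weights:
        if len(shape) != 2:
            # Packed (3D) targets need the ParamWrapper layout, not sketched here.
            raise NotImplementedError(f"unsupported tensor rank for {name!r}")
        state[PEFT_PREFIX + name] = shape
    return state
```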
- bridge.models.conversion.peft_bridge.infer_rank_pattern_from_adapter_weights(
- adapter_weights: Iterable[megatron.bridge.models.conversion.model_bridge.HFWeightTuple],
- *,
- default_rank: int,
Infer PEFT rank_pattern entries from exported adapter tensors.
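One plausible way to derive such entries is to read the rank off each LoRA A tensor and record only the modules whose rank differs from the default. The sketch below assumes 2D lora_A tensors shaped (rank, in_features) and keys the result by the module path; both the key format and the helper name are assumptions, and shapes stand in for real tensors.

```python
from typing import Dict, Iterable, Tuple

Shape = Tuple[int, ...]
_A_SUFFIX = ".lora_A.weight"


def infer_rank_pattern(
    weights: Iterable[Tuple[str, Shape]], *, default_rank: int
) -> Dict[str, int]:
    """Return {module_path: rank} for modules whose rank differs from default_rank."""
    pattern: Dict[str, int] = {}
    for name, shape in weights:
        if not name.endswith(_A_SUFFIX):
            continue  # only lora_A carries the rank on its first axis
        rank = shape[0]
        if rank != default_rank:
            pattern[name[: -len(_A_SUFFIX)]] = rank
    return pattern
```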
- bridge.models.conversion.peft_bridge.build_adapter_config_dict(
- peft_config: megatron.bridge.peft.base.PEFT,
- target_modules: List[str],
- target_parameters: Optional[List[str]] = None,
- base_model_name_or_path: Optional[str] = None,
- rank_pattern: Optional[Dict[str, int]] = None,
Build an HF PEFT-compatible adapter_config.json dictionary.
The returned dict can be serialised directly with json.dump, and the resulting file is loadable by peft.PeftModel.from_pretrained. Building it requires no runtime dependency on the peft pip package.
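For orientation, a dictionary of this kind might look like the sketch below. The keys shown (peft_type, r, lora_alpha, target_modules, rank_pattern, base_model_name_or_path) are fields of PEFT's LoRA configuration, but the helper name, its parameters, and the exact set of keys the real function emits are assumptions.

```python
import json
from typing import Any, Dict, List, Optional


def build_config_sketch(
    rank: int,
    alpha: int,
    target_modules: List[str],
    base_model_name_or_path: Optional[str] = None,
    rank_pattern: Optional[Dict[str, int]] = None,
) -> Dict[str, Any]:
    """Assemble a JSON-serialisable LoRA adapter config using only the stdlib."""
    return {
        "peft_type": "LORA",
        "r": rank,
        "lora_alpha": alpha,
        "target_modules": sorted(target_modules),
        "base_model_name_or_path": base_model_name_or_path,
        "rank_pattern": dict(rank_pattern or {}),
    }


config = build_config_sketch(8, 16, ["q_proj", "gate_proj"])
config_json = json.dumps(config, indent=2)  # text suitable for adapter_config.json
```

Keeping the config a plain dict of built-in types is what makes it serialisable without importing peft.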