bridge.peft.lora#
Module Contents#
Classes#
LoRA | Implements the LoRA (Low-Rank Adaptation) module for parameter-efficient fine-tuning. |
VLMLoRA | Implements LoRA for Vision-Language Models. VLMLoRA additionally lets the user specify whether the language or vision models should be frozen. For example, a common finetuning workload for multimodal models is to apply adapters to the language model and fully finetune the vision model. |
LoRAMerge | Tensor helper for merging LoRA adapter weights into base weights. |
Data#
API#
- bridge.peft.lora.logger#
‘getLogger(…)’
- class bridge.peft.lora.LoRA#
Bases:
megatron.bridge.peft.base.PEFT, megatron.bridge.peft.module_matcher.ModuleMatcher
Implements the LoRA (Low-Rank Adaptation) module for parameter-efficient fine-tuning.
LoRA uses a low-rank projection to adapt the weights of a pre-trained model to a new downstream task. This class facilitates the application of LoRA to specific modules within the model architecture.
- Parameters:
target_modules (List[str], optional) – A list of module names to apply LoRA to. Defaults to all linear layers: ['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'].
  - 'linear_qkv': Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.
  - 'linear_proj': Apply LoRA to the linear layer used for projecting the output of self-attention.
  - 'linear_fc1': Apply LoRA to the first fully-connected layer in the MLP.
  - 'linear_fc2': Apply LoRA to the second fully-connected layer in the MLP.
  Target modules can also contain wildcards. For example, target_modules=['*.layers.0.*.linear_qkv', '*.layers.1.*.linear_qkv'] adds LoRA to linear_qkv on the first two layers only.
exclude_modules (List[str], optional) – A list of module names to exclude from LoRA. When set, LoRA is applied to every nn.Linear and nn.Linear-adjacent module whose name does not match any string in exclude_modules. Requires target_modules to be an empty list or None.
dim (int) – Dimension of the low-rank projection space. Defaults to 32.
alpha (int) – Weighting factor for the low-rank projection. Defaults to 32.
dropout (float) – Dropout rate for the low-rank projection. Defaults to 0.0.
dropout_position (Literal['pre', 'post'], optional) – Position for applying dropout. Can be ‘pre’ (before the low-rank projection) or ‘post’ (after). Defaults to ‘pre’.
a2a_experimental (bool) – Enables the experimental All-to-All (A2A) communication strategy. Defaults to False.
lora_A_init_method (str) – Initialization method for the low-rank matrix A. Defaults to “xavier”.
lora_B_init_method (str) – Initialization method for the low-rank matrix B. Defaults to “zero”.
lora_dtype (torch.dtype) – Parameter data type for LoRA weights. Defaults to None, in which case the model’s dtype is used.
normalize_moe_lora (bool) – When True, expert linear layers use dim // moe_router_topk as the LoRA rank while non-expert layers keep the full dim. This normalizes the total adapter capacity for MoE models so it is comparable to a dense model. Defaults to False.
share_expert_adapters (bool) – When True, grouped MoE expert linears share one adapter across all local experts on the EP rank. Set to False to create one adapter per local expert instead. Defaults to True.
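The rank rule described for normalize_moe_lora can be illustrated with a small standalone sketch. It does not use the library: the function name effective_lora_rank and the is_expert_layer flag are hypothetical, and moe_router_topk is assumed to come from the model configuration.

```python
def effective_lora_rank(
    dim: int,
    is_expert_layer: bool,
    normalize_moe_lora: bool,
    moe_router_topk: int,
) -> int:
    """Return the LoRA rank a layer would use under the documented rule.

    Hypothetical helper for illustration only; not part of bridge.peft.lora.
    """
    if normalize_moe_lora and is_expert_layer:
        # Expert linear layers use dim // moe_router_topk (integer division).
        return dim // moe_router_topk
    # Non-expert layers, and all layers when the flag is off, keep the full dim.
    return dim


# Example: dim=32 with a top-4 router (both values chosen for illustration).
attention_rank = effective_lora_rank(32, False, True, 4)
expert_rank = effective_lora_rank(32, True, True, 4)
print(attention_rank, expert_rank)
```

With top-k routing, each token passes through moe_router_topk experts, so dividing the expert rank by that factor keeps the adapter capacity seen per token roughly in line with a dense model.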
- target_modules: List[str]#
‘field(…)’
- dim: int#
32
- alpha: int#
32
- dropout: float#
0.0
- dropout_position: Literal['pre', 'post']#
‘pre’
- lora_A_init_method: str#
‘xavier’
- lora_B_init_method: str#
‘zero’
- a2a_experimental: bool#
False
- lora_dtype: torch.dtype#
None
- normalize_moe_lora: bool#
False
- share_expert_adapters: bool#
True
- transform(module: torch.nn.Module, name: Optional[str] = None, prefix: Optional[str] = None) → torch.nn.Module#
Applies LoRA to a specific module within the model architecture.
- Parameters:
module (nn.Module) – The module to apply LoRA to.
name (str, optional) – Name of the module (if applicable). Defaults to None.
prefix (str, optional) – Prefix for the module name (if applicable). Defaults to None.
- Returns:
The modified module with LoRA applied, or the original module if not a target.
- Return type:
nn.Module
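Since transform returns the original module when its name is not a target, it can help to see how a name might be tested against target_modules, including wildcard patterns. The sketch below is an assumption about the matching behavior, not the library's ModuleMatcher: is_target is a hypothetical helper, wildcards are matched against the full dotted name with Python's fnmatchcase, and plain entries are compared with the last component of the name.

```python
from fnmatch import fnmatchcase


def is_target(full_name: str, target_modules: list[str]) -> bool:
    """Hypothetical matcher: decide whether a module name is a LoRA target."""
    short_name = full_name.rsplit(".", 1)[-1]
    for pattern in target_modules:
        if "*" in pattern:
            # Wildcard entry: match against the full dotted module name.
            if fnmatchcase(full_name, pattern):
                return True
        elif short_name == pattern:
            # Plain entry: match the final component, e.g. 'linear_qkv'.
            return True
    return False


patterns = ["*.layers.0.*.linear_qkv", "*.layers.1.*.linear_qkv"]
# Illustrative module names; real names depend on the model definition.
names = [
    "decoder.layers.0.self_attention.linear_qkv",
    "decoder.layers.1.self_attention.linear_qkv",
    "decoder.layers.2.self_attention.linear_qkv",
    "decoder.layers.0.mlp.linear_fc1",
]
selected = [name for name in names if is_target(name, patterns)]
print(selected)
```

Only the linear_qkv modules in layers 0 and 1 are selected here; every other module would be returned unchanged.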
- class bridge.peft.lora.VLMLoRA#
Bases:
bridge.peft.lora.LoRA
Implements LoRA for Vision-Language Models. VLMLoRA additionally lets the user specify whether the language or vision models should be frozen. For example, a common finetuning workload for multimodal models is to apply adapters to the language model and fully finetune the vision model.
- freeze_vision_model: bool#
True
- freeze_vision_projection: bool#
True
- freeze_language_model: bool#
True
- freeze_model(model: torch.nn.Module, training: bool = True) → None#
- class bridge.peft.lora.LoRAMerge#
Tensor helper for merging LoRA adapter weights into base weights.
- merge(base_weight: torch.Tensor, linear_out: torch.Tensor, linear_in: torch.Tensor, alpha: int, dim: int, *, tp_size: int, tp_group) → torch.Tensor#
Merges the LoRA adapter weights with the base model weights. Handles tensor parallelism by gathering sharded dimensions.
For ColumnParallelLinear (e.g., linear_qkv, linear_fc1):
  - base_weight: (out_features/TP, in_features)
  - linear_in: (dim/TP, in_features) ← needs to be gathered
  - linear_out: (out_features/TP, dim)
  - Target: (out_features/TP, dim) @ (dim, in_features) = (out_features/TP, in_features)
For RowParallelLinear (e.g., linear_proj, linear_fc2):
  - base_weight: (out_features, in_features/TP)
  - linear_in: (dim, in_features/TP)
  - linear_out: (out_features/TP, dim) ← needs to be gathered
  - Target: (out_features, dim) @ (dim, in_features/TP) = (out_features, in_features/TP)
- Parameters:
base_weight (torch.Tensor) – The base model weights.
linear_out (torch.Tensor) – LoRA’s B matrix.
linear_in (torch.Tensor) – LoRA’s A matrix.
alpha (int) – Weighting factor for the low-rank projection.
dim (int) – Dimension of the low-rank projection space.
tp_size (int) – Tensor-parallel world size for the adapter shard.
tp_group – Tensor-parallel process group for the adapter shard.
- Returns:
The merged weights.
- Return type:
torch.Tensor
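The ColumnParallelLinear shapes above can be checked with a small pure-Python simulation. This is a sketch under stated assumptions, not the library's implementation: it uses nested lists instead of torch tensors, replaces the tensor-parallel gather over tp_group with list concatenation, and assumes the standard LoRA scaling of alpha / dim. The helpers matmul and merge_dense are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_dense(base, linear_out, linear_in, alpha, dim):
    """Return base + (alpha / dim) * (linear_out @ linear_in)."""
    scale = alpha / dim  # assumed standard LoRA scaling
    delta = matmul(linear_out, linear_in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]


# Unsharded reference: out_features=4, in_features=3, dim=2, alpha=4.
alpha, dim, tp_size = 4, 2, 2
W = [[1, 1, 1] for _ in range(4)]      # base_weight: (out_features, in_features)
A = [[1, 0, 2], [0, 1, 1]]             # linear_in:   (dim, in_features)
B = [[1, 0], [0, 1], [1, 1], [2, 0]]   # linear_out:  (out_features, dim)

# Column-parallel sharding across two ranks.
W_shards = [W[0:2], W[2:4]]            # (out_features/TP, in_features)
B_shards = [B[0:2], B[2:4]]            # (out_features/TP, dim)
A_shards = [A[0:1], A[1:2]]            # (dim/TP, in_features)

# Stand-in for the gather of linear_in along the sharded dim axis.
A_gathered = [row for shard in A_shards for row in shard]

# Each rank merges its own slice of the output dimension.
merged_shards = [
    merge_dense(W_shards[rank], B_shards[rank], A_gathered, alpha, dim)
    for rank in range(tp_size)
]
merged_full = [row for shard in merged_shards for row in shard]

dense = merge_dense(W, B, A, alpha, dim)
print(merged_full == dense)
```

Stacking the per-rank results along the output dimension reproduces the unsharded merge, which is why only linear_in needs gathering in the column-parallel case.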