`bridge.peft.lora`#

Module Contents#

Classes#

`LoRA`	Implements the LoRA (Low-Rank Adaptation) module for parameter-efficient fine-tuning.
`VLMLoRA`	Implements LoRA for Vision-Language Models. VLMLoRA additionally lets the user specify whether the language or vision models should be frozen. For example, a common finetuning workload for multimodal models is to apply adapters to the language model and fully finetune the vision model.

Data#

logger

API#

bridge.peft.lora.logger#: ‘getLogger(…)’

class bridge.peft.lora.LoRA#

Bases: megatron.bridge.peft.base.PEFT, megatron.bridge.peft.module_matcher.ModuleMatcher

Implements the LoRA (Low-Rank Adaptation) module for parameter-efficient fine-tuning.

LoRA uses a low-rank projection to adapt the weights of a pre-trained model to a new downstream task. This class facilitates the application of LoRA to specific modules within the model architecture.
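The low-rank projection described above can be illustrated with a small, dependency-free sketch. It assumes the standard LoRA formulation, in which the adapter output is scaled by `alpha / dim` and added to the frozen layer's output; the helper names (`matvec`, `lora_forward`) and the toy matrices are hypothetical and are not part of this module.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(base_weight, lora_a, lora_b, x, dim, alpha):
    """Return W x + (alpha / dim) * B (A x), the usual LoRA forward pass."""
    base_out = matvec(base_weight, x)      # frozen pre-trained layer
    low_rank = matvec(lora_a, x)           # project down to `dim` values
    update = matvec(lora_b, low_rank)      # project back up to the output size
    scale = alpha / dim                    # assumed standard LoRA scaling
    return [b + scale * u for b, u in zip(base_out, update)]


# Toy sizes: 3 inputs, 2 outputs, rank (dim) 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]        # shape (dim, in_features)
B_zero = [[0.0], [0.0]]      # shape (out_features, dim), zero-initialized
B_trained = [[1.0], [-1.0]]  # stands in for B after some training
x = [1.0, 2.0, 3.0]

y_init = lora_forward(W, A, B_zero, x, dim=1, alpha=2)
y_trained = lora_forward(W, A, B_trained, x, dim=1, alpha=2)
```

With a zero-initialized B, `y_init` equals the base layer's output, so adding the adapter does not change the model until training updates B.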

Parameters:

target_modules (List[str], optional) – A list of module names to apply LoRA to. Defaults to all linear layers [‘linear_qkv’, ‘linear_proj’, ‘linear_fc1’, ‘linear_fc2’].
- ‘linear_qkv’: Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.
- ‘linear_proj’: Apply LoRA to the linear layer used for projecting the output of self-attention.
- ‘linear_fc1’: Apply LoRA to the first fully-connected layer in MLP.
- ‘linear_fc2’: Apply LoRA to the second fully-connected layer in MLP.
Target modules can also contain wildcards. For example, you can specify target_modules=[‘*.layers.0.*.linear_qkv’, ‘*.layers.1.*.linear_qkv’] to add LoRA to only linear_qkv on the first two layers.
exclude_modules (List[str], optional) – A list of module names to exclude from LoRA. When set, LoRA is applied to all nn.Linear and nn.Linear-adjacent modules whose names do not match any string in exclude_modules. Requires target_modules to be an empty list or None.
dim (int) – Dimension of the low-rank projection space. Defaults to 32.
alpha (int) – Weighting factor for the low-rank projection. Defaults to 32.
dropout (float) – Dropout rate for the low-rank projection. Defaults to 0.0.
dropout_position (Literal['pre', 'post'], optional) – Position for applying dropout. Can be ‘pre’ (before the low-rank projection) or ‘post’ (after). Defaults to ‘pre’.
sequence_parallel_input_regather (bool) – Reduce retained activation memory for eligible column-parallel LoRA-A projections. The full LayerNorm output consumed by LoRA-A is gathered temporarily in forward, released after the LoRA-A computation, and gathered again during backward for the LoRA-A weight gradient. MCore overlaps the backward all-gather with dgrad computation when possible. This has no effect when sequence parallelism is disabled and falls back for unsupported adapters or overlapping activation recompute. Defaults to False.
a2a_experimental (bool) – Enables the experimental All-to-All (A2A) communication strategy. Defaults to False.
lora_A_init_method (str) – Initialization method for the low-rank matrix A. Defaults to “xavier”.
lora_B_init_method (str) – Initialization method for the low-rank matrix B. Defaults to “zero”.
lora_dtype (torch.dtype) – Parameter data type for LoRA weights. Defaults to None, which uses the model’s dtype.
normalize_moe_lora (bool) – When True, expert linear layers use dim // moe_router_topk as the LoRA rank while non-expert layers keep the full dim. This normalizes the total adapter capacity for MoE models so it is comparable to a dense model. Defaults to False.
share_expert_adapters (bool) – When True, grouped MoE expert linears share one adapter across all local experts on the EP rank. Set to False to create one adapter per local expert instead. Defaults to True.
experts_shared_outer_loras (bool) – When True, grouped-expert LoRA (TE*ParallelGroupedLinear base modules) uses SharedOuterGroupedExpertAdapter: gate_up lora_A and down lora_B are shared across experts (expert_dim=1), matching SGLang’s experts_shared_outer_loras=True serving contract (PR #21466). The default, False, preserves the adapter layout selected by share_expert_adapters.
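As a rough illustration of how wildcard entries in target_modules select modules, the sketch below uses glob-style matching from the Python standard library. This is an assumption about the matching semantics, not the ModuleMatcher implementation; `is_lora_target` and the dotted module names are hypothetical.

```python
from fnmatch import fnmatchcase


def is_lora_target(full_name, target_modules):
    """Return True if a module's dotted name is selected by target_modules."""
    short_name = full_name.rsplit(".", 1)[-1]
    for pattern in target_modules:
        if "*" in pattern:
            # Wildcard entries are compared against the full dotted name.
            if fnmatchcase(full_name, pattern):
                return True
        elif pattern == short_name:
            # Plain entries are compared against the last name component.
            return True
    return False


# Hypothetical module names, as they might appear in a decoder stack.
module_names = [
    "decoder.layers.0.self_attention.linear_qkv",
    "decoder.layers.0.mlp.linear_fc1",
    "decoder.layers.1.self_attention.linear_qkv",
    "decoder.layers.10.self_attention.linear_qkv",
]
patterns = ["*.layers.0.*.linear_qkv", "*.layers.1.*.linear_qkv"]
selected = [n for n in module_names if is_lora_target(n, patterns)]
```

Under these assumptions only the linear_qkv modules of layers 0 and 1 are selected. Layer 10 is not, because the pattern requires a literal `.layers.1.` segment.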

target_modules: List[str]#: ‘field(…)’

dim: int#: 32

alpha: int#: 32

dropout: float#: 0.0

dropout_position: Literal['pre', 'post']#: ‘pre’

sequence_parallel_input_regather: bool#: False

lora_A_init_method: str#: ‘xavier’

lora_B_init_method: str#: ‘zero’

a2a_experimental: bool#: False

lora_dtype: torch.dtype#: None

normalize_moe_lora: bool#: False

share_expert_adapters: bool#: True

experts_shared_outer_loras: bool#: False
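The rank rule described for normalize_moe_lora amounts to one integer division. A minimal sketch, assuming the behavior is exactly as stated in the parameter description (`adapter_rank` is a hypothetical helper, not a function of this module):

```python
def adapter_rank(dim, moe_router_topk, is_expert_layer, normalize_moe_lora):
    """Return the LoRA rank a layer would receive under the documented rule."""
    if normalize_moe_lora and is_expert_layer:
        # Expert layers use dim // moe_router_topk (integer floor division).
        return dim // moe_router_topk
    # Non-expert layers, or normalization disabled: keep the full dim.
    return dim


expert_rank = adapter_rank(32, 8, is_expert_layer=True, normalize_moe_lora=True)
dense_rank = adapter_rank(32, 8, is_expert_layer=False, normalize_moe_lora=True)
default_rank = adapter_rank(32, 8, is_expert_layer=True, normalize_moe_lora=False)
```

With dim=32 and moe_router_topk=8, each expert adapter gets rank 4, so the 8 experts a token is routed to together contribute rank 32, the same as a dense layer. Because the division floors, a dim that is not a multiple of moe_router_topk yields slightly less total capacity.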

transform( module: torch.nn.Module, name: Optional[str] = None, prefix: Optional[str] = None, ) → torch.nn.Module#

Applies LoRA to a specific module within the model architecture.

Parameters:

module (nn.Module) – The module to apply LoRA to.
name (str, optional) – Name of the module (if applicable). Defaults to None.
prefix (str, optional) – Prefix for the module name (if applicable). Defaults to None.

Returns:

The modified module with LoRA applied, or the original module if not a target.

Return type:

nn.Module
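The contract of transform (return a wrapped module for targets, the original module otherwise) can be sketched with toy classes. Everything here is hypothetical: `ToyModule`, `ToyAdapterWrapper`, `toy_transform`, and `walk` are stand-ins, and treating prefix as the dotted path of the parent module is an assumption.

```python
class ToyModule:
    """Stand-in for nn.Module: only holds named children."""

    def __init__(self, **children):
        self.children = dict(children)


class ToyAdapterWrapper:
    """Stand-in for a LoRA-wrapped linear layer."""

    def __init__(self, wrapped):
        self.wrapped = wrapped


TARGETS = {"linear_qkv"}
wrapped_names = []


def toy_transform(module, name=None, prefix=None):
    """Wrap target modules; return every other module unchanged."""
    if name in TARGETS:
        full_name = f"{prefix}.{name}" if prefix else name
        wrapped_names.append(full_name)
        return ToyAdapterWrapper(module)
    return module


def walk(module, transform, prefix=""):
    """Visit children depth-first and replace each with transform's result."""
    for name, child in list(module.children.items()):
        child_prefix = f"{prefix}.{name}" if prefix else name
        walk(child, transform, child_prefix)
        module.children[name] = transform(child, name=name, prefix=prefix)
    return module


model = ToyModule(layer0=ToyModule(linear_qkv=ToyModule(), norm=ToyModule()))
walk(model, toy_transform)
layer0 = model.children["layer0"]
```

After the walk, only `layer0.linear_qkv` has been replaced by a wrapper; `norm` and `layer0` are returned unchanged, mirroring the "original module if not a target" return behavior.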

class bridge.peft.lora.VLMLoRA#

Bases: bridge.peft.lora.LoRA

Implements LoRA for Vision-Language Models. VLMLoRA additionally lets the user specify whether the language or vision models should be frozen. For example, a common finetuning workload for multimodal models is to apply adapters to the language model and fully finetune the vision model.

freeze_vision_model: bool#: True

freeze_vision_projection: bool#: True

freeze_language_model: bool#: True

freeze_model(model: torch.nn.Module, training: bool = True) → None#
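To make the three freeze flags concrete, the sketch below mirrors them in a toy dataclass and lists which base-model components would stay trainable. `ToyFreezeFlags` and `trainable_components` are hypothetical, and the component names are inferred from the flag names rather than taken from the freeze_model implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyFreezeFlags:
    """Mirror of the three VLMLoRA freeze flags, with the documented defaults."""

    freeze_vision_model: bool = True
    freeze_vision_projection: bool = True
    freeze_language_model: bool = True


def trainable_components(flags):
    """List the base-model components that are left unfrozen."""
    frozen = {
        "vision_model": flags.freeze_vision_model,
        "vision_projection": flags.freeze_vision_projection,
        "language_model": flags.freeze_language_model,
    }
    return [name for name, is_frozen in frozen.items() if not is_frozen]


# Defaults: every base component is frozen, so only adapters would train.
all_frozen = trainable_components(ToyFreezeFlags())

# The workload from the class description: adapters on the language model,
# full finetuning of the vision model.
vision_finetune = trainable_components(ToyFreezeFlags(freeze_vision_model=False))
```

In this reading, a frozen component keeps its base weights fixed while any LoRA adapters attached to it remain trainable.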
