nemo_automodel.components._peft.lora_moe#
Module Contents#
Classes#
| Class | Description |
|---|---|
| GroupedExpertsLoRA | GroupedExperts + LoRA. |
| GroupedExpertsDeepEPLoRA | GroupedExpertsDeepEP + LoRA. |
Functions#
| Function | Description |
|---|---|
| swiglu_with_lora | SwiGLU expert activation with LoRA injection. |
| quick_geglu_with_lora | QuickGEGLU expert activation with LoRA injection. |
| get_expert_activation_with_lora | Get the expert activation function with LoRA support. |
API#
- nemo_automodel.components._peft.lora_moe.swiglu_with_lora(x, *, gate_and_up_proj, down_proj, lora_gate_and_up_A, lora_gate_and_up_B, lora_down_A, lora_down_B, scale, gate_up_proj_bias=None, down_proj_bias=None)#
SwiGLU expert activation with LoRA injection.
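The computation is standard LoRA applied to both fused projections: each base matmul gets a low-rank delta `(x @ A) @ B` multiplied by `scale`. The sketch below is illustrative only; it assumes a single unfused expert with 2-D weight matrices and a chunked gate/up layout, which may differ from the grouped per-expert layout the library actually uses.

```python
import torch
import torch.nn.functional as F

def swiglu_with_lora_sketch(x, gate_and_up_proj, down_proj,
                            lora_gate_and_up_A, lora_gate_and_up_B,
                            lora_down_A, lora_down_B, scale):
    # Fused gate/up projection plus its LoRA delta (layouts assumed for illustration).
    gate_up = x @ gate_and_up_proj + scale * (x @ lora_gate_and_up_A) @ lora_gate_and_up_B
    gate, up = gate_up.chunk(2, dim=-1)
    # SwiGLU: SiLU-gated linear unit.
    hidden = F.silu(gate) * up
    # Down projection plus its LoRA delta.
    return hidden @ down_proj + scale * (hidden @ lora_down_A) @ lora_down_B

# Tiny shape check with made-up sizes.
n, dim, ffn, r = 4, 32, 64, 8
out = swiglu_with_lora_sketch(
    torch.randn(n, dim),
    torch.randn(dim, 2 * ffn), torch.randn(ffn, dim),
    torch.randn(dim, r), torch.zeros(r, 2 * ffn),
    torch.randn(ffn, r), torch.zeros(r, dim),
    scale=32 / 8,
)
print(out.shape)  # torch.Size([4, 32])
```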
- nemo_automodel.components._peft.lora_moe.quick_geglu_with_lora(x, *, gate_and_up_proj, down_proj, lora_gate_and_up_A, lora_gate_and_up_B, lora_down_A, lora_down_B, scale, gate_up_proj_bias=None, down_proj_bias=None, alpha: float = 1.702, limit: float | None = 7.0)#
QuickGEGLU expert activation with LoRA injection.
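The gating here replaces SiLU with the "quick GELU" approximation, `g * sigmoid(alpha * g)` with `alpha ≈ 1.702`, and optionally clamps pre-activation values at `limit`. The LoRA deltas are applied exactly as in the SwiGLU sketch above. The clamping semantics and gate/up split below are assumptions for illustration, not a transcription of the library's implementation.

```python
import torch

def quick_geglu_sketch(gate: torch.Tensor, up: torch.Tensor,
                       alpha: float = 1.702, limit: float | None = 7.0) -> torch.Tensor:
    # Optional clamp on the pre-activations (exact semantics assumed).
    if limit is not None:
        gate = gate.clamp(max=limit)
        up = up.clamp(min=-limit, max=limit)
    # "Quick GELU": x * sigmoid(1.702 * x), a cheap approximation of GELU.
    return (gate * torch.sigmoid(alpha * gate)) * up
```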
- nemo_automodel.components._peft.lora_moe.get_expert_activation_with_lora(config)#
Get the expert activation function with LoRA support.
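A hypothetical usage sketch: the helper inspects the MoE configuration and returns the matching LoRA-aware activation. `moe_config` is a placeholder, and the assumption that the returned callable takes the same keyword arguments as the two functions above is not confirmed by the source.

```python
from nemo_automodel.components._peft.lora_moe import get_expert_activation_with_lora

# `moe_config` is a placeholder for the model's MoE configuration object.
activation_fn = get_expert_activation_with_lora(moe_config)

# Assumed call pattern, mirroring swiglu_with_lora / quick_geglu_with_lora:
# out = activation_fn(x, gate_and_up_proj=..., down_proj=...,
#                     lora_gate_and_up_A=..., lora_gate_and_up_B=...,
#                     lora_down_A=..., lora_down_B=..., scale=alpha / lora_dim)
```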
- class nemo_automodel.components._peft.lora_moe.GroupedExpertsLoRA(orig_module: nemo_automodel.components.moe.layers.GroupedExperts, lora_dim=8, alpha=32, lora_A_init_method='xavier', lora_dtype=None)#
Bases: nemo_automodel.components.moe.layers.GroupedExperts

GroupedExperts + LoRA.

This class wraps GroupedExperts to apply LoRA to the expert weights.

- Attributes:
lora_dim (int) – Rank of the LoRA adapter.
scale (float) – Scaling factor for the LoRA adapter (alpha / dim).
lora_gate_and_up_A (nn.Parameter) – LoRA A matrix for gate and up projections.
lora_gate_and_up_B (nn.Parameter) – LoRA B matrix for gate and up projections.
lora_down_A (nn.Parameter) – LoRA A matrix for down projection.
lora_down_B (nn.Parameter) – LoRA B matrix for down projection.
Initialization
Initializes the GroupedExpertsLoRA module.
- Parameters:
orig_module (GroupedExperts) – The original module to wrap.
lora_dim (int) – Rank of the LoRA adapter.
alpha (int) – Scaling factor for the LoRA adapter.
lora_A_init_method (str) – Initialization method for LoRA A matrix.
lora_dtype (torch.dtype) – Data type for LoRA weights.
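A minimal, hypothetical construction sketch: `experts` stands in for a GroupedExperts module taken from an MoE block of an already-built model, and the keyword values simply restate the defaults documented above.

```python
import torch
from nemo_automodel.components._peft.lora_moe import GroupedExpertsLoRA

# `experts` is a placeholder for an existing GroupedExperts module.
lora_experts = GroupedExpertsLoRA(
    orig_module=experts,
    lora_dim=8,                  # LoRA rank
    alpha=32,                    # scale = alpha / lora_dim = 4.0
    lora_A_init_method='xavier',
    lora_dtype=torch.bfloat16,   # dtype of the LoRA parameters (optional)
)
```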
- static _init_adapter(obj, lora_dim=8, alpha=32, lora_A_init_method='xavier', lora_dtype=None)#
- init_lora_weights(init_method)#
Initialize LoRA weights.
IMPORTANT: This method is called by the PEFT framework’s _init_peft_adapters after the model is materialized from meta device to the target device. The method name is critical - it serves as a hook for the framework. Do not rename or remove this method.

- Parameters:
init_method (str) – Initialization method (‘xavier’ or ‘kaiming’).
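In the conventional LoRA setup (assumed here, not transcribed from the source), the A matrices get a random xavier or kaiming init and the B matrices start at zero, so the adapter's initial contribution B·A is zero and the wrapped experts initially behave exactly like the originals. A minimal sketch of that convention:

```python
import torch.nn as nn

def init_lora_pair_sketch(lora_A: nn.Parameter, lora_B: nn.Parameter,
                          init_method: str = 'xavier') -> None:
    # Randomly initialize A; zero-initialize B so the LoRA delta starts at zero.
    if init_method == 'xavier':
        nn.init.xavier_uniform_(lora_A)
    elif init_method == 'kaiming':
        nn.init.kaiming_uniform_(lora_A, a=5 ** 0.5)
    else:
        raise ValueError(f'Unsupported init_method: {init_method}')
    nn.init.zeros_(lora_B)
```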
- forward(x: torch.Tensor, token_mask: torch.Tensor, weights: torch.Tensor, indices: torch.Tensor)#
Forward pass for GroupedExpertsLoRA with LoRA injection.
This method duplicates the logic from GroupedExperts.forward but injects LoRA computations into the expert processing. This is necessary because the original forward doesn’t expose hooks for the inner expert computation.
- Parameters:
x (torch.Tensor) – Input tensor. Shape is [num_tokens, model_dim].
token_mask (torch.Tensor) – Boolean mask indicating valid tokens. Shape is [num_tokens].
weights (torch.Tensor) – Routing weights for the selected experts. Shape is [num_tokens, num_activated_experts].
indices (torch.Tensor) – Indices of the selected experts. Shape is [num_tokens, num_activated_experts].
- Returns:
Output tensor after expert computation with LoRA. Shape is [num_tokens, model_dim].
- Return type:
torch.Tensor
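A shape-level usage sketch, assuming `lora_experts` is a GroupedExpertsLoRA instance such as the one constructed above; the router outputs (weights, indices) are faked with random values purely to show the documented tensor shapes.

```python
import torch

num_tokens, model_dim, num_experts, top_k = 16, 1024, 8, 2  # illustrative sizes

x = torch.randn(num_tokens, model_dim)
token_mask = torch.ones(num_tokens, dtype=torch.bool)
weights = torch.softmax(torch.randn(num_tokens, top_k), dim=-1)   # routing weights
indices = torch.randint(0, num_experts, (num_tokens, top_k))      # selected experts

out = lora_experts(x, token_mask, weights, indices)
assert out.shape == (num_tokens, model_dim)
```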
- class nemo_automodel.components._peft.lora_moe.GroupedExpertsDeepEPLoRA(orig_module: nemo_automodel.components.moe.layers.GroupedExpertsDeepEP, lora_dim=8, alpha=32, lora_A_init_method='xavier', lora_dtype=None)#
Bases: nemo_automodel.components.moe.layers.GroupedExpertsDeepEP

GroupedExpertsDeepEP + LoRA.

This class wraps GroupedExpertsDeepEP to apply LoRA to the expert weights using DeepEP kernels.

- Attributes:
lora_dim (int) – Rank of the LoRA adapter.
scale (float) – Scaling factor for the LoRA adapter (alpha / dim).
lora_gate_and_up_A (nn.Parameter) – LoRA A matrix for gate and up projections.
lora_gate_and_up_B (nn.Parameter) – LoRA B matrix for gate and up projections.
lora_down_A (nn.Parameter) – LoRA A matrix for down projection.
lora_down_B (nn.Parameter) – LoRA B matrix for down projection.
Initialization
Initializes the GroupedExpertsDeepEPLoRA module.
- Parameters:
orig_module (GroupedExpertsDeepEP) – The original module to wrap.
lora_dim (int) – Rank of the LoRA adapter.
alpha (int) – Scaling factor for the LoRA adapter.
lora_A_init_method (str) – Initialization method for LoRA A matrix.
lora_dtype (torch.dtype) – Data type for LoRA weights.
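Construction mirrors GroupedExpertsLoRA; only the type of the wrapped module differs. A brief hypothetical sketch, where `deepep_experts` is a placeholder for an existing GroupedExpertsDeepEP module:

```python
from nemo_automodel.components._peft.lora_moe import GroupedExpertsDeepEPLoRA

lora_experts = GroupedExpertsDeepEPLoRA(
    orig_module=deepep_experts,   # existing GroupedExpertsDeepEP module (placeholder)
    lora_dim=8,
    alpha=32,
)
```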
- static _init_adapter(obj, lora_dim=8, alpha=32, lora_A_init_method='xavier', lora_dtype=None)#
- init_lora_weights(init_method)#
Initialize LoRA weights.
IMPORTANT: This method is called by the PEFT framework’s _init_peft_adapters after the model is materialized from meta device to the target device. The method name is critical - it serves as a hook for the framework. Do not rename or remove this method.

- Parameters:
init_method (str) – Initialization method (‘xavier’ or ‘kaiming’).
- forward(x: torch.Tensor, token_mask: torch.Tensor, weights: torch.Tensor, indices: torch.Tensor)#
Forward pass for GroupedExpertsDeepEPLoRA with LoRA injection.
- Parameters:
x (torch.Tensor) – Input tensor. Shape is [num_tokens, model_dim].
token_mask (torch.Tensor) – Boolean mask indicating valid tokens. Shape is [num_tokens].
weights (torch.Tensor) – Routing weights for the selected experts. Shape is [num_tokens, num_activated_experts].
indices (torch.Tensor) – Indices of the selected experts. Shape is [num_tokens, num_activated_experts].
- Returns:
Output tensor after expert computation with LoRA. Shape is [num_tokens, model_dim].
- Return type:
torch.Tensor