nemo_automodel.components._peft.lora_moe#

Module Contents#

Classes#

GroupedExpertsLoRA

GroupedExperts + LoRA.

GroupedExpertsDeepEPLoRA

GroupedExpertsDeepEP + LoRA.

Functions#

swiglu_with_lora

SwiGLU expert activation with LoRA injection.

quick_geglu_with_lora

QuickGEGLU expert activation with LoRA injection.

get_expert_activation_with_lora

Get the expert activation function with LoRA support.

API#

nemo_automodel.components._peft.lora_moe.swiglu_with_lora(
x,
*,
gate_and_up_proj,
down_proj,
lora_gate_and_up_A,
lora_gate_and_up_B,
lora_down_A,
lora_down_B,
scale,
gate_up_proj_bias=None,
down_proj_bias=None,
)#

SwiGLU expert activation with LoRA injection.
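The sketch below illustrates the SwiGLU + LoRA math this function implements for a single expert; the actual function operates on grouped expert weights whose exact tensor layouts are not documented here, so the 2-D shapes and the fused gate/up split are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def swiglu_with_lora_sketch(x, *, gate_and_up_proj, down_proj,
                            lora_gate_and_up_A, lora_gate_and_up_B,
                            lora_down_A, lora_down_B, scale):
    # Fused gate/up projection plus its low-rank LoRA correction.
    gate_up = x @ gate_and_up_proj + scale * ((x @ lora_gate_and_up_A) @ lora_gate_and_up_B)
    gate, up = gate_up.chunk(2, dim=-1)   # split the fused projection
    hidden = F.silu(gate) * up            # SwiGLU activation
    # Down projection plus its low-rank LoRA correction.
    return hidden @ down_proj + scale * ((hidden @ lora_down_A) @ lora_down_B)
```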

nemo_automodel.components._peft.lora_moe.quick_geglu_with_lora(
x,
*,
gate_and_up_proj,
down_proj,
lora_gate_and_up_A,
lora_gate_and_up_B,
lora_down_A,
lora_down_B,
scale,
gate_up_proj_bias=None,
down_proj_bias=None,
alpha: float = 1.702,
limit: float | None = 7.0,
)#

QuickGEGLU expert activation with LoRA injection.
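The LoRA injection follows the same pattern as `swiglu_with_lora`; the difference is the activation. Quick-GELU gates with `x * sigmoid(alpha * x)` (alpha defaulting to 1.702), and `limit` optionally clamps the pre-activation values. The clamp placement below is an assumption based on common clipped-GEGLU variants, not the verified implementation.

```python
import torch

def quick_geglu_sketch(gate, up, alpha=1.702, limit=7.0):
    # Optionally clamp the pre-activation values (placement is an assumption).
    if limit is not None:
        gate = gate.clamp(max=limit)
        up = up.clamp(min=-limit, max=limit)
    # Quick-GELU gating: x * sigmoid(alpha * x) approximates GELU.
    return gate * torch.sigmoid(alpha * gate) * up
```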

nemo_automodel.components._peft.lora_moe.get_expert_activation_with_lora(config)#

Get the expert activation function with LoRA support.
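A usage sketch based on the signatures above: the selector is expected to return one of the two activation functions, which the caller then invokes with keyword arguments. The config object and tensor names (`model_config`, `w_gate_up`, and so on) are placeholders, and the config attribute that drives the choice is not documented here.

```python
act_fn = get_expert_activation_with_lora(model_config)

out = act_fn(
    x,
    gate_and_up_proj=w_gate_up,
    down_proj=w_down,
    lora_gate_and_up_A=a_gate_up,
    lora_gate_and_up_B=b_gate_up,
    lora_down_A=a_down,
    lora_down_B=b_down,
    scale=alpha / lora_dim,   # scale = alpha / lora_dim, as documented below
)
```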

class nemo_automodel.components._peft.lora_moe.GroupedExpertsLoRA(
orig_module: nemo_automodel.components.moe.layers.GroupedExperts,
lora_dim=8,
alpha=32,
lora_A_init_method='xavier',
lora_dtype=None,
)#

Bases: nemo_automodel.components.moe.layers.GroupedExperts

GroupedExperts + LoRA.

This class wraps GroupedExperts to apply LoRA to the expert weights.

Attributes:
  • lora_dim (int) – Rank of the LoRA adapter.

  • scale (float) – Scaling factor for the LoRA adapter (alpha / dim).

  • lora_gate_and_up_A (nn.Parameter) – LoRA A matrix for gate and up projections.

  • lora_gate_and_up_B (nn.Parameter) – LoRA B matrix for gate and up projections.

  • lora_down_A (nn.Parameter) – LoRA A matrix for down projection.

  • lora_down_B (nn.Parameter) – LoRA B matrix for down projection.

Initialization

Initializes the GroupedExpertsLoRA module.

Parameters:
  • orig_module (GroupedExperts) – The original module to wrap.

  • lora_dim (int) – Rank of the LoRA adapter.

  • alpha (int) – Scaling factor for the LoRA adapter.

  • lora_A_init_method (str) – Initialization method for LoRA A matrix.

  • lora_dtype (torch.dtype) – Data type for LoRA weights.
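
A construction sketch using the signature above. `moe_layer.experts` is a placeholder for wherever the model keeps its GroupedExperts instance; the attribute path and the dtype choice are illustrative assumptions.

```python
import torch
from nemo_automodel.components._peft.lora_moe import GroupedExpertsLoRA

# Wrap an existing GroupedExperts module with LoRA adapters.
lora_experts = GroupedExpertsLoRA(
    orig_module=moe_layer.experts,    # existing GroupedExperts instance (placeholder path)
    lora_dim=8,                       # LoRA rank
    alpha=32,                         # => scale = alpha / lora_dim = 4.0
    lora_A_init_method="xavier",
    lora_dtype=torch.bfloat16,
)
moe_layer.experts = lora_experts      # swap the LoRA-wrapped experts back in
```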

static _init_adapter(
obj,
lora_dim=8,
alpha=32,
lora_A_init_method='xavier',
lora_dtype=None,
)#
init_lora_weights(init_method)#

Initialize LoRA weights.

IMPORTANT: This method is called by the PEFT framework’s _init_peft_adapters after the model is materialized from the meta device to the target device. The method name itself serves as the framework’s hook; do not rename or remove this method.

Parameters:

init_method (str) – Initialization method (‘xavier’ or ‘kaiming’).
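
Only the choice between ‘xavier’ and ‘kaiming’ is documented. The sketch below shows what such a hook commonly does in LoRA implementations (initialize the A matrices per init_method, zero the B matrices so the adapter starts as a no-op); it is an assumption, not the verified behavior of this method.

```python
import torch.nn.init as init

def init_lora_weights_sketch(module, init_method):
    # Initialize the LoRA A matrices with the requested scheme.
    for a in (module.lora_gate_and_up_A, module.lora_down_A):
        if init_method == "xavier":
            init.xavier_uniform_(a)
        else:  # "kaiming"
            init.kaiming_uniform_(a)
    # B matrices typically start at zero so the wrapped module is unchanged at step 0.
    init.zeros_(module.lora_gate_and_up_B)
    init.zeros_(module.lora_down_B)
```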

forward(
x: torch.Tensor,
token_mask: torch.Tensor,
weights: torch.Tensor,
indices: torch.Tensor,
)#

Forward pass for GroupedExpertsLoRA with LoRA injection.

This method duplicates the logic from GroupedExperts.forward but injects LoRA computations into the expert processing. This is necessary because the original forward doesn’t expose hooks for the inner expert computation.

Parameters:
  • x (torch.Tensor) – Input tensor. Shape is [num_tokens, model_dim].

  • token_mask (torch.Tensor) – Boolean mask indicating valid tokens. Shape is [num_tokens].

  • weights (torch.Tensor) – Routing weights for the selected experts. Shape is [num_tokens, num_activated_experts].

  • indices (torch.Tensor) – Indices of the selected experts. Shape is [num_tokens, num_activated_experts].

Returns:

Output tensor after expert computation with LoRA. Shape is [num_tokens, model_dim].

Return type:

torch.Tensor
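
A toy-shaped call using the shapes documented above, reusing the `lora_experts` module from the construction sketch; the concrete dimensions are illustrative.

```python
import torch

num_tokens, model_dim, top_k, num_experts = 16, 1024, 2, 8   # illustrative sizes
x = torch.randn(num_tokens, model_dim)
token_mask = torch.ones(num_tokens, dtype=torch.bool)
weights = torch.softmax(torch.randn(num_tokens, top_k), dim=-1)
indices = torch.randint(0, num_experts, (num_tokens, top_k))

out = lora_experts(x, token_mask, weights, indices)   # -> [num_tokens, model_dim]
```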

class nemo_automodel.components._peft.lora_moe.GroupedExpertsDeepEPLoRA(
orig_module: nemo_automodel.components.moe.layers.GroupedExpertsDeepEP,
lora_dim=8,
alpha=32,
lora_A_init_method='xavier',
lora_dtype=None,
)#

Bases: nemo_automodel.components.moe.layers.GroupedExpertsDeepEP

GroupedExpertsDeepEP + LoRA.

This class wraps GroupedExpertsDeepEP to apply LoRA to the expert weights using DeepEP kernels.

Attributes:
  • lora_dim (int) – Rank of the LoRA adapter.

  • scale (float) – Scaling factor for the LoRA adapter (alpha / dim).

  • lora_gate_and_up_A (nn.Parameter) – LoRA A matrix for gate and up projections.

  • lora_gate_and_up_B (nn.Parameter) – LoRA B matrix for gate and up projections.

  • lora_down_A (nn.Parameter) – LoRA A matrix for down projection.

  • lora_down_B (nn.Parameter) – LoRA B matrix for down projection.

Initialization

Initializes the GroupedExpertsDeepEPLoRA module.

Parameters:
  • orig_module (GroupedExpertsDeepEP) – The original module to wrap.

  • lora_dim (int) – Rank of the LoRA adapter.

  • alpha (int) – Scaling factor for the LoRA adapter.

  • lora_A_init_method (str) – Initialization method for LoRA A matrix.

  • lora_dtype (torch.dtype) – Data type for LoRA weights.

static _init_adapter(
obj,
lora_dim=8,
alpha=32,
lora_A_init_method='xavier',
lora_dtype=None,
)#
init_lora_weights(init_method)#

Initialize LoRA weights.

IMPORTANT: This method is called by the PEFT framework’s _init_peft_adapters after the model is materialized from the meta device to the target device. The method name itself serves as the framework’s hook; do not rename or remove this method.

Parameters:

init_method (str) – Initialization method (‘xavier’ or ‘kaiming’).

forward(
x: torch.Tensor,
token_mask: torch.Tensor,
weights: torch.Tensor,
indices: torch.Tensor,
)#

Forward pass for GroupedExpertsDeepEPLoRA with LoRA injection.

Parameters:
  • x (torch.Tensor) – Input tensor. Shape is [num_tokens, model_dim].

  • token_mask (torch.Tensor) – Boolean mask indicating valid tokens. Shape is [num_tokens].

  • weights (torch.Tensor) – Routing weights for the selected experts. Shape is [num_tokens, num_activated_experts].

  • indices (torch.Tensor) – Indices of the selected experts. Shape is [num_tokens, num_activated_experts].

Returns:

Output tensor after expert computation with LoRA. Shape is [num_tokens, model_dim].

Return type:

torch.Tensor