nemo_automodel.components._peft.lora_experts#

Module Contents#

Classes#

GroupedExpertsLoRA

GroupedExperts + LoRA.

GroupedExpertsDeepEPLoRA

GroupedExpertsDeepEP + LoRA.

Functions#

_to_local

Convert DTensor to local tensor, or return as-is.

API#

nemo_automodel.components._peft.lora_experts._to_local(proj)#

Convert DTensor to local tensor, or return as-is.

class nemo_automodel.components._peft.lora_experts.GroupedExpertsLoRA(
orig_module: nemo_automodel.components.moe.experts.GroupedExperts,
lora_dim=8,
alpha=32,
lora_A_init_method='xavier',
lora_dtype=None,
)#

Bases: nemo_automodel.components.moe.experts.GroupedExperts

GroupedExperts + LoRA.

This class wraps GroupedExperts to apply LoRA to the expert weights.

.. attribute:: lora_dim

Rank of the LoRA adapter.

Type:

int

.. attribute:: scale

Scaling factor for the LoRA adapter (alpha / lora_dim).

Type:

float

.. attribute:: lora_gate_and_up_A

LoRA A matrix for gate and up projections.

Type:

nn.Parameter

.. attribute:: lora_gate_and_up_B

LoRA B matrix for gate and up projections.

Type:

nn.Parameter

.. attribute:: lora_down_A

LoRA A matrix for down projection.

Type:

nn.Parameter

.. attribute:: lora_down_B

LoRA B matrix for down projection.

Type:

nn.Parameter

Initialization

Initializes the GroupedExpertsLoRA module by wrapping an existing GroupedExperts instance and attaching LoRA adapters to its expert weights.

Parameters:
  • orig_module – The GroupedExperts module to wrap.

  • lora_dim – Rank of the LoRA adapters. Defaults to 8.

  • alpha – LoRA scaling numerator; the applied scale is alpha / lora_dim. Defaults to 32.

  • lora_A_init_method – Initialization method for the LoRA A matrices ('xavier' or 'kaiming'). Defaults to 'xavier'.

  • lora_dtype – Optional dtype for the LoRA parameters.

The forward path follows the wrapped module's backend configuration: when backend.experts == "torch_mm", torch._grouped_mm is used instead of the per-expert loop.
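The adapter arithmetic implied by lora_dim and alpha can be sketched in plain Python. The matrix shapes and the zero-initialized B factor follow common LoRA practice and are illustrative assumptions, not taken from this module's source:

```python
import random

lora_dim, alpha = 8, 32
scale = alpha / lora_dim          # the `scale` attribute: alpha / lora_dim

d_in, d_out = 4, 6
random.seed(0)

# Frozen base projection weight W: (d_in, d_out).
W = [[random.gauss(0, 0.02) for _ in range(d_out)] for _ in range(d_in)]
# LoRA factors: A maps d_in -> lora_dim, B maps lora_dim -> d_out.
A = [[random.gauss(0, 0.02) for _ in range(lora_dim)] for _ in range(d_in)]
B = [[0.0] * d_out for _ in range(lora_dim)]  # zero init: adapter starts as a no-op

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_projection(x, W, A, B, scale):
    # y = x @ W + scale * (x @ A) @ B  -- frozen base path plus low-rank update
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    return [[b + scale * d for b, d in zip(br, dr)] for br, dr in zip(base, delta)]

x = [[1.0] * d_in]
# With B all zeros, the adapted projection equals the base projection.
assert lora_projection(x, W, A, B, scale) == matmul(x, W)
```

Because the delta is rank-lora_dim, the adapter adds only (d_in + d_out) * lora_dim trainable values per projection instead of d_in * d_out.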

static _init_adapter(
obj,
lora_dim=8,
alpha=32,
lora_A_init_method='xavier',
lora_dtype=None,
)#
init_lora_weights(init_method)#

Initialize LoRA weights.

IMPORTANT: This method is called by the PEFT framework's _init_peft_adapters after the model is materialized from the meta device onto the target device. The method name is critical: it serves as a hook for the framework. Do not rename or remove this method.

Parameters:

init_method (str) – Initialization method ('xavier' or 'kaiming').
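The two supported methods can be approximated in plain Python; the module itself presumably delegates to torch.nn.init, so the uniform bounds below (xavier: sqrt(6 / (fan_in + fan_out)); kaiming: sqrt(6 / fan_in)) are a sketch of the standard formulas rather than the exact implementation:

```python
import math
import random

def init_lora_A(fan_in, fan_out, init_method, seed=0):
    """Fill a (fan_in, fan_out) LoRA A matrix with a uniform initializer.

    Sketch of the two supported schemes using the standard uniform bounds:
      xavier:  U(-b, b) with b = sqrt(6 / (fan_in + fan_out))
      kaiming: U(-b, b) with b = sqrt(6 / fan_in)
    """
    if init_method == "xavier":
        bound = math.sqrt(6.0 / (fan_in + fan_out))
    elif init_method == "kaiming":
        bound = math.sqrt(6.0 / fan_in)
    else:
        raise ValueError(f"unknown init_method: {init_method}")
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(fan_out)]
            for _ in range(fan_in)]
```

In common LoRA practice only the A matrices get a random initializer while the B matrices start at zero, so the adapters are a no-op at step 0; whether this module follows that convention is an assumption.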

forward(
x: torch.Tensor,
token_mask: torch.Tensor,
weights: torch.Tensor,
indices: torch.Tensor,
)#

Forward pass for GroupedExpertsLoRA with LoRA injection.

Mirrors GroupedExperts.forward but injects LoRA computations into the expert processing at the projection level.
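A minimal routing loop with the LoRA delta injected at both projections might look like the following. The SiLU activation, top-1 routing, and all shapes are illustrative assumptions; the real forward handles top-k routing, token masks, and batched tensors:

```python
import math

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def silu(v):
    return v / (1.0 + math.exp(-v))

def expert_forward(x, W_up, A_up, B_up, W_down, A_down, B_down, scale):
    # Up projection with LoRA: h = x @ W_up + scale * (x @ A_up) @ B_up
    h = [[b + scale * d for b, d in zip(br, dr)]
         for br, dr in zip(matmul(x, W_up), matmul(matmul(x, A_up), B_up))]
    h = [[silu(v) for v in row] for row in h]  # activation (assumed SiLU)
    # Down projection with LoRA, same base-plus-delta pattern.
    return [[b + scale * d for b, d in zip(br, dr)]
            for br, dr in zip(matmul(h, W_down), matmul(matmul(h, A_down), B_down))]

def moe_forward(x, indices, weights, experts, scale):
    # Top-1 routing sketch: token t goes to experts[indices[t]],
    # and its output is scaled by the router weight weights[t].
    out = []
    for t, row in enumerate(x):
        e = experts[indices[t]]
        y = expert_forward([row], *e, scale=scale)[0]
        out.append([weights[t] * v for v in y])
    return out

# One expert with identity base weights and zero-initialized LoRA B factors:
I2 = [[1.0, 0.0], [0.0, 1.0]]
zeroB = [[0.0, 0.0]]
expert = (I2, [[1.0], [0.0]], zeroB, I2, [[0.0], [1.0]], zeroB)
y = moe_forward([[1.0, -1.0]], [0], [1.0], [expert], scale=4.0)
```

With zero B matrices the LoRA terms vanish, so `y` here is just SiLU applied elementwise to the input token.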

_forward_loop(
x,
weights,
indices,
token_mask,
gate_and_up_projs,
down_projs,
lora_gate_and_up_A,
lora_gate_and_up_B,
lora_down_A,
lora_down_B,
n_local_experts,
experts_start_idx,
experts_end_idx,
)#

Per-expert loop forward path with LoRA injection.

_forward_grouped_mm(
x,
token_mask,
weights,
indices,
gate_and_up_projs,
down_projs,
lora_gate_and_up_A,
lora_gate_and_up_B,
lora_down_A,
lora_down_B,
n_local_experts,
experts_start_idx,
)#

Grouped GEMM forward path with LoRA injection using torch._grouped_mm.
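The grouped-GEMM path avoids a Python-level loop by ordering tokens so each expert's inputs are contiguous, then issuing the per-expert multiplies as one grouped kernel. A pure-Python sketch of just the permutation bookkeeping (torch._grouped_mm itself fuses the per-group multiplies; the helper name here is hypothetical):

```python
def group_tokens_by_expert(indices, n_experts):
    """Return (perm, group_sizes) for a grouped-GEMM dispatch.

    perm[i] is the original position of the i-th token in expert-sorted
    order (stable, so ties keep their original order); group_sizes[e] is
    how many tokens were routed to expert e. Gathering tokens with perm
    yields the contiguous per-expert batches a grouped GEMM consumes.
    """
    perm = sorted(range(len(indices)), key=lambda t: indices[t])
    group_sizes = [0] * n_experts
    for e in indices:
        group_sizes[e] += 1
    return perm, group_sizes

# Four tokens routed to experts [2, 0, 1, 0] across 3 local experts:
perm, sizes = group_tokens_by_expert([2, 0, 1, 0], 3)
# perm == [1, 3, 2, 0]  (expert 0's tokens first, then 1's, then 2's)
# sizes == [2, 1, 1]
```

The LoRA deltas reuse the same grouping: once tokens are expert-contiguous, the (x @ A) @ B products can be batched per expert exactly like the base projections.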

class nemo_automodel.components._peft.lora_experts.GroupedExpertsDeepEPLoRA(
orig_module: nemo_automodel.components.moe.experts.GroupedExpertsDeepEP,
lora_dim=8,
alpha=32,
lora_A_init_method='xavier',
lora_dtype=None,
)#

Bases: nemo_automodel.components.moe.experts.GroupedExpertsDeepEP

GroupedExpertsDeepEP + LoRA.

This class wraps GroupedExpertsDeepEP to apply LoRA to the expert weights using DeepEP kernels.

.. attribute:: lora_dim

Rank of the LoRA adapter.

Type:

int

.. attribute:: scale

Scaling factor for the LoRA adapter (alpha / lora_dim).

Type:

float

.. attribute:: lora_gate_and_up_A

LoRA A matrix for gate and up projections.

Type:

nn.Parameter

.. attribute:: lora_gate_and_up_B

LoRA B matrix for gate and up projections.

Type:

nn.Parameter

.. attribute:: lora_down_A

LoRA A matrix for down projection.

Type:

nn.Parameter

.. attribute:: lora_down_B

LoRA B matrix for down projection.

Type:

nn.Parameter

Initialization

Initializes the GroupedExpertsDeepEPLoRA module by wrapping an existing GroupedExpertsDeepEP instance and attaching LoRA adapters to its expert weights.

Parameters:
  • orig_module – The GroupedExpertsDeepEP module to wrap.

  • lora_dim – Rank of the LoRA adapters. Defaults to 8.

  • alpha – LoRA scaling numerator; the applied scale is alpha / lora_dim. Defaults to 32.

  • lora_A_init_method – Initialization method for the LoRA A matrices ('xavier' or 'kaiming'). Defaults to 'xavier'.

  • lora_dtype – Optional dtype for the LoRA parameters.

The forward path follows the wrapped module's backend configuration: when backend.experts == "torch_mm", torch._grouped_mm is used; otherwise grouped_gemm.ops.gmm.

static _init_adapter(
obj,
lora_dim=8,
alpha=32,
lora_A_init_method='xavier',
lora_dtype=None,
)#
init_lora_weights(init_method)#

Initialize LoRA weights.

IMPORTANT: This method is called by the PEFT framework's _init_peft_adapters after the model is materialized from the meta device onto the target device. The method name is critical: it serves as a hook for the framework. Do not rename or remove this method.

Parameters:

init_method (str) – Initialization method ('xavier' or 'kaiming').

forward(
x: torch.Tensor,
token_mask: torch.Tensor,
weights: torch.Tensor,
indices: torch.Tensor,
)#

Forward pass for GroupedExpertsDeepEPLoRA with LoRA injection.

Mirrors GroupedExpertsDeepEP.forward but injects LoRA computations into the expert processing at the projection level.