nemo_automodel.components._peft.lora_experts#
Module Contents#
Classes#
GroupedExpertsLoRA – GroupedExperts + LoRA.

GroupedExpertsDeepEPLoRA – GroupedExpertsDeepEP + LoRA.

Functions#

_to_local – Convert DTensor to local tensor, or return as-is.
API#
- nemo_automodel.components._peft.lora_experts._to_local(proj)#
Convert DTensor to local tensor, or return as-is.
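The helper's behavior can be sketched with duck typing. This is a hypothetical stand-in, not the module's actual implementation; real DTensors come from `torch.distributed.tensor` and expose `.to_local()`, which returns the shard owned by the current rank.

```python
def to_local_sketch(proj):
    """Return the local shard of a DTensor-like object, or proj unchanged.

    Hypothetical stand-in for _to_local: anything exposing .to_local()
    is treated as a DTensor; plain tensors pass through untouched.
    """
    return proj.to_local() if hasattr(proj, "to_local") else proj


class FakeDTensor:
    """Toy object mimicking a DTensor's .to_local() for illustration."""
    def __init__(self, local):
        self._local = local

    def to_local(self):
        return self._local
```

For example, `to_local_sketch(FakeDTensor([1, 2]))` yields `[1, 2]`, while a plain list is returned as-is.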
- class nemo_automodel.components._peft.lora_experts.GroupedExpertsLoRA(
- orig_module: nemo_automodel.components.moe.experts.GroupedExperts,
- lora_dim=8,
- alpha=32,
- lora_A_init_method='xavier',
- lora_dtype=None,
- )#
Bases:
nemo_automodel.components.moe.experts.GroupedExperts

GroupedExperts + LoRA.

This class wraps GroupedExperts to apply LoRA to the expert weights.

.. attribute:: lora_dim
Rank of the LoRA adapter.
- Type:
int
.. attribute:: scale
Scaling factor for the LoRA adapter (alpha / dim).
- Type:
float
.. attribute:: lora_gate_and_up_A
LoRA A matrix for gate and up projections.
- Type:
nn.Parameter
.. attribute:: lora_gate_and_up_B
LoRA B matrix for gate and up projections.
- Type:
nn.Parameter
.. attribute:: lora_down_A
LoRA A matrix for down projection.
- Type:
nn.Parameter
.. attribute:: lora_down_B
LoRA B matrix for down projection.
- Type:
nn.Parameter
Initialization
Initializes the GroupedExpertsLoRA module.
- Parameters:
orig_module – The GroupedExperts module to wrap.
lora_dim – Rank of the LoRA adapter.
alpha – Scaling numerator; the adapter scale is alpha / lora_dim.
lora_A_init_method – Initialization method for the LoRA A matrices ('xavier' or 'kaiming').
lora_dtype – Optional dtype for the LoRA parameters.

The forward path follows the wrapped module's backend: when backend.experts == 'torch_mm', uses torch._grouped_mm instead of the per-expert loop.
- static _init_adapter(
- obj,
- lora_dim=8,
- alpha=32,
- lora_A_init_method='xavier',
- lora_dtype=None,
- )#
- init_lora_weights(init_method)#
Initialize LoRA weights.
IMPORTANT: This method is called by the PEFT framework's _init_peft_adapters after the model is materialized from the meta device to the target device. The method name is critical: it serves as a hook for the framework. Do not rename or remove this method.
- Parameters:
init_method (str) – Initialization method ('xavier' or 'kaiming').
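The two schemes can be sketched as follows. This is a simplified numpy stand-in for illustration, not the module's actual code; it follows the standard LoRA convention of initializing only the A factor and zeroing B, so the adapter starts as a no-op.

```python
import numpy as np

def init_lora_sketch(A, B, init_method="xavier", rng=None):
    """Initialize LoRA factors in place: A by the chosen scheme, B to zero.

    Zeroing B makes A @ B zero at start, so the adapter initially leaves
    the wrapped module's output unchanged.
    """
    rng = rng or np.random.default_rng(0)
    fan_in, fan_out = A.shape
    if init_method == "xavier":
        # Xavier/Glorot uniform: bound = sqrt(6 / (fan_in + fan_out))
        bound = np.sqrt(6.0 / (fan_in + fan_out))
        A[:] = rng.uniform(-bound, bound, size=A.shape)
    elif init_method == "kaiming":
        # Kaiming normal: std = sqrt(2 / fan_in)
        A[:] = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=A.shape)
    else:
        raise ValueError(f"unknown init_method: {init_method}")
    B[:] = 0.0
```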
- forward(
- x: torch.Tensor,
- token_mask: torch.Tensor,
- weights: torch.Tensor,
- indices: torch.Tensor,
- )#
Forward pass for GroupedExpertsLoRA with LoRA injection.
Mirrors GroupedExperts.forward but injects LoRA computations into the expert processing at the projection level.
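"Injection at the projection level" means each frozen projection W gains a scaled low-rank correction: y = x @ W + scale * (x @ A) @ B. A minimal numpy sketch (dimensions chosen arbitrarily) shows this is equivalent to running the projection with the adapter merged into W:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, h = 16, 4, 32          # hidden dim, LoRA rank, projection dim
scale = 32 / r               # alpha / lora_dim

x = rng.standard_normal((5, d))
W = rng.standard_normal((d, h))         # frozen expert projection
A = rng.standard_normal((d, r)) * 0.01  # LoRA A factor
B = rng.standard_normal((r, h)) * 0.01  # LoRA B factor

# Injected form: base projection plus scaled low-rank correction.
y_injected = x @ W + scale * ((x @ A) @ B)

# Merged form: the same output with the adapter folded into W.
y_merged = x @ (W + scale * (A @ B))

assert np.allclose(y_injected, y_merged)
```

Keeping the two matmuls separate at runtime is what lets W stay frozen while only A and B receive gradients.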
- _forward_loop(
- x,
- weights,
- indices,
- token_mask,
- gate_and_up_projs,
- down_projs,
- lora_gate_and_up_A,
- lora_gate_and_up_B,
- lora_down_A,
- lora_down_B,
- n_local_experts,
- experts_start_idx,
- experts_end_idx,
- )#
Per-expert loop forward path with LoRA injection.
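The loop path can be sketched for a single square projection. This is a simplified illustration, not the actual method: the real implementation applies both the gate/up and down projections, honors token_mask, and iterates only over the local expert range.

```python
import numpy as np

def experts_loop_sketch(x, weights, indices, expert_W,
                        lora_A, lora_B, scale):
    """Per-expert loop over one (square) projection, with LoRA injection.

    x:        (T, d) tokens
    weights:  (T, k) router weights; indices: (T, k) routed expert ids
    expert_W: (E, d, d) frozen per-expert weights
    lora_A:   (E, d, r) and lora_B: (E, r, d) adapter factors
    """
    out = np.zeros_like(x)
    for e in range(expert_W.shape[0]):
        tok, slot = np.nonzero(indices == e)   # tokens routed to expert e
        if tok.size == 0:
            continue
        xe = x[tok]
        # Base projection plus scaled low-rank LoRA correction.
        ye = xe @ expert_W[e] + scale * ((xe @ lora_A[e]) @ lora_B[e])
        out[tok] += weights[tok, slot, None] * ye
    return out
```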
- _forward_grouped_mm(
- x,
- token_mask,
- weights,
- indices,
- gate_and_up_projs,
- down_projs,
- lora_gate_and_up_A,
- lora_gate_and_up_B,
- lora_down_A,
- lora_down_B,
- n_local_experts,
- experts_start_idx,
- )#
Grouped GEMM forward path with LoRA injection using torch._grouped_mm.
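The grouped-GEMM layout can be illustrated without the private torch._grouped_mm op: sort tokens by expert id so each expert's tokens form a contiguous slice, compute group offsets, and run one matmul per slice (which a grouped GEMM would fuse into a single kernel). A numpy sketch, without the LoRA terms for brevity:

```python
import numpy as np

def grouped_mm_sketch(x_flat, expert_ids, expert_W):
    """Group tokens by expert and apply each expert's weight to its slice.

    x_flat:     (N, d) tokens, already replicated per routing slot
    expert_ids: (N,) expert id per token
    expert_W:   (E, d, d) per-expert weights
    """
    order = np.argsort(expert_ids, kind="stable")
    xs = x_flat[order]
    ids = expert_ids[order]
    # Tokens-per-expert -> group boundaries, as a grouped GEMM expects.
    counts = np.bincount(ids, minlength=expert_W.shape[0])
    offsets = np.concatenate([[0], np.cumsum(counts)])
    out_sorted = np.empty_like(xs)
    for e in range(expert_W.shape[0]):
        s, t = offsets[e], offsets[e + 1]
        out_sorted[s:t] = xs[s:t] @ expert_W[e]   # one GEMM per group
    # Scatter results back to the original token order.
    out = np.empty_like(out_sorted)
    out[order] = out_sorted
    return out
```

The LoRA correction composes the same way: each contiguous slice would additionally receive scale * (xs[s:t] @ A[e]) @ B[e].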
- class nemo_automodel.components._peft.lora_experts.GroupedExpertsDeepEPLoRA(
- orig_module: nemo_automodel.components.moe.experts.GroupedExpertsDeepEP,
- lora_dim=8,
- alpha=32,
- lora_A_init_method='xavier',
- lora_dtype=None,
- )#
Bases:
nemo_automodel.components.moe.experts.GroupedExpertsDeepEP

GroupedExpertsDeepEP + LoRA.

This class wraps GroupedExpertsDeepEP to apply LoRA to the expert weights using DeepEP kernels.

.. attribute:: lora_dim
Rank of the LoRA adapter.
- Type:
int
.. attribute:: scale
Scaling factor for the LoRA adapter (alpha / dim).
- Type:
float
.. attribute:: lora_gate_and_up_A
LoRA A matrix for gate and up projections.
- Type:
nn.Parameter
.. attribute:: lora_gate_and_up_B
LoRA B matrix for gate and up projections.
- Type:
nn.Parameter
.. attribute:: lora_down_A
LoRA A matrix for down projection.
- Type:
nn.Parameter
.. attribute:: lora_down_B
LoRA B matrix for down projection.
- Type:
nn.Parameter
Initialization
Initializes the GroupedExpertsDeepEPLoRA module.
- Parameters:
orig_module – The GroupedExpertsDeepEP module to wrap.
lora_dim – Rank of the LoRA adapter.
alpha – Scaling numerator; the adapter scale is alpha / lora_dim.
lora_A_init_method – Initialization method for the LoRA A matrices ('xavier' or 'kaiming').
lora_dtype – Optional dtype for the LoRA parameters.

The forward path follows the wrapped module's backend: when backend.experts == 'torch_mm', uses torch._grouped_mm; otherwise uses grouped_gemm.ops.gmm.
- static _init_adapter(
- obj,
- lora_dim=8,
- alpha=32,
- lora_A_init_method='xavier',
- lora_dtype=None,
- )#
- init_lora_weights(init_method)#
Initialize LoRA weights.
IMPORTANT: This method is called by the PEFT framework's _init_peft_adapters after the model is materialized from the meta device to the target device. The method name is critical: it serves as a hook for the framework. Do not rename or remove this method.
- Parameters:
init_method (str) – Initialization method ('xavier' or 'kaiming').
- forward(
- x: torch.Tensor,
- token_mask: torch.Tensor,
- weights: torch.Tensor,
- indices: torch.Tensor,
- )#
Forward pass for GroupedExpertsDeepEPLoRA with LoRA injection.
Mirrors GroupedExpertsDeepEP.forward but injects LoRA computations into the expert processing at the projection level.