nemo_automodel.components._peft.lora#

Module Contents#

Classes#

PeftConfig

LinearLoRA

Linear + LoRA; maintains the checkpoint structure (i.e., Linear’s weight/bias remain at the same FQN).

TritonLinearLoRA

Subclass of LinearLoRA that uses Triton kernels for the forward and backward passes.

LoRATritonFunction

Autograd function that calls the Triton kernel wrappers for the LoRA forward and backward passes.

Functions#

patch_linear_module

Monkey-patches an nn.Linear (the orig_linear param) into a LinearLoRA.

apply_lora_to_linear_modules

Replace selected nn.Linear layers with LinearLoRA layers (in-place).

API#

class nemo_automodel.components._peft.lora.PeftConfig[source]#
target_modules: list#

‘field(…)’

exclude_modules: list#

‘field(…)’

match_all_linear: bool#

False

dim: int#

8

alpha: int#

32

dropout: float#

0.0

dropout_position: Literal['pre', 'post']#

‘post’

lora_A_init: str#

‘xavier’

lora_dtype: Optional[torch.dtype]#

None

use_triton: bool#

False

to_dict()[source]#
classmethod from_dict(d: dict[str, Any])[source]#
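A minimal usage sketch for PeftConfig, assuming it behaves like a standard dataclass whose fields can be passed as keyword arguments; the field values below are illustrative, not defaults.

```python
from nemo_automodel.components._peft.lora import PeftConfig

# Assumes PeftConfig is a standard dataclass; all values below are illustrative.
cfg = PeftConfig(
    target_modules=["q_proj", "v_proj"],  # wildcard fragments matched against module names
    exclude_modules=[],
    match_all_linear=False,
    dim=8,
    alpha=32,
    dropout=0.0,
    dropout_position="post",
    lora_A_init="xavier",
    lora_dtype=None,
    use_triton=False,
)

# Round-trip through the documented (de)serialization helpers.
as_dict = cfg.to_dict()
restored = PeftConfig.from_dict(as_dict)
```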
class nemo_automodel.components._peft.lora.LinearLoRA(
orig_linear,
dim=8,
alpha=32,
dropout=0.0,
dropout_position='post',
lora_A_init_method='xavier',
lora_dtype=None,
)[source]#

Bases: torch.nn.Linear

Linear + LoRA; maintains the checkpoint structure (i.e., Linear’s weight/bias remain at the same FQN).

The _init_adapter and _forward methods provide the LoRA functionality. We want to be able to use those inside LinearLoRA but also for monkey-patching modules without repeating the same code; therefore they are decorated with @staticmethod.

Initialization

LinearLoRA constructor.

Parameters:
  • orig_linear (nn.Module) – the linear module to augment.

  • dim (int) – lora’s dim in_features -> dim -> out_features.

  • alpha (int) – lora’s scaling alpha.

  • dropout (float) – dropout prob (default: 0.0).

  • dropout_position (str) – where to apply dropout rel. to lora (choices= [‘pre’, ‘post’], default=post)

  • lora_A_init_method (str) – init method for lora_A (choices= [‘xavier’, ‘uniform’])

  • lora_dtype (torch.dtype) – the LoRA weights’ dtype; by default uses orig_linear’s dtype, but if the original weights are quantized, the dtype needs to be specified explicitly.

init_lora_weights(init_method: str)[source]#

Initialize the LoRA weights.

Parameters:

init_method (str) – Method to initialize the LoRA weights.

static _init_adapter(
obj,
dim=8,
alpha=32,
dropout=0.0,
dropout_position='post',
lora_A_init_method='xavier',
lora_dtype=None,
)[source]#

Adds LoRA weights to obj. Obj is either a LinearLoRA or an nn.Module (when monkey-patching).

Parameters:
  • obj (LinearLoRA | nn.Module) – input module to adapt.

  • dim (int) – lora’s dim in_features -> dim -> out_features.

  • alpha (int) – lora’s scaling alpha.

  • dropout (float) – dropout prob (default: 0.0).

  • dropout_position (str) – where to apply dropout rel. to lora (choices= [‘pre’, ‘post’], default=post)

  • lora_A_init_method (str) – init method for lora_A (choices= [‘xavier’, ‘uniform’])

  • lora_dtype (torch.dtype) – the LoRA weights’ dtype; by default uses orig_linear’s dtype, but if the original weights are quantized, the dtype needs to be specified explicitly.

forward(x)[source]#

Forward pass through the original linear layer augmented with the LoRA pathway.

Applies LoRA either before or after the dropout, depending on the configuration. The result of the original linear transformation is combined with the LoRA output.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, in_features).

Returns:

Output tensor of shape (batch_size, out_features).

Return type:

Tensor
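A minimal sketch of wrapping an existing nn.Linear with LinearLoRA, based on the constructor signature above. The adapter attribute names and the exact scaling factor in the final comment are assumptions, not guarantees of this API.

```python
import torch
import torch.nn as nn

from nemo_automodel.components._peft.lora import LinearLoRA

# Wrap an existing nn.Linear; the original weight/bias stay at the same FQN,
# so existing checkpoints keep loading without key remapping.
base = nn.Linear(512, 512)
lora_linear = LinearLoRA(base, dim=8, alpha=32, dropout=0.0, dropout_position="post")

x = torch.randn(4, 512)
y = lora_linear(x)  # shape: (4, 512)

# Conceptually (exact scaling is an implementation detail):
#   y ≈ base(x) + scale * lora_B(lora_A(dropout(x)))
# with lora_A: in_features -> dim and lora_B: dim -> out_features.
```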

class nemo_automodel.components._peft.lora.TritonLinearLoRA(
orig_linear,
dim=8,
alpha=32,
dropout=0.0,
dropout_position='post',
lora_A_init_method='xavier',
lora_dtype=None,
)[source]#

Bases: nemo_automodel.components._peft.lora.LinearLoRA

Subclass of LinearLoRA that uses Triton kernels for the forward and backward passes.

Parameters:
  • orig_linear (nn.Module) – the linear module to augment.

  • dim (int) – lora’s dim in_features -> dim -> out_features.

  • alpha (int) – lora’s scaling alpha.

  • dropout (float) – dropout prob (default: 0.0).

  • dropout_position (str) – where to apply dropout rel. to lora (choices= [‘pre’, ‘post’], default=post)

  • lora_A_init_method (str) – init method for lora_A (choices= [‘xavier’, ‘uniform’])

  • lora_dtype (torch.dtype) – the LoRA weights’ dtype; by default uses orig_linear’s dtype, but if the original weights are quantized, the dtype needs to be specified explicitly.

Initialization

LinearLoRA constructor.

Parameters:
  • orig_linear (nn.Module) – the linear module to augment.

  • dim (int) – lora’s dim in_features -> dim -> out_features.

  • alpha (int) – lora’s scaling alpha.

  • dropout (float) – dropout prob (default: 0.0).

  • dropout_position (str) – where to apply dropout rel. to lora (choices= [‘pre’, ‘post’], default=post)

  • lora_A_init_method (str) – init method for lora_A (choices= [‘xavier’, ‘uniform’])

  • lora_dtype (torch.dtype) – the LoRA weights’ dtype; by default uses orig_linear’s dtype, but if the original weights are quantized, the dtype needs to be specified explicitly.

forward(x)[source]#

Forward function for LoRA with Triton kernels.

Parameters:

x (torch.Tensor) – the input tensor.

Returns:

the output tensor.

Return type:

torch.Tensor
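TritonLinearLoRA shares the LinearLoRA constructor; the sketch below assumes a CUDA device with a working Triton installation, and the sizes are illustrative.

```python
import torch
import torch.nn as nn

from nemo_automodel.components._peft.lora import TritonLinearLoRA

# Same constructor as LinearLoRA, but forward/backward go through Triton kernels.
# Assumes a CUDA device and a working Triton installation.
base = nn.Linear(1024, 1024, device="cuda", dtype=torch.bfloat16)
lora_linear = TritonLinearLoRA(base, dim=16, alpha=32)

x = torch.randn(2, 1024, device="cuda", dtype=torch.bfloat16)
y = lora_linear(x)
```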

nemo_automodel.components._peft.lora.patch_linear_module(
orig_linear,
dim=8,
alpha=32,
dropout=0.0,
dropout_position='post',
lora_A_init_method='xavier',
lora_dtype=None,
use_triton=True,
)[source]#

Monkey-patches an nn.Linear (the orig_linear param) into a LinearLoRA.

The orig_linear might not contain valid weights, for example, if it was initialized inside a context manager that uses a “meta” device. In that case the weight/bias cannot be copied from orig_linear to the LinearLoRA, since they have not been allocated.

To handle this, LinearLoRA’s additional functionality (_init_adapter, _forward) is implemented as static methods, so it can be used both when patching an existing module and when allocating a new LinearLoRA object.

Parameters:
  • orig_linear (nn.Linear) – the module we add adapter to.

  • dim (int, optional) – Lora dim. Defaults to 8.

  • alpha (int, optional) – Lora alpha scale. Defaults to 32.

  • dropout (float, optional) – dropout prob. Defaults to 0.0.

  • dropout_position (str, optional) – location to apply dropout wrt lora. Defaults to ‘post’ (choices: ‘pre’, ‘post’).

  • lora_A_init_method (str, optional) – lora_a init method. Defaults to ‘xavier’.

  • lora_dtype (torch.dtype, optional) – LoRA weights’ dtype. By default will use orig_linear’s dtype, but orig_linear might use a non-trainable dtype (e.g., quantized weights), in which case the dtype needs to be specified manually. Defaults to None.

  • use_triton (bool, optional) – whether to use the Triton kernel LoRA implementation. Defaults to True.

Returns:

the monkey-patched (nn.Linear + LoRA) nn.Module

Return type:

(nn.Module)
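A minimal sketch of patching a single linear layer, using a toy nn.Sequential (module indices and sizes are illustrative); use_triton=False keeps the pure-PyTorch path so no GPU or Triton is required.

```python
import torch.nn as nn

from nemo_automodel.components._peft.lora import patch_linear_module

# Toy model; the layer indices and sizes are illustrative.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))

# Patch the first linear; use_triton=False keeps the pure-PyTorch path
# so the sketch does not require a GPU or Triton.
model[0] = patch_linear_module(model[0], dim=8, alpha=32, use_triton=False)
```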

nemo_automodel.components._peft.lora.apply_lora_to_linear_modules(
model: torch.nn.Module,
peft_config: nemo_automodel.components._peft.lora.PeftConfig,
) → int[source]#

Replace selected nn.Linear layers with LinearLoRA layers (in-place).

target_modules accepts wildcard fragments, e.g. ["q_proj", "k_proj", "*.fc.*"].
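A minimal sketch of applying LoRA across a model with a PeftConfig. The toy module and its attribute names (q_proj, k_proj, out_proj) are illustrative, and passing PeftConfig fields as keyword arguments assumes it is a standard dataclass.

```python
import torch.nn as nn

from nemo_automodel.components._peft.lora import PeftConfig, apply_lora_to_linear_modules

# Toy module standing in for a transformer block; attribute names are illustrative.
class TinyBlock(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)
        self.out_proj = nn.Linear(hidden, hidden)

model = TinyBlock()

cfg = PeftConfig(
    target_modules=["q_proj", "k_proj"],  # wildcard fragments matched against module FQNs
    exclude_modules=[],
    dim=8,
    alpha=32,
    use_triton=False,
)

# Replaces matching nn.Linear layers with LinearLoRA in place; the returned int
# presumably reports how many modules were patched.
num_patched = apply_lora_to_linear_modules(model, cfg)
```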

class nemo_automodel.components._peft.lora.LoRATritonFunction(*args, **kwargs)[source]#

Bases: torch.autograd.Function

Autograd function that calls the Triton kernel wrappers for the LoRA forward and backward passes.

Initialization

static setup_context(ctx, inputs, output)[source]#

Stores context for LoRA backward pass.

static forward(x, lora_A, lora_B, scale, dtype)[source]#

Forward method for LoRATriton.

Reshapes 3D tensors into 2D and then calls the Triton kernel.

static backward(ctx, d_y)[source]#

Backward method for LoRATriton.

Reshapes 3D tensors into 2D and then calls the kernels to update d_lora_a, d_lora_b, and dx.
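For reference, the sketch below implements the same torch.autograd.Function pattern (separate forward, setup_context, and backward) in plain PyTorch instead of Triton, so the gradient flow is easy to follow. The (dim, in_features) / (out_features, dim) weight layout is an assumption; the real class dispatches to Triton kernel wrappers.

```python
import torch

# Plain-PyTorch analogue of the LoRATritonFunction structure, computing the LoRA
# delta scale * (x @ A^T) @ B^T. Assumed layouts: lora_A is (dim, in_features),
# lora_B is (out_features, dim).
class LoRARefFunction(torch.autograd.Function):
    @staticmethod
    def forward(x, lora_A, lora_B, scale, dtype):
        # dtype is kept only for signature parity with the documented Triton version.
        x2d = x.reshape(-1, x.shape[-1])                    # flatten leading dims to 2D
        y = (x2d @ lora_A.t()) @ lora_B.t()
        return (scale * y).reshape(*x.shape[:-1], lora_B.shape[0])

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, lora_A, lora_B, scale, _ = inputs
        ctx.save_for_backward(x, lora_A, lora_B)
        ctx.scale = scale

    @staticmethod
    def backward(ctx, d_y):
        x, lora_A, lora_B = ctx.saved_tensors
        x2d = x.reshape(-1, x.shape[-1])
        d_y2d = d_y.reshape(-1, d_y.shape[-1]) * ctx.scale
        d_x = (d_y2d @ lora_B) @ lora_A                     # (N, in_features)
        d_lora_A = (d_y2d @ lora_B).t() @ x2d               # (dim, in_features)
        d_lora_B = d_y2d.t() @ (x2d @ lora_A.t())           # (out_features, dim)
        return d_x.reshape_as(x), d_lora_A, d_lora_B, None, None

# Usage: gradients flow to x, lora_A, and lora_B.
x = torch.randn(2, 16, 64, requires_grad=True)
A = torch.randn(8, 64, requires_grad=True)
B = torch.randn(64, 8, requires_grad=True)
out = LoRARefFunction.apply(x, A, B, 32 / 8, torch.float32)
out.sum().backward()
```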