nemo_microservices.types.customization.lora_parameters#

Module Contents#

Classes#

API#

class nemo_microservices.types.customization.lora_parameters.LoraParameters(/, **data: Any)#

Bases: nemo_microservices._models.BaseModel

adapter_dim: int | None#

None

Size of adapter layers added throughout the model.

This is the size of the tunable layers that LoRA adds to various transformer blocks in the base model. This parameter must be a power of 2.
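
A minimal sketch of setting this field, assuming LoraParameters can be imported from the module path shown above and instantiated directly with keyword arguments (all fields documented here are optional):

  from nemo_microservices.types.customization.lora_parameters import LoraParameters

  adapter_dim = 16  # e.g. 8, 16, 32, 64 -- a power of 2
  # Sanity-check the power-of-2 constraint before building the config.
  assert adapter_dim > 0 and (adapter_dim & (adapter_dim - 1)) == 0

  params = LoraParameters(adapter_dim=adapter_dim)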

adapter_dropout: float | None#

None

Dropout probability in the adapter layer.

alpha: int | None#

None

Scaling factor for the LoRA update.

Controls the magnitude of the low-rank approximation. A higher alpha value increases the impact of the LoRA weights, effectively amplifying the changes made to the original model. Proper tuning of alpha is essential, as it balances the adaptation’s impact, ensuring neither underfitting nor overfitting. This value is often a multiple of the adapter dimension (adapter_dim).
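
As an illustration of the relationship between alpha and adapter_dim, the sketch below assumes the common LoRA convention of scaling the low-rank update by alpha divided by the adapter dimension; this class only stores the values, it does not enforce that convention:

  from nemo_microservices.types.customization.lora_parameters import LoraParameters

  params = LoraParameters(
      adapter_dim=32,       # power of 2
      adapter_dropout=0.1,  # dropout probability inside the adapter layer
      alpha=64,             # often a multiple of adapter_dim; here 2 * adapter_dim
  )

  # Most LoRA implementations scale the low-rank update by alpha / rank,
  # so this configuration yields an effective scale of 64 / 32 = 2.0.
  print(params.alpha / params.adapter_dim)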

target_modules: List[str] | None#

None

Target specific layers in the model architecture to apply LoRA.

We select a subset of the layers by default. However, specific layers can also be selected. For example:

  • linear_qkv: Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.

  • linear_proj: Apply LoRA to the linear layer used for projecting the output of self-attention.

  • linear_fc1: Apply LoRA to the first fully-connected layer in the MLP.

  • linear_fc2: Apply LoRA to the second fully-connected layer in the MLP.

  • *_proj: Apply LoRA to all layers used for projecting the output of self-attention.

Target modules can also contain wildcards. For example, you can specify target_modules=['*.layers.0.*.linear_qkv', '*.layers.1.*.linear_qkv'] to add LoRA only to linear_qkv on the first two layers, as in the sketch below.
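
A sketch of selecting target modules explicitly and via wildcards, using the module names listed above (illustrative values, assuming direct keyword construction):

  from nemo_microservices.types.customization.lora_parameters import LoraParameters

  # Explicit module names: adapt only the attention QKV and output projections.
  explicit = LoraParameters(target_modules=["linear_qkv", "linear_proj"])

  # Wildcards: restrict linear_qkv adapters to the first two transformer layers.
  wildcard = LoraParameters(
      target_modules=["*.layers.0.*.linear_qkv", "*.layers.1.*.linear_qkv"]
  )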

Our framework only supports a Fused LoRA implementation; Canonical LoRA is not supported.
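
Putting the fields together, a minimal end-to-end sketch; model_dump is assumed to be available via the underlying pydantic-based BaseModel, and the hyperparameters usage mentioned in the comment is only illustrative:

  from nemo_microservices.types.customization.lora_parameters import LoraParameters

  lora = LoraParameters(
      adapter_dim=16,
      adapter_dropout=0.05,
      alpha=32,
      target_modules=["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"],
  )

  # Serialize to a plain dict, e.g. to embed in a customization job's hyperparameters.
  print(lora.model_dump(exclude_none=True))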