PEFT documentation

LoRA

You are viewing v0.7.0 version. A newer version v0.14.0 is available.
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

LoRA

Low-Rank Adaptation (LoRA) is a PEFT method that decomposes a large matrix into two smaller low-rank matrices in the attention layers. This drastically reduces the number of parameters that need to be fine-tuned.

The abstract from the paper is:

We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets with decreased training times by factors between 5.4 and 3.6..

LoraConfig

class peft.LoraConfig

< >

( peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None inference_mode: bool = False r: int = 8 target_modules: Optional[Union[List[str], str]] = None lora_alpha: int = 8 lora_dropout: float = 0.0 fan_in_fan_out: bool = False bias: str = 'none' modules_to_save: Optional[List[str]] = None init_lora_weights: bool | Literal[('gaussian', 'loftq')] = True layers_to_transform: Optional[Union[List[int], int]] = None layers_pattern: Optional[Union[List[str], str]] = None rank_pattern: Optional[dict] = <factory> alpha_pattern: Optional[dict] = <factory> megatron_config: Optional[dict] = None megatron_core: Optional[str] = 'megatron.core' loftq_config: Union[LoftQConfig, dict] = <factory> )

Parameters

  • r (int) — Lora attention dimension.
  • target_modules (Union[List[str],str]) — The names of the modules to apply Lora to.
  • lora_alpha (int) — The alpha parameter for Lora scaling.
  • lora_dropout (float) — The dropout probability for Lora layers.
  • fan_in_fan_out (bool) — Set this to True if the layer to replace stores weight like (fan_in, fan_out). For example, gpt-2 uses Conv1D which stores weights like (fan_in, fan_out) and hence this should be set to True.
  • bias (str) — Bias type for Lora. Can be ‘none’, ‘all’ or ‘lora_only’. If ‘all’ or ‘lora_only’, the corresponding biases will be updated during training. Be aware that this means that, even when disabling the adapters, the model will not produce the same output as the base model would have without adaptation.
  • modules_to_save (List[str]) —List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.
  • layers_to_transform (Union[List[int],int]) — The layer indexes to transform, if this argument is specified, it will apply the LoRA transformations on the layer indexes that are specified in this list. If a single integer is passed, it will apply the LoRA transformations on the layer at this index.
  • layers_pattern (str) — The layer pattern name, used only if layers_to_transform is different from None and if the layer pattern is not in the common layers pattern.
  • rank_pattern (dict) — The mapping from layer names or regexp expression to ranks which are different from the default rank specified by r.
  • alpha_pattern (dict) — The mapping from layer names or regexp expression to alphas which are different from the default alpha specified by lora_alpha.

This is the configuration class to store the configuration of a LoraModel.

LoraModel

class peft.LoraModel

< >

( model config adapter_name ) torch.nn.Module

Parameters

  • model (torch.nn.Module) — The model to be adapted.
  • config (LoraConfig) — The configuration of the Lora model.
  • adapter_name (str) — The name of the adapter, defaults to "default".

Returns

torch.nn.Module

The Lora model.

Creates Low Rank Adapter (LoRA) model from a pretrained transformers model.

The method is described in detail in https://arxiv.org/abs/2106.09685.

Example:

>>> from transformers import AutoModelForSeq2SeqLM
>>> from peft import LoraModel, LoraConfig

>>> config = LoraConfig(
...     task_type="SEQ_2_SEQ_LM",
...     r=8,
...     lora_alpha=32,
...     target_modules=["q", "v"],
...     lora_dropout=0.01,
... )

>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> lora_model = LoraModel(model, config, "default")
>>> import transformers
>>> from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_int8_training

>>> target_modules = ["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out", "wte"]
>>> config = LoraConfig(
...     r=4, lora_alpha=16, target_modules=target_modules, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
... )

>>> model = transformers.GPTJForCausalLM.from_pretrained(
...     "kakaobrain/kogpt",
...     revision="KoGPT6B-ryan1.5b-float16",  # or float32 version: revision=KoGPT6B-ryan1.5b
...     pad_token_id=tokenizer.eos_token_id,
...     use_cache=False,
...     device_map={"": rank},
...     torch_dtype=torch.float16,
...     load_in_8bit=True,
... )
>>> model = prepare_model_for_int8_training(model)
>>> lora_model = get_peft_model(model, config)

Attributes:

add_weighted_adapter

< >

( adapters weights adapter_name combination_type = 'svd' svd_rank = None svd_clamp = None svd_full_matrices = True svd_driver = None )

Parameters

  • adapters (list) — List of adapter names to be merged.
  • weights (list) — List of weights for each adapter.
  • adapter_name (str) — Name of the new adapter.
  • combination_type (str) — Type of merging. Can be one of [svd, linear, cat]. When using the cat combination_type you should be aware that rank of the resulting adapter will be equal to the sum of all adapters ranks. So it’s possible that the mixed adapter may become too big and result in OOM errors.
  • svd_rank (int, optional) — Rank of output adapter for svd. If None provided, will use max rank of merging adapters.
  • svd_clamp (float, optional) — A quantile threshold for clamping SVD decomposition output. If None is provided, do not perform clamping. Defaults to None.
  • svd_full_matrices (bool, optional) — Controls whether to compute the full or reduced SVD, and consequently, the shape of the returned tensors U and Vh. Defaults to True.
  • svd_driver (str, optional) — Name of the cuSOLVER method to be used. This keyword argument only works when merging on CUDA. Can be one of [None, gesvd, gesvdj, gesvda]. For more info please refer to torch.linalg.svd documentation. Defaults to None.

This method adds a new adapter by merging the given adapters with the given weights.

When using the cat combination_type you should be aware that rank of the resulting adapter will be equal to the sum of all adapters ranks. So it’s possible that the mixed adapter may become too big and result in OOM errors.

delete_adapter

< >

( adapter_name: str )

Parameters

  • adapter_name (str) — Name of the adapter to be deleted.

Deletes an existing adapter.

disable_adapter_layers

< >

( )

Disable all adapters.

When disabling all adapters, the model output corresponds to the output of the base model.

enable_adapter_layers

< >

( )

Enable all adapters.

Call this if you have previously disabled all adapters and want to re-enable them.

merge_and_unload

< >

( progressbar: bool = False safe_merge: bool = False adapter_names: Optional[List[str]] = None )

Parameters

  • progressbar (bool) — whether to show a progressbar indicating the unload and merge process
  • safe_merge (bool) — whether to activate the safe merging check to check if there is any potential Nan in the adapter weights
  • adapter_names (List[str], optional) — The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults to None.

This method merges the LoRa layers into the base model. This is needed if someone wants to use the base model as a standalone model.

Example:

>>> from transformers import AutoModelForCausalLM
>>> from peft import PeftModel

>>> base_model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b")
>>> peft_model_id = "smangrul/falcon-40B-int4-peft-lora-sfttrainer-sample"
>>> model = PeftModel.from_pretrained(base_model, peft_model_id)
>>> merged_model = model.merge_and_unload()

set_adapter

< >

( adapter_name: str | list[str] )

Parameters

  • adapter_name (str or list[str]) — Name of the adapter(s) to be activated.

Set the active adapter(s).

unload

< >

( )

Gets back the base model by removing all the lora modules without merging. This gives back the original base model.