
LoRA AND QLoRA

 

LoRA:


LoRA works as shown in the image above: the model weights at the self-attention layers are not updated directly; instead, the update is learned through a pair of low-rank matrices, as illustrated. A more in-depth explanation of LoRA will follow in a separate blog post.
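As a rough illustration of the idea (not the exact implementation from the paper), a LoRA-style linear layer keeps the original weight W frozen and learns two small matrices A and B whose product forms the update. The dimensions, rank, and scaling below are illustrative only.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of a LoRA-adapted linear layer: y = x W^T + scale * x (B A)^T."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Frozen pretrained weight (stands in for a self-attention projection).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank factors: A is (rank x in), B is (out x rank).
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        base = x @ self.weight.T                       # frozen path
        update = (x @ self.lora_A.T) @ self.lora_B.T   # low-rank trainable path
        return base + self.scale * update

x = torch.randn(4, 512)
layer = LoRALinear(512, 512, rank=8)
print(layer(x).shape)  # torch.Size([4, 512])
```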

The memory and training time saved by the LoRA technique can be seen in the image below.


LoRA also enables adaptation to different tasks, as shown below:


As described in the image, we can train a separate adapter for each task and easily switch between tasks based on requirements; a sketch of this follows.
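A hedged sketch of what this could look like: the shared base weight stays frozen, and only a small low-rank pair is stored per task, so switching tasks means swapping a tiny adapter rather than loading a whole new model. The task names and shapes here are made up for illustration.

```python
import numpy as np

d, r = 512, 8

# A full fine-tuned copy of one weight matrix needs d*d parameters;
# a LoRA adapter for the same matrix needs only 2*r*d.
print(d * d, 2 * r * d)   # 262144 vs 8192 per task

# Hypothetical per-task adapters: only these small factors are stored and swapped.
adapters = {
    "summarization": {"A": np.random.randn(r, d) * 0.01, "B": np.zeros((d, r))},
    "translation":   {"A": np.random.randn(r, d) * 0.01, "B": np.zeros((d, r))},
}

active = adapters["summarization"]   # switching tasks = picking a different adapter
print(active["A"].shape, active["B"].shape)  # (8, 512) (512, 8)
```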

How decomposition is done:

SVD (singular value decomposition) is one approach to finding low-rank approximations of a matrix. Here is an example:


From the above decomposition, if we want the rank-1 approximation we keep only the first column of U and the first row of V transpose, and the resulting matrices have dimensions of 2x1 and 1x3.
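The same idea in code, using a made-up 2x3 matrix so the rank-1 factors come out with the 2x1 and 1x3 shapes described above:

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])            # 2x3 matrix

U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Rank-1 approximation: first column of U (2x1) and first row of V^T (1x3),
# scaled by the largest singular value.
u1 = U[:, :1]            # shape (2, 1)
v1 = Vt[:1, :]           # shape (1, 3)
M_rank1 = S[0] * (u1 @ v1)

print(u1.shape, v1.shape)   # (2, 1) (1, 3)
print(M_rank1)              # best rank-1 approximation of M
```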

As explained earlier, the weight update is trained through these decomposed low-rank factors, and once training is done they are multiplied to obtain a full-size matrix. This matrix is then added to the original weight matrix to obtain the fine-tuned model; a sketch of this merge step follows.
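A minimal sketch of that merge step, with made-up shapes: after training, the low-rank factors are multiplied back into a full-size update and added to the frozen weight.

```python
import numpy as np

d_out, d_in, r = 512, 512, 8
W = np.random.randn(d_out, d_in)       # original frozen weight
A = np.random.randn(r, d_in) * 0.01    # trained low-rank factor
B = np.random.randn(d_out, r) * 0.01   # trained low-rank factor

delta_W = B @ A                        # full-size update reconstructed from the factors
W_finetuned = W + delta_W              # merged weight used by the fine-tuned model
print(W_finetuned.shape)               # (512, 512)
```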

Results of using LoRA:


As shown in the image above, the fully fine-tuned model for summarization performed much better than the base model, but full fine-tuning comes with large GPU and memory costs. The LoRA model, on the other hand, achieves similar results to full fine-tuning while using very little memory and GPU resources.

How to choose Rank:

This is still an active area of research; the results reported by one research paper are as follows:


As the rank increased there was a significant rise in scores, but after a certain point the curve simply flattened out.

QLoRA:

QLoRA is the same method as LoRA, except that before the low-rank decomposition the base model's parameters are first quantized (to 4-bit), and the rest of the process is then carried out on top of those quantized weights; a sketch of that pipeline follows.
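A hedged sketch of that pipeline using the Hugging Face transformers/peft/bitsandbytes stack; the model name, target modules, and hyperparameters are illustrative, and the exact API may vary between library versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Step 1: load the base model with its weights quantized to 4-bit (NF4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                   # illustrative model name
    quantization_config=bnb_config,
)

# Step 2: attach LoRA adapters on top of the frozen, quantized weights.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # self-attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the small LoRA factors are trainable
```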
