Parameter-Efficient Fine-Tuning (PEFT) is a new open-source library from Hugging Face that enables efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters.
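As a minimal sketch of what the library enables, assuming the peft and transformers packages; the checkpoint name and hyperparameters here are illustrative, not prescriptive:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load a full pretrained causal LM (the checkpoint is just an example).
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Wrap it with a LoRA config; PEFT injects small trainable adapter
# matrices and freezes the original weights.
config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, config)

# Typically reports well under 1% of the parameters as trainable.
model.print_trainable_parameters()
```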
QLoRA is a new technique to reduce the memory footprint of large language models during finetuning, without sacrificing performance. The TL;DR of how QLoRA works (see the sketch after this list):
• Quantize the pretrained model to 4 bits and freeze it.
• Attach small, trainable adapter layers (LoRA).
• Finetune only the adapter layers, while using the frozen quantized model for context.
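A minimal sketch of those three steps, assuming transformers with bitsandbytes support plus the peft library; the checkpoint, target module names, and hyperparameters are illustrative and depend on the model architecture:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Step 1: load the pretrained model quantized to 4 bits; the quantized
# weights are frozen and never updated.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # the NormalFloat4 data type used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                    # example checkpoint, not prescriptive
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# Step 2: attach small, trainable LoRA adapter layers to chosen submodules
# (module names vary by architecture; q_proj/v_proj fit OPT-style attention).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Step 3: only the adapters are trainable; the 4-bit base supplies the
# frozen context. Hand `model` to any standard training loop or Trainer.
model.print_trainable_parameters()
```

During training, gradients flow through the frozen 4-bit weights into the adapters, so only the small LoRA matrices consume optimizer state.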