🦙 LoRA & QLoRA Fine-Tuning Pipeline for LLaMA-2


This project demonstrates the implementation of parameter-efficient fine-tuning (PEFT) techniques — LoRA and QLoRA — for adapting the LLaMA-2 large language model using memory-efficient training strategies.


📌 Project Overview

Large Language Models require significant computational resources for full fine-tuning.
This project explores modern efficient alternatives:

  • LoRA (Low-Rank Adaptation)
  • QLoRA (Quantized LoRA)

Both approaches reduce GPU memory usage while enabling model adaptation by training only a small number of additional parameters.

The notebooks focus on building a correct and reproducible fine-tuning pipeline using HuggingFace and PEFT libraries.
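As a minimal illustration of the LoRA approach (a sketch under assumed settings, not code lifted from the notebooks), attaching adapters to LLaMA-2 with the PEFT library looks roughly like this; the checkpoint name, rank, and target modules are common choices rather than the notebooks' exact configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # gated checkpoint; requires Hugging Face access approval
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the LoRA update
    target_modules=["q_proj", "v_proj"],  # attention projections commonly targeted in LLaMA-2
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

Only the injected low-rank matrices are trained while the base weights stay frozen, which is where the reduction in trainable parameters comes from.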


📁 Project Structure

  • Lora_Llama2.ipynb — LoRA fine-tuning setup for LLaMA-2
  • QLoRA_Llama2.ipynb — QLoRA pipeline using quantized model loading
  • README.md — Project documentation

🧠 Concepts Demonstrated

  • Parameter Efficient Fine-Tuning (PEFT)
  • LoRA adapter configuration
  • QLoRA 4-bit quantization workflow (a loading sketch follows this list)
  • HuggingFace Transformers integration
  • Memory-efficient LLM training setup
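A minimal sketch of the QLoRA-style 4-bit loading step referenced above, assuming the gated meta-llama/Llama-2-7b-hf checkpoint and a bitsandbytes-capable GPU; the exact settings used in the notebook may differ:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load base weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization, as used in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype; use float16 on pre-Ampere GPUs
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Freezes the quantized base weights and prepares the model for adapter training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
))
```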

⚠️ Execution Note

Due to hardware and GPU resource limitations, full fine-tuning was not executed in this repository.

The project focuses on correctly implementing and configuring the training pipelines required for LoRA and QLoRA fine-tuning. These notebooks are ready to run on suitable GPU environments such as:

  • Google Colab (High-RAM / A100)
  • Kaggle GPU sessions
  • Local machines with ≥16GB VRAM

🧠 Technologies Used

  • Python
  • HuggingFace Transformers
  • PEFT (Parameter-Efficient Fine-Tuning)
  • PyTorch
  • BitsAndBytes (Quantization)
  • LLaMA-2

🔬 Why LoRA & QLoRA?

| Method           | Memory Usage | Trainable Parameters | Purpose               |
|------------------|--------------|----------------------|-----------------------|
| Full Fine-Tuning | Very high    | 100%                 | Expensive adaptation  |
| LoRA             | Low          | Small adapters       | Efficient tuning      |
| QLoRA            | Very low     | Small adapters       | Consumer GPU training |

QLoRA enables training large models even on limited hardware through 4-bit quantization.
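As a rough, hardware-dependent illustration: the 7B-parameter LLaMA-2 needs about 7B × 2 bytes ≈ 14 GB just for weights in 16-bit precision, before optimizer states are counted, while 4-bit quantization brings the base weights down to roughly 3.5 GB, leaving headroom on a 16 GB consumer GPU for activations, gradients, and the small LoRA adapter.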


▶️ How to Run

  1. Install dependencies:

```text
pip install transformers peft accelerate bitsandbytes datasets
```

  2. Open either notebook:

  • Lora_Llama2.ipynb
  • QLoRA_Llama2.ipynb

  3. Run in a GPU-enabled environment (a minimal training sketch follows).
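For orientation, a training loop built only from the dependencies above might look like the sketch below; the dataset, sequence length, and hyperparameters are placeholders rather than the notebooks' actual settings, and `model` is the PEFT-wrapped LLaMA-2 model from the earlier sketches:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-2 has no pad token by default

# Placeholder dataset; swap in the corpus you actually want to fine-tune on
dataset = load_dataset("imdb", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,  # PEFT-wrapped model from the LoRA/QLoRA sketches above
    args=TrainingArguments(
        output_dir="llama2-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,   # effective batch size of 8 on a single GPU
        num_train_epochs=1,
        fp16=True,                       # switch to bf16=True on Ampere or newer GPUs
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
model.save_pretrained("llama2-lora-adapter")  # saves only the adapter weights, not the base model
```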

🧪 Key Learnings

  • Modern LLM training increasingly relies on PEFT methods
  • Quantization dramatically reduces memory requirements
  • LoRA enables efficient adaptation without retraining entire models
  • Proper pipeline setup is critical before large-scale training

🚀 Future Improvements

  • Execute full fine-tuning on GPU hardware
  • Compare LoRA vs QLoRA training efficiency
  • Evaluate downstream task performance
  • Integrate evaluation benchmarks
