
Finetune Falcon-7b with LoRA: A Step-by-Step Guide

Fine-Tuning LLMs with Hugging Face Transformers and Weights & Biases

Youssef Hosni
Jul 21, 2024

As large language models (LLMs) continue to dominate the machine learning landscape, the ability to fine-tune these models has become crucial for researchers and developers alike. 

This technical blog post offers a detailed, step-by-step guide to fine-tuning the Falcon-7b model using Low-Rank Adaptation (LoRA), leveraging the robust capabilities of Hugging Face Transformers and Weights & Biases.

We’ll walk you through the entire process, starting with environment setup and progressing through data preparation, model initialization, and LoRA configuration. You’ll learn how to effectively load your dataset, initialize the Falcon-7b model and its tokenizer, and configure LoRA for efficient parameter optimization. We’ll also cover the intricacies of defining training arguments to suit your specific requirements.

The core of this guide focuses on the actual fine-tuning process, providing you with the knowledge to train your model effectively. By the end, you’ll have a solid understanding of how to fine-tune large-scale models, opening up new possibilities for your machine-learning projects.

Table of Contents:

  1. Setting up the Working Environment

  2. Loading the Dataset 

  3. Loading the Model & Tokenizer 

  4. Define LoRA Configuration

  5. Define Training Arguments

  6. Fine-Tune the Model


My New E-Book: LLM Roadmap from Beginner to Advanced Level

Youssef Hosni
·
June 18, 2024
My New E-Book: LLM Roadmap from Beginner to Advanced Level

I am pleased to announce that I have published my new ebook LLM Roadmap from Beginner to Advanced Level. This ebook will provide all the resources you need to start your journey towards mastering LLMs. The content of the book covers the following topics:

Read full story

1. Setting up the Working Environment

We will start by logging in to the Hugging Face Hub from the notebook. This login is necessary for several reasons:

  • Accessing private models or datasets on the Hugging Face Hub

  • Pushing your models or datasets to the Hub

  • Using services that require authentication, like the Inference API

from huggingface_hub import notebook_login

notebook_login()

Running this code in a notebook environment will prompt you to enter your Hugging Face access token, which you can get from your Hugging Face account. 
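
If you are running the code as a plain Python script rather than in a notebook, notebook_login() will not open a prompt. As a rough alternative sketch, you can log in programmatically with the login() helper from the same library; here we assume the token is stored in an environment variable that we have chosen to call HF_TOKEN:

import os
from huggingface_hub import login

# Log in non-interactively using a token stored in an environment variable
login(token=os.environ["HF_TOKEN"])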

Next, we will import these packages:

  • trl: Transformer Reinforcement Learning library

  • transformers: Hugging Face’s library for state-of-the-art Natural Language Processing

  • accelerate: Library for easy use of distributed training on multiple GPUs/TPUs

  • peft: Parameter-efficient fine-tuning methods

  • datasets: Hugging Face’s library for easily accessing and sharing datasets

  • bitsandbytes: Library for quantization and matrix multiplication

  • einops: Library for tensor operations in deep learning

  • wandb: Weights & Biases, a tool for tracking and visualizing machine learning experiments

!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets einops wandb
!pip install bitsandbytes==0.43.1

The -q flag in the pip commands stands for "quiet" mode, which reduces the output verbosity. The -U flag means "upgrade" - it will upgrade the packages if they're already installed.
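
After installation, an optional sanity check confirms that the key packages import correctly and records their versions, which helps when reproducing the run later:

# Optional: verify the installed versions of the main libraries
import transformers, peft, trl, bitsandbytes, wandb

print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("trl:", trl.__version__)
print("bitsandbytes:", bitsandbytes.__version__)
print("wandb:", wandb.__version__)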

2. Loading the Dataset

The next step is to load the dataset we will use to fine-tune the model. We will use a subset of the Open Assistant dataset that contains only the highest-rated paths in the conversation tree, for a total of 9,846 samples. This makes it a good option for fine-tuning the model with QLoRA.

from datasets import load_dataset

dataset_name = "timdettmers/openassistant-guanaco"
dataset = load_dataset(dataset_name, split="train")

Let's explore one of the samples in the dataset:

dataset[0] 

{'text': '### Human: Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.### Assistant: "Monopsony" refers to a market structure where there is only one buyer for a particular good or service. In economics, this term is particularly relevant in the labor market, where a monopsony employer has significant power over the wages and working conditions of their employees. The presence of a monopsony can result in lower wages and reduced employment opportunities for workers, as the employer has little incentive to increase wages or provide better working conditions.\n\nRecent research has identified potential monopsonies in industries such as retail and fast food, where a few large companies control a significant portion of the market (Bivens & Mishel, 2013). In these industries, workers often face low wages, limited benefits, and reduced bargaining power, leading to a situation where they are dependent on the employer for their livelihood. This dependence can result in further suppression of wages and a decline in working conditions.\n\nOverall, the concept of monopsony is essential to understanding the dynamics of labor markets and the impact of market power on workers. Further research is needed to understand the extent and impact of monopsonies on the economy and to develop policies to address this issue.\n\nReferences:\nBivens, J., & Mishel, L. (2013). The Pay of Corporate Executives and Financial Professionals as Evidence of Rents in Top 1 Percent Incomes. Journal of Economic Perspectives, 27(3), 57–78.### Human: Now explain it to a dog'}
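
Each sample is a single text field that concatenates the human and assistant turns with ### Human: and ### Assistant: markers. To double-check the size and schema described above, you can inspect the dataset object directly; a minimal check might look like this:

# Confirm the number of samples and the column layout
print(len(dataset))      # should print 9846
print(dataset.features)  # a single 'text' column of type string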

3. Loading the Model & Tokenizer

We will be using falcon-7b-sharded-bf16, a 7-billion-parameter Falcon language model. The "sharded-bf16" suffix indicates that the checkpoint is split into smaller shards and stored in brain floating point (bfloat16) precision, which makes it easier to load on memory-constrained hardware.

We will also set up 4-bit quantization using the “nf4” (normalized float 4) quantization type, with computations done in float16 precision. This significantly reduces the memory footprint of the model. The 4-bit quantization allows the model to fit in GPU memory more easily, at the cost of some precision.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "ybelkada/falcon-7b-sharded-bf16"

# 4-bit quantization: nf4 weights with computations carried out in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized model; trust_remote_code is needed for Falcon's custom model code
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)

# Disable the KV cache during training; it is only useful at generation time
model.config.use_cache = False

Next, we will load the tokenizer and set its padding token to the end-of-sequence token, since the Falcon tokenizer does not define a pad token by default:

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
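
With the quantized model and tokenizer in place, the next step from the table of contents is defining the LoRA configuration. As a preview, here is a minimal sketch of how this is typically done with peft; the rank, alpha, and dropout values below are illustrative assumptions rather than the exact settings used in this post, and "query_key_value" is the fused attention projection module in the Falcon architecture:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit quantized model for parameter-efficient training
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyperparameters (assumed values, tune for your own runs)
peft_config = LoraConfig(
    r=16,                                # rank of the low-rank update matrices
    lora_alpha=32,                       # scaling factor applied to the LoRA updates
    lora_dropout=0.05,                   # dropout on the LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon's fused attention projection
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()       # shows how few parameters LoRA actually trains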
