To Data & Beyond

Fine-Tuning Mistral 7B with Hugging Face AutoTrain to Generate Better Midjourney Prompts

Youssef Hosni
Sep 29, 2024
In this blog post, we explore the process of fine-tuning the Mistral 7B model using Hugging Face AutoTrain to enhance the generation of Midjourney prompts. 

We begin by setting up the necessary working environment, ensuring all tools and dependencies are ready for seamless operation. Next, we guide you through loading the dataset, a crucial step in preparing for effective fine-tuning. 

We then delve into the core of the process: fine-tuning the large language model (LLM) with AutoTrain, highlighting key techniques and configurations. 

Following this, we discuss how to use the fine-tuned model for inference, enabling practical applications of your enhanced model. Finally, we cover the efficient loading of the PEFT model by utilizing the upload model feature, ensuring smooth integration into your workflows. This comprehensive guide aims to equip you with the skills needed to optimize LLMs for specific creative tasks.

Table of Contents:

  1. Setting Up Working Environment 

  2. Load the Dataset

  3. Fine-tune LLM with AutoTrain

  4. Put the Fine-Tuned LLM in Inference 



1. Setting Up Working Environment

We will start by installing two essential Python libraries: pandas and autotrain-advanced. The pandas library is commonly used for data manipulation and analysis, making it easier to structure and preprocess datasets. 

Meanwhile, autotrain-advanced is a Hugging Face package that automates much of the fine-tuning process for large language models (LLMs), streamlining tasks like hyperparameter tuning and dataset handling. 

The -q flag ensures the installation runs quietly, keeping the output clean and focused. Together, these tools set the foundation for fine-tuning Mistral 7B efficiently.

!pip install pandas autotrain-advanced -q

Next, we will configure AutoTrain and ensure that your environment is using the latest version of PyTorch. 

!autotrain setup --update-torch

Finally, you will need to log in to HuggingFace to be able to load and train the model. You can follow the steps below to do so:

  1. Login to Hugging Face

  2. Go to the Access Tokens page in your account settings

  3. Create a write token and copy it to your clipboard

  4. Run the code below and enter your token

from huggingface_hub import notebook_login
notebook_login()

Now that our working environment is ready we can start the training process by loading the data.

2. Load the Dataset

We will use the dataset from the finetune-llama-2 GitHub repository. We start by cloning that repository, which contains useful scripts and configurations for fine-tuning models like Mistral 7B. Running !git clone downloads the entire repository into your local environment. 

Next, the %cd finetune-llama-2 command changes the directory into the cloned repository, allowing you to interact with its files. The %mv train.csv ../train.csv command moves a dataset file (train.csv) from the repository to the parent directory, where it can be more easily accessed for training. 

Finally, %cd .. navigates back to the parent directory. These steps help you set up and organize the dataset required for fine-tuning, making sure the data is ready for use in the next stages.

!git clone https://github.com/joshbickett/finetune-llama-2.git
%cd finetune-llama-2
%mv train.csv ../train.csv
%cd ..

Now that we have the data we can read and show it using pandas.

import pandas as pd
df = pd.read_csv("train.csv")
df

To get a better idea of the data, let's look at its first two examples. We can see that the dataset instructs the model to write a prompt for generating images with the Midjourney image generation tool. 

df['text'][1]

###Human:
Generate a midjourney prompt for A robot on a first date

###Assistant:
A robot, with a bouquet of USB cables, nervously adjusting its antennas, at a romantic restaurant that serves electricity shots.

df['text'][2]

###Human:
Generate a midjourney prompt for A snail at a speed contest

###Assistant:
A snail, with a mini rocket booster, confidently lining up at the start line, with a crowd of cheering insects.
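Each row, as the examples above show, packs the instruction and the response into a single text field using ###Human / ###Assistant delimiters. As a minimal sketch (not from the original post — the helper name format_example is our own), this is how a new training pair could be assembled into the same format:

```python
import pandas as pd

def format_example(subject: str, prompt: str) -> str:
    """Pack an instruction/response pair into the ###Human / ###Assistant
    single-text format used by the train.csv dataset."""
    return (
        "###Human:\n"
        f"Generate a midjourney prompt for {subject}\n\n"
        "###Assistant:\n"
        f"{prompt}"
    )

# Build a tiny DataFrame with the same single "text" column as train.csv
df = pd.DataFrame({
    "text": [
        format_example(
            "A cat practicing yoga",
            "A cat in a tiny yoga outfit, balancing on one paw atop a sunlit mat.",
        )
    ]
})
print(df["text"][0])
```

Keeping everything in one text column is what lets AutoTrain's SFT trainer consume the data directly, with no separate prompt/completion columns to configure.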

Since the data comes ready for use from another fine-tuning project, there is no need to invest time in preprocessing, and we can jump directly to the fine-tuning step. 

3. Fine-tune LLM with AutoTrain 

AutoTrain Advanced (or simply AutoTrain), developed by Hugging Face, is a robust no-code platform designed to simplify the process of training state-of-the-art models across multiple domains: NLP, CV, and even Tabular Data analysis. 

This tool leverages the powerful frameworks created by various teams at Hugging Face, making advanced machine learning and artificial intelligence accessible to a broader audience without requiring deep technical expertise.
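In practice, fine-tuning with AutoTrain is driven by the autotrain llm command. The sketch below is illustrative only, not the exact command from this post: flag names differ between autotrain-advanced releases (e.g. older versions use --use-int4 instead of a quantization flag), so confirm against autotrain llm --help for your installed version. The project name and hyperparameters here are placeholder assumptions:

```shell
# Illustrative sketch — verify flag names with `autotrain llm --help`.
# Assumes train.csv (with a "text" column) sits in the current directory
# and HF_TOKEN holds a Hugging Face write token.
autotrain llm --train \
  --project-name mistral-7b-midjourney \
  --model mistralai/Mistral-7B-v0.1 \
  --data-path . \
  --text-column text \
  --use-peft \
  --lr 2e-4 \
  --batch-size 2 \
  --epochs 3 \
  --trainer sft \
  --push-to-hub \
  --username your-hf-username \
  --token $HF_TOKEN
```

Using --use-peft (LoRA) is what makes fine-tuning a 7B-parameter model feasible on a single consumer or Colab GPU, since only a small set of adapter weights is trained and pushed to the Hub.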
