Fine-Tuning DeepSeek R1 on Reasoning Task with Unsloth [Part 2]

Hands-On Fine-Tuning DeepSeek on Medical Reasoning Dataset

Feb 03, 2025

DeepSeek recently released DeepSeek-R1, the next step in its reasoning model work. It is an upgrade over the earlier DeepSeek-R1-Lite-Preview and signals that DeepSeek is serious about competing with OpenAI’s o1.

In this two-part hands-on tutorial, we will use Unsloth to fine-tune the DeepSeek-R1-Distill-Llama-8B model on the Medical Chain-of-Thought Dataset from Hugging Face.

In the first part of this article, we introduced the DeepSeek R1 model. Then, we set up the working environment, downloaded the model and the tokenizer, and finally tested the model with zero-shot inference, observing the result without fine-tuning.

In this part, we start by loading and processing the medical reasoning dataset used for fine-tuning. Once the data is ready, we fine-tune the model, test the fine-tuned model, and save it both locally and on Hugging Face.

Introduction to DeepSeek R1 Model [Part 1]
Setting Up Working Environment [Part 1]
Loading the Model & Tokenizer with Unsloth.ai [Part 1]
Test the Model with Zero Shot Inference [Part 1]
Loading and Processing the Dataset [Part 2]
Fine-Tune the LLM [Part 2]
Model Inference After Fine-Tuning [Part 2]
Saving the Model Locally & to Hugging Face Hub [Part 2]

5. Loading and Processing the Dataset

When fine-tuning a language model for reasoning, an important first step is to structure each training example as a single prompt that contains the question, the reasoning, and the final response.

This section covers how we define a structured prompt format, preprocess dataset entries, and apply transformations before feeding data into the model.

To ensure the model generates structured medical responses, we define a prompt template that includes an instruction, a medical question, and a structured reasoning process. The template follows this format:

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

This template structures the model’s response by incorporating a chain of thought (CoT) inside <think></think> tags. The {} placeholders are dynamically replaced with a medical question, reasoning process, and final response.
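To make the placeholder mechanics concrete, here is a minimal sketch that uses a shortened stand-in for the template. The name `mini_template` and the sample strings are invented for illustration and are not part of the tutorial code. It shows that `str.format` fills the `{}` slots strictly by position, so the arguments must be passed in the order question, reasoning, response.

```python
# Shortened stand-in for train_prompt_style; illustration only.
mini_template = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

# Invented sample values, not taken from the dataset.
question = "What does a Q-tip test assess?"
reasoning = "The test measures how much the urethra moves under strain."
answer = "It assesses urethral mobility."

# Positional placeholders are filled left to right.
filled = mini_template.format(question, reasoning, answer)
print(filled)
```

Because `str.format` treats braces in the template as placeholders, any literal `{` or `}` in a template would need to be doubled (`{{`, `}}`). Braces inside the values being inserted are left untouched.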

Once the prompt format is defined, we create a function to transform dataset entries into structured prompts. The function extracts the question, reasoning (CoT), and response from the dataset and formats them using the template:

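EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN so the model learns where a response ends

def formatting_prompts_func(examples):
    # With batched mapping, each column arrives as a list of values
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    for question, cot, response in zip(questions, cots, responses):
        # Fill the template in order: question, reasoning, final response
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }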

The function loops through each entry, formats it using train_prompt_style, and appends an end-of-sequence token (EOS_TOKEN). This ensures the model correctly learns when a response ends during training.
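The batch shape is easier to see on toy data. The sketch below is a simplified illustration, not the tutorial's exact function: `format_batch`, the shortened `template`, and the `"<eos>"` string are hypothetical stand-ins so that it runs without a tokenizer or a downloaded dataset. It assumes the same column names as the dataset (`Question`, `Complex_CoT`, `Response`).

```python
# Hypothetical stand-ins so the sketch runs without a model or tokenizer.
EOS = "<eos>"  # in the tutorial this value comes from tokenizer.eos_token
template = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"

def format_batch(examples):
    """Turn a batch (dict of column lists) into a dict with a list of texts."""
    texts = [
        template.format(q, cot, resp) + EOS
        for q, cot, resp in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}

# Invented two-row batch laid out the same way as the dataset columns.
batch = {
    "Question": ["Q1?", "Q2?"],
    "Complex_CoT": ["thinking about Q1", "thinking about Q2"],
    "Response": ["A1", "A2"],
}

out = format_batch(batch)
print(len(out["text"]))
print(out["text"][0])
```

The function receives a dictionary of lists and returns a dictionary of lists of the same length, which is the shape a batched `map` call expects for a new column.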

Now that we have the formatting function, we load a subset of the dataset and apply the transformation. In the code below we will:

Load the FreedomIntelligence/medical-o1-reasoning-SFT dataset, selecting 500 training samples.
Apply the formatting_prompts_func transformation to structure each sample.
Print the first formatted entry to verify the changes.

This process ensures the dataset is properly structured for fine-tuning, enabling the model to learn from well-defined medical reasoning patterns.

from datasets import load_dataset

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en",
                       split="train[0:500]", trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched=True)
print(dataset["text"][0])

Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?
### Response:
<think>
Okay, let’s think about this step by step. There’s a 61-year-old woman here who’s been dealing with involuntary urine leakages whenever she’s doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it’s interesting that she doesn’t have any issues at night; she isn’t experiencing leakage while sleeping. This likely means her bladder’s ability to hold urine is fine when she isn’t under physical stress. Hmm, that’s a clue that we’re dealing with something related to pressure rather than a bladder muscle problem.
The fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there’s a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that’s aligning well with stress incontinence.
Now, let’s think about what would happen during cystometry. Since stress incontinence isn’t usually about sudden bladder contractions, I wouldn’t expect to see involuntary detrusor contractions during this test. Her bladder isn’t spasming or anything; it’s more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn’t typically involve incomplete emptying. So, her residual volume should be pretty normal.
All in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.
</think>
Cystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.<｜end▁of▁sentence｜>
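Beyond eyeballing the first entry, a small check can confirm that every formatted sample has the expected shape. The helper below, `find_format_problems`, is a hypothetical addition rather than part of the original workflow, and the two sample strings are invented. It assumes the reasoning text itself never contains a literal `<think>` tag.

```python
def find_format_problems(text, eos_token):
    """Return a list of human-readable problems found in one formatted sample."""
    problems = []
    if text.count("<think>") != 1 or text.count("</think>") != 1:
        problems.append("expected exactly one <think>...</think> pair")
    elif text.index("<think>") > text.index("</think>"):
        problems.append("</think> appears before <think>")
    if not text.endswith(eos_token):
        problems.append("missing end-of-sequence token")
    if "### Question:" not in text or "### Response:" not in text:
        problems.append("missing section header")
    return problems

# Invented samples: one well-formed, one missing the tags and the EOS token.
good = "### Question:\nQ?\n\n### Response:\n<think>\nreasoning\n</think>\nanswer<eos>"
bad = "### Question:\nQ?\n\n### Response:\nanswer"

print(find_format_problems(good, "<eos>"))
print(find_format_problems(bad, "<eos>"))
```

On the real data, the same helper could be run over `dataset["text"]` with `EOS_TOKEN` as the second argument to list the indices of any malformed rows before training starts.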
