Efficient Python for Data Scientists Course [11/14]: Make Your Pandas Code 1000 Times Faster With This Trick

Nov 19, 2025

Pandas is a popular and widely used Python library for data manipulation and analysis. While it is powerful and flexible, its performance can become a bottleneck on large datasets. In this article, we will explore a technique that can make your Pandas code up to 1000 times faster.

Whether you are a beginner or an experienced Pandas user, this article will provide you with valuable insights and practical tips for speeding up your code. So, if you want to boost the performance of your Pandas code, read on!

Youssef Hosni

1. Create Dataset & Problem Statement

First, let’s create the data we will use throughout this article to compare the different methods. The generated data includes age, time spent in bed, the fraction of that time spent sleeping, favorite food, and least favorite food.

Let’s build a function to get the data given the size:

import numpy as np
import pandas as pd

def get_data(size=10000):
    df = pd.DataFrame()
    df['age'] = np.random.randint(0, 100, size)
    df['time_in_bed'] = np.random.randint(0, 9, size)
    # Fraction of the time in bed spent asleep, in [0, 1)
    df['pct_sleeping'] = np.random.rand(size)
    df['favorite_food'] = np.random.choice(['pizza', 'ice-cream', 'burger', 'rice'], size)
    df['hate_food'] = np.random.choice(['milk', 'vegetables', 'eggs'], size)
    return df

df = get_data()
df.head()

The task we will work on is a reward calculation based on the following rules:

If they were in bed for more than 5 hours and slept more than 50% of that time, we give them their favorite food.
Otherwise, we give them the food they hate.
If they are 90 or older, we give them their favorite food regardless.

This can be represented using the following function:

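def reward_cal(row):
    # People aged 90 or older always get their favorite food
    if row['age'] >= 90:
        return row['favorite_food']
    # More than 5 hours in bed and asleep more than 50% of that time
    if row['time_in_bed'] > 5 and row['pct_sleeping'] > 0.5:
        return row['favorite_food']
    return row['hate_food']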
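To make the three rules concrete, here is a minimal sketch that runs the reward function on three hand-made rows, one per rule. The sample values are hypothetical and chosen only for illustration; the function is repeated so the snippet runs on its own.

```python
import pandas as pd


def reward_cal(row):
    # Rule 1: people aged 90 or older always get their favorite food.
    if row["age"] >= 90:
        return row["favorite_food"]
    # Rule 2: more than 5 hours in bed and asleep more than 50% of that time.
    if row["time_in_bed"] > 5 and row["pct_sleeping"] > 0.5:
        return row["favorite_food"]
    # Otherwise: the food they hate.
    return row["hate_food"]


# Three hypothetical people, one for each rule.
sample = pd.DataFrame({
    "age": [95, 30, 30],
    "time_in_bed": [2, 8, 8],
    "pct_sleeping": [0.1, 0.9, 0.3],
    "favorite_food": ["pizza", "rice", "burger"],
    "hate_food": ["milk", "eggs", "vegetables"],
})

rewards = [reward_cal(row) for _, row in sample.iterrows()]
print(rewards)  # ['pizza', 'rice', 'vegetables']
```

The first person is rewarded because of age alone, the second because of the sleep rule, and the third falls through to the hated food.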

2. Level 1: Loops

The first and most straightforward approach is to use a for loop to iterate over each row of the data frame, as in the code below:

%%timeit
for index, row in df.iterrows():
    df.loc[index, 'reward'] = reward_cal(row)

As we can see, iterating through every row of the data frame takes 15.8 s. The data has only 10000 rows, which is considered small, so with millions of rows this single task could take hours. This is clearly not the most efficient way to iterate through a data frame, so let’s discuss a second method that improves the run time.
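Note that %%timeit is an IPython/Jupyter cell magic and is not available in a plain Python script. As a rough sketch of how a similar measurement could be taken outside a notebook, the snippet below times a row-by-row loop with time.perf_counter. It makes two assumptions that differ from the loop above: it uses a hypothetical 500-row frame so it finishes quickly, and it collects the results in a list instead of writing through df.loc, so its timing will not match the 15.8 s figure.

```python
import time

import numpy as np
import pandas as pd


def reward_cal(row):
    if row["age"] >= 90:
        return row["favorite_food"]
    if row["time_in_bed"] > 5 and row["pct_sleeping"] > 0.5:
        return row["favorite_food"]
    return row["hate_food"]


# Hypothetical small frame (500 rows) so the sketch finishes quickly.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(0, 100, n),
    "time_in_bed": rng.integers(0, 9, n),
    "pct_sleeping": rng.random(n),
    "favorite_food": rng.choice(["pizza", "ice-cream", "burger", "rice"], n),
    "hate_food": rng.choice(["milk", "vegetables", "eggs"], n),
})

# Time a single pass of the row-by-row loop.
start = time.perf_counter()
rewards = []
for _, row in df.iterrows():
    rewards.append(reward_cal(row))
df["reward"] = rewards
elapsed = time.perf_counter() - start

print(f"row-by-row loop over {n} rows: {elapsed:.4f} s")
```

A single pass like this is noisier than %%timeit, which repeats the statement and reports an average, so treat the number as an order-of-magnitude estimate.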

3. Level 2: Apply Function

The .apply() method in pandas applies a function along an axis of a data frame: column-wise with axis=0 (the default) or row-wise with axis=1. When called on a single column (a Series), it applies the function to each element instead. Let’s use it to apply the reward calculation function to each row of the data frame and measure the computation time:

%%timeit
df['reward'] = df.apply(reward_cal, axis=1)

The average time to apply the function to the 10000 rows of the data frame is only 535 ms, or 0.535 seconds. This is around 30 times faster than using the loop. However, we are not done yet. We can still improve the speed and make it more than 1000 times faster than the loop. Let’s see how!
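As a side note on how the row-wise call works, the small sketch below contrasts the two axis settings of DataFrame.apply on a toy frame. The column names a and b are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series.
col_sums = toy.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series indexed by column name,
# which is why a function like reward_cal can read row['age'] and so on.
row_sums = toy.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_sums.to_dict())  # {'a': 6, 'b': 60}
print(row_sums.tolist())   # [11, 22, 33]
```

With axis=1, pandas still calls the Python function once per row, which is why .apply() is faster than an explicit loop but far from free.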

4. Level 3: Vectorization

Vectorization in pandas refers to the process of applying operations to entire arrays or sequences of data, as opposed to applying them to individual elements one by one. This is done for performance reasons, as vectorized operations are usually much faster than non-vectorized operations, especially in large datasets.
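As a minimal illustration of the idea, the sketch below computes the same result twice on a small NumPy array: once element by element in a Python loop, and once as a single whole-array expression. The array and the formula are arbitrary examples.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Element by element: a Python-level loop handles one value at a time.
looped = np.array([v * 2 + 1 for v in values])

# Vectorized: one expression over the whole array; the looping happens
# inside NumPy's compiled code rather than in the Python interpreter.
vectorized = values * 2 + 1

print(vectorized.tolist())                 # [1, 3, 5, 7, 9]
print(np.array_equal(looped, vectorized))  # True
```

Pandas columns are backed by arrays like this one, so comparisons and boolean operators on whole columns benefit from the same mechanism.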

Let’s apply this to the data using the conditions stated above:

%%timeit
df['reward'] = df['hate_food']
df.loc[((df['pct_sleeping'] > 0.5) & (df['time_in_bed'] > 5)) | (df['age'] >= 90), 'reward'] = df['favorite_food']

We can now see a tremendous decrease in computation time compared to the previous two methods: at about 5.8 ms, the vectorized version is well over 1000 times faster than the loop. Let’s have a look at the differences in a bar plot.
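When replacing row-wise logic with a vectorized version, it is worth checking that both produce the same answer. The sketch below does this on a hypothetical 1000-row frame, using np.where as one possible way to write the vectorized rule; this is an alternative formulation for illustration, not the code timed in this article.

```python
import numpy as np
import pandas as pd


def reward_cal(row):
    if row["age"] >= 90:
        return row["favorite_food"]
    if row["time_in_bed"] > 5 and row["pct_sleeping"] > 0.5:
        return row["favorite_food"]
    return row["hate_food"]


# Hypothetical test data with the same columns as the article's data frame.
rng = np.random.default_rng(42)
n = 1_000
df = pd.DataFrame({
    "age": rng.integers(0, 100, n),
    "time_in_bed": rng.integers(0, 9, n),
    "pct_sleeping": rng.random(n),
    "favorite_food": rng.choice(["pizza", "ice-cream", "burger", "rice"], n),
    "hate_food": rng.choice(["milk", "vegetables", "eggs"], n),
})

# Row-wise reference result.
expected = df.apply(reward_cal, axis=1)

# Vectorized rule: one boolean mask for "gets favorite food".
gets_favorite = (df["age"] >= 90) | (
    (df["time_in_bed"] > 5) & (df["pct_sleeping"] > 0.5)
)
df["reward"] = np.where(gets_favorite, df["favorite_food"], df["hate_food"])

matches = bool((df["reward"] == expected).all())
print(matches)  # True
```

Note the parentheses around each comparison: the & and | operators bind more tightly than > and >=, so leaving them out changes the meaning of the expression.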

5. Measuring the Difference

Finally, to get a better intuition for the difference between these three methods, we will draw a bar plot using the code below:

# Run times in milliseconds
results = pd.DataFrame(
    [
        ['loop', 15800],
        ['apply', 535],
        ['vectorized', 5.8],
    ],
    columns=['method', 'run_time'],
)

results.set_index('method')['run_time'].plot(kind='bar')

Looking at the bar plot, we get a better sense of the huge difference in computation time between the methods used in this article.
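The same comparison can be expressed as speedup factors. The short sketch below divides the loop time by each method’s time, using only the three timings quoted in this article (in milliseconds).

```python
# Run times quoted in this article, in milliseconds.
run_time_ms = {"loop": 15800, "apply": 535, "vectorized": 5.8}

# Speedup of each method relative to the row-by-row loop.
baseline = run_time_ms["loop"]
speedups = {method: baseline / ms for method, ms in run_time_ms.items()}

for method, factor in speedups.items():
    print(f"{method:<10} {run_time_ms[method]:>8.1f} ms  {factor:>7.1f}x vs. loop")
```

This gives roughly 30x for .apply() and roughly 2700x for the vectorized version, relative to the loop. These numbers come from one machine and one 10000-row dataset, so the exact ratios will vary.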

To Data & Beyond