Mastering A/B Testing: A Real World Business Example [Part 1]

Step-by-Step Walk-Through of a Real-World A/B Testing Use Case

Aug 14, 2023

Imagine a scenario where you are running a website and you aim to optimize your user experience and boost conversions. In this dynamic landscape, you are facing a critical question: which version of the website will resonate better with users and result in higher engagement? This is where A/B testing comes into play.

In this article, we will unravel the intricacies of A/B testing using this real-world example, exploring how a simple division of users into two groups can provide profound insights that lead to enhanced user experiences and business success.

Join us as we journey through the process, from hypothesis formulation to data analysis, and discover how A/B testing empowers organizations to make informed decisions backed by empirical evidence, turning educated guesses into tangible results.

1. Overview of the Business Example

In the previous article in this series, we gave an overview of A/B testing and its history. Now it is time to dive into a practical example.

Throughout this article and the rest of the series, we'll use the example of an online education company similar to Udacity or Coursera, which we will call To Data & Beyond. To Data & Beyond focuses specifically on AI and data science courses, and we want to test features that increase student engagement.

First, let’s talk about what a typical user flow through the site might look like. You would probably see that the largest number of users visit the homepage. Then a subset of those users might explore the site by looking at a few different pages. An even smaller group might create an account. And a final group might reach some sort of completion. Maybe they make a purchase, complete a class, finish a series of classes, or share the site on their blog, for example. This type of flow is often called a customer funnel.

You have the largest number of events at the top of the funnel, and each level further down is reached by fewer and fewer users. The picture of users trickling steadily down the funnel is a simplification, though. Customers don't simply enter, create an account, and go on to complete a class in one pass. There is a lot of back-and-forth between the different states, and repeat visitors often skip intermediate steps.
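To make the funnel concrete, here is a minimal Python sketch. The stage names follow the flow described above, but the counts are made-up numbers chosen purely for illustration, not data from any real site.

```python
# Hypothetical counts of unique users reaching each funnel stage.
# These numbers are invented for illustration only.
funnel = [
    ("homepage visit", 10000),
    ("explore site", 4000),
    ("create account", 1000),
    ("complete a course", 200),
]

def stage_conversion(stages):
    """Return (from_stage, to_stage, fraction) for each consecutive pair of stages."""
    return [
        (prev_name, name, count / prev_count)
        for (prev_name, prev_count), (name, count) in zip(stages, stages[1:])
    ]

for src, dst, frac in stage_conversion(funnel):
    print(f"{src} -> {dst}: {frac:.1%}")
```

Each fraction is the share of users at one level who reach the next level, which is the quantity an experiment on a single funnel step tries to move.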

In this article, we'll walk through a simple experiment from start to finish, so that you can see all the steps needed to run an experiment end to end. In future articles, we'll go through several of these steps in more detail. We'll consider an experimental change to the site homepage, specifically to the Start Now button. When users click this button, they see a list of available courses. In this experiment, we'll make one simple change: making the Start Now button pink instead of orange.

So a first-pass hypothesis for our simple experiment is that changing the Start Now button from orange to pink will increase how many students explore the available courses, that is, move on to the second step of the funnel.

2. Choosing the Metrics

Now that we know the change we want to make to our website, we need to choose a metric to measure its effect. The current version of our hypothesis doesn't say how we would measure whether changing the button color was an improvement.

What we ultimately care about is how many people actually complete courses, so one possible metric is the total number of courses completed. However, given that it can take students weeks or months to finish courses, using this metric would simply take too much time to be practical.

An alternative is how many users actually click on the Start Now button. The assumption is that if more people click the button and thus move on to exploring the site, then eventually some of them will create an account and go on to complete a course. In other words, increasing the rate at which users progress down the funnel at one level will have a positive impact on the end of the funnel as well.

We could use the number of clicks on the Start Now button divided by the number of page views of the home page. This metric is commonly called click-through rate (CTR). There's also a closely related metric, which many people also refer to as click-through rate, but which we will call click-through probability.

Click-through probability is defined as the number of unique visitors who click at least once, divided by the number of unique visitors who view the page. To see how these two metrics are different, suppose you have a web page and two users visit the home page. The first leaves without clicking the Start Now button, which means they clicked zero times, and the second person clicks five times. Maybe the next page loaded slowly, so the user impatiently clicked five times. In this case, the click-through rate equals 2.5, since there were five total clicks and two total page views. But the click-through probability equals 0.5 since half the users who visited the page clicked the button.
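The two-visitor example above can be checked with a few lines of Python. This is a minimal sketch; the visitor names and function names are illustrative, and it assumes each visitor generated exactly one page view.

```python
# Clicks on the Start Now button per unique visitor, from the example above:
# the first visitor never clicks, the second clicks five times.
# Assumption: each visitor generated exactly one page view.
clicks_per_visitor = {"visitor_1": 0, "visitor_2": 5}
page_views = 2

def click_through_rate(clicks, views):
    """Total clicks divided by total page views."""
    return sum(clicks.values()) / views

def click_through_probability(clicks):
    """Unique visitors with at least one click divided by unique visitors."""
    clicked = sum(1 for n in clicks.values() if n > 0)
    return clicked / len(clicks)

print(click_through_rate(clicks_per_visitor, page_views))   # 2.5
print(click_through_probability(clicks_per_visitor))        # 0.5
```

Note that the rate can exceed 1 because repeated clicks all count, while the probability is always between 0 and 1.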

Rates and probabilities have different characteristics, and in this article we're going to use click-through probability as our metric, not click-through rate. Given this, our updated hypothesis is that changing the button color will increase the click-through probability of the button, and we assume that this will ultimately increase the final business metric, which is the total number of courses completed.

3. Estimating Click-Through Probability

Choosing the right business metric is one of the key steps in experiment design. For our example, we decided to use the click-through probability instead of the click-through rate. Why? And how do you decide in general?

Generally speaking, you use a rate when you want to measure the usability of the site and a probability when you want to measure the total impact. For example, to measure the usability of a particular button, you use the rate: users have many different places on the page where they can click, so the rate tells you how often they actually find that button. If you just want to know how often users reach the second-level page of your site, you use a probability, because you don't want to count double-clicks, reloads, and similar repeat events.

In our example, we're interested in whether users progress to the second level of the funnel, which is why we picked the probability. So how do we actually compute it? First, you have to work with the engineers to modify the website so that every page view is captured as an event and every click is captured as an event too. Once the data is captured, computing the rate is simple: sum the clicks, sum the page views, and divide. To compute the probability, count the unique visitors who clicked at least once and divide by the number of unique visitors who viewed the page, so that each visitor contributes at most one click.
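As a rough sketch of what this computation could look like once the events are logged, the snippet below derives both metrics from a small event log. The log format (a list of visitor ID and event type pairs), the IDs, and the function name are all assumptions made for illustration; a real site's logging will look different.

```python
# Hypothetical event log: (visitor_id, event_type). In practice these rows
# would come from whatever logging the engineering team adds to the site.
events = [
    ("u1", "pageview"),
    ("u2", "pageview"),
    ("u2", "click"),
    ("u2", "click"),
    ("u3", "pageview"),
    ("u3", "pageview"),
    ("u3", "click"),
    ("u4", "pageview"),
]

def rate_and_probability(log):
    """Compute click-through rate and click-through probability from raw events."""
    total_views = sum(1 for _, kind in log if kind == "pageview")
    total_clicks = sum(1 for _, kind in log if kind == "click")
    # Sets deduplicate visitors, so repeat views and repeat clicks count once.
    viewers = {user for user, kind in log if kind == "pageview"}
    clickers = {user for user, kind in log if kind == "click"} & viewers
    rate = total_clicks / total_views
    probability = len(clickers) / len(viewers)
    return rate, probability

rate, probability = rate_and_probability(events)
print(f"click-through rate: {rate:.2f}")                # 0.60
print(f"click-through probability: {probability:.2f}")  # 0.50
```

The rate divides raw event totals (3 clicks over 5 page views), while the probability works on unique visitors (2 of 4 visitors clicked at least once), so double-clicks and reloads do not inflate it.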

4. Repeating the Experiment

So let’s say we go to measure the click-through probability of the Start Now button, and we see that 1000 unique users visit the homepage and 100 of those visitors click the Start Now button at least once. From this data, the best estimate of the click-through probability would be 100 divided by 1000, or 10%.

But how sure would you be about this estimate? In other words, suppose you repeated the measurement. You had a different 1000 users visit the site. And again, you recorded how many of them clicked the Start Now button. Which results would surprise you?

Would you be surprised if you got 100 clicks again, 101, 110, 150, or 900? In part 2 of this article, we will talk about how to quantify this.
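One way to build intuition before the formal treatment is to simulate the repeated measurement. The sketch below assumes the true click-through probability is exactly 10% and that visitors click independently of each other; both are simplifying assumptions, and the function name is illustrative.

```python
import random

def simulate_clicks(n_visitors, p_click, rng):
    """Count how many of n_visitors click, each independently with probability p_click."""
    return sum(1 for _ in range(n_visitors) if rng.random() < p_click)

# Assumption: the true click-through probability is exactly 0.10,
# matching the estimate of 100 clicks out of 1000 visitors.
rng = random.Random(42)  # fixed seed so the run is repeatable
results = [simulate_clicks(1000, 0.10, rng) for _ in range(2000)]

print("smallest count:", min(results))
print("largest count:", max(results))
print("share of runs with 90 to 110 clicks:",
      sum(90 <= r <= 110 for r in results) / len(results))
```

Under these assumptions, counts close to 100 show up routinely, while a count like 900 should essentially never appear. How to draw the line between "routine" and "surprising" is the question the statistics answer.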

5. Choosing the Distribution

In the next article, we'll go over the statistics behind whether 150 clicks should surprise us, covering the binomial distribution, confidence intervals, and hypothesis testing. The underlying question is: how variable is your estimate likely to be?

Well, there are certain distributions that commonly arise in statistics that can help give you some good guidelines for how variable your data is likely to be.

For our data, we'll use the binomial distribution rather than a distribution for continuous data such as the normal distribution, because each observation is either a success or a failure.

The binomial fits this example well, because we can call a click a success and a no-click visit to the page a failure. The labels themselves don't matter; you could also use this distribution if your outcomes were red and blue, or any two mutually exclusive outcomes. What matters is that there are exactly two possible outcomes, click and no-click, and that the metric we compute from them is a probability.

In part 2 of this article, we will continue with the same example and cover more advanced topics, including how to tell whether the results are statistically significant.
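As a preview, here is a minimal sketch of what the binomial distribution says about our example. It assumes the true click probability equals the 10% estimate and that the 1000 visitors behave independently; the function names are illustrative, and only Python's standard library is used.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

n, p = 1000, 0.10  # assumption: true click probability equals the 10% estimate
mean = n * p
std_dev = math.sqrt(n * p * (1 - p))

print(f"expected clicks: {mean:.0f}")
print(f"standard deviation: {std_dev:.2f}")
print(f"P(at least 110 clicks): {binomial_tail(110, n, p):.4f}")
print(f"P(at least 150 clicks): {binomial_tail(150, n, p):.2e}")
```

The standard deviation of the click count, sqrt(n * p * (1 - p)), is about 9.5 here, so 110 clicks is roughly one standard deviation above the expected 100, while 150 is more than five standard deviations away.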
