Mastering A/B Testing for Data Science Interviews: Introduction to A/B Testing
A/B testing is a powerful and widely used methodology in the field of data science, particularly in the realm of product development and user experience optimization. It offers a structured approach to test and compare two different versions of a product or feature to determine which one yields better results.
In this series of articles, we will cover everything you need to know to master A/B testing for data science and data analysis interviews. We will start with an introduction to A/B testing, in which we will delve into the fundamentals of A/B testing, its historical roots, practical case studies, its limitations, and alternative techniques that complement its findings.
Table of Contents:
What is A/B Testing?
History of A/B Testing
Should You Use A/B Testing for Every Change?
A/B Testing Practical Case Studies
What Are Things You Can’t Do With A/B Testing?
Other Techniques
Looking to start a career in data science and AI and not sure how? I offer data science mentoring sessions and long-term career mentoring:
Mentoring sessions: https://lnkd.in/dXeg3KPW
Long-term mentoring: https://lnkd.in/dtdUYBrM
All the resources and tools you need to teach yourself Data Science for free!
The best interactive roadmaps for Data Science roles. With links to free learning resources. Start here: https://aigents.co/learn/roadmaps/intro
The search engine for Data Science learning resources. 100K handpicked articles and tutorials. With GPT-powered summaries and explanations. https://aigents.co/learn
Teach yourself Data Science with the help of an AI tutor (powered by GPT-4). https://community.aigents.co/spaces/10362739/
1. What is A/B Testing?
A/B testing is a general methodology used online when you want to test a new product or feature. You split your users into two sets: a control set sees your existing product or feature, while the experiment set sees the new version. You then compare how these users respond to determine which version of your feature is better.
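In practice, the split into control and experiment sets is usually deterministic, so the same user always sees the same version. Here is a minimal sketch in Python of one common approach, hash-based bucketing; the experiment name and the 50/50 split are illustrative assumptions, not a prescription:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "new_feature") -> str:
    """Deterministically assign a user to control ("A") or experiment ("B").

    Hashing the user id together with the experiment name gives a stable,
    roughly 50/50 split without storing any assignment state.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"

# The same user always lands in the same bucket:
assert assign_variant("user_42") == assign_variant("user_42")
```

Keying the hash on the experiment name as well as the user id means different experiments get independent splits, so one test doesn't systematically shadow another.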
2. History of A/B Testing
A/B testing has been around for a long time. Maybe we didn't call it A/B testing, but it came from fields such as agriculture, where people would divide their land into sections and test which treatment worked better for a particular crop.
Many fields have used what is effectively A/B testing for a long time. In the sciences more generally, hypothesis testing is a key way to validate new ideas. In medicine, for example, the equivalent of A/B testing is the clinical trial, which is how researchers determine whether a new drug is effective.
The key thing you want in A/B testing is a consistent response from your control and experiment groups, so that with a well-structured experiment you can determine whether there is a significant behavior change in the experiment group compared to the control group.
However, there are differences between running these types of experiments and running online A/B tests. In the online world, we often have much more data, but at a lower resolution. A traditional medical trial or user experience research study might have 10, 20, or 50 participants, so the analysis is different because you have to be very careful, but you also know a lot about each participant: their age, their weight, the fact that they are a single person; you have their driver's license, and you may have met them. In an online study, you may have millions of users and hundreds of thousands of clicks and responses, but you don't know much about who is on the other end of that data. You may have trouble distinguishing whether it's a single person, multiple people, or an internet cafe computer, and those are issues that come up with online user data.
The key thing to remember is that in the online world, when you're doing A/B testing, the goal is to determine whether users will like the new product or feature. So the goal in A/B testing is to design an experiment that is robust and gives you repeatable results, so that you can make a good decision about whether or not to launch that product or feature.
3. Should You Use A/B Testing for Every Change?
John Lilly, now a VC at Greylock and previously the CEO of Mozilla, came up with a great analogy: A/B testing is really useful for helping you climb to the peak of your current mountain, but if you want to figure out whether you should be on this mountain or another mountain, A/B testing isn't so useful.
That being said, you can test a pretty wide variety of things with A/B testing, everything from new features and additions to your UI to a different look for your website, and a lot of companies use it.
4. A/B Testing Practical Case Studies
When Amazon first started doing personalized recommendations, they wanted to see whether people bought more stuff. They discovered that they had a significant increase in revenue from the personalized recommendations. So you can use it for fairly complicated changes.
Google maybe took it a little too far sometimes. They tested 41 different shades of blue in the UI to see how users responded and reacted. On the one hand, it was interesting and useful internally, but it might have been going a little too far with those experiments.
LinkedIn tested a change where they were trying to figure out whether they should show a news article or an encouragement to add new contacts. That’s a ranking change. Google also does a lot of ranking between the search lists and the ads.
You can also test changes that you're not even sure a user would notice. For example, a hundred milliseconds is not a lot of page load time, but both Amazon and Google have run latency experiments. Amazon showed in 2007 that for every 100 milliseconds of load time they added to a page, they saw a 1% decrease in revenue. Google found similar results: even though a hundred milliseconds doesn't seem like much, the average number of queries people run decreases for every 100 milliseconds of latency added.
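To put the Amazon figure in perspective, here is a back-of-the-envelope calculation. It assumes the roughly 1% loss per 100 ms compounds multiplicatively, which is an illustrative assumption of this sketch, not something the experiments themselves established:

```python
def revenue_after_latency(base_revenue: float, added_ms: float,
                          loss_per_100ms: float = 0.01) -> float:
    """Estimate remaining revenue after adding latency, assuming the
    ~1% loss per 100 ms compounds multiplicatively (an illustrative
    assumption)."""
    steps = added_ms / 100
    return base_revenue * (1 - loss_per_100ms) ** steps

# 500 ms of extra latency on $1,000,000 of revenue:
print(round(revenue_after_latency(1_000_000, 500)))  # → 950990, ~4.9% lost
```

The point of the case study survives the rough math: effects that are individually imperceptible to users can add up to revenue changes that are very visible in an A/B test.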
5. What Are Things You Can’t Do With A/B Testing?
A/B testing isn't as useful for testing out new experiences, because a new experience triggers two issues with existing users. Some will say, "hey, you just changed my experience and I liked the old way"; that's called change aversion. Others will say, "oh, this is new" and try out everything; that's called a novelty effect.
The question is how much time your users need to adapt to the new experience, so you can ask: what will the plateaued behavior look like, so that I can make a robust decision? And time can be a problem for other reasons as well.
For example, suppose you have a website that recommends apartment rentals. People don't look for apartments that often. What you want is return business, or perhaps to grow your business through referrals from people who like your service. The reality is that, within the scope of an experiment, it's hard to measure whether people actually come back for more recommendations. Even if they refer their friends, will that happen next week, or six months from now? So there are certain things you may want to know about your site that are difficult to get at through short-term A/B testing.
One last example where A/B testing isn’t useful is that A/B testing can’t tell you if you’re missing something. So let’s say you’re on a digital camera review site. A/B testing can tell you whether or not you should be showing this camera review above this camera review. But A/B testing can’t tell you if you’re missing this entire other camera that you should be reviewing and you’re not doing it at all.
6. Other Techniques
We have just seen some cases where A/B testing isn't very helpful. What should we do in those cases? Are there other techniques you can use?
There are a lot of different ways to gather data about users, and some of them can complement running an A/B test. For example, you often have logs of what users did on your website. You can analyze them retrospectively, or observationally, to develop a hypothesis about what's causing changes in their behavior. Then you can go forward, design a randomized experiment, and do a prospective analysis. The two data sources complement each other: you develop a hypothesis by looking at the logs, then run an A/B test to see whether you can reproduce the effect and whether your theory is valid.
There’s a whole host of other techniques as well, ranging from user experience research to focus groups and surveys and human evaluation. A/B testing can give you a lot of broad quantitative data, but these other techniques give you very deep and qualitative data that are complementary to A/B testing. So for example, these other techniques can do a better job of telling you which mountain you should be on.
In summary, we have covered what A/B testing is, its history, practical case studies, what it can be used for, and where it falls short. In the coming article, we will cover a real case study and see how to choose metrics and design an experiment.