This article was written by Marina Wyss. If you like her writing, make sure to subscribe to her newsletter!
Want to know what separates entry-level machine learning projects from the systems powering companies like Google and Amazon? The gap might seem impossibly wide, but there’s actually a clear progression most ML practitioners follow.
Today, I’m mapping out the five levels of machine learning projects that separate complete beginners from industry leaders. By the end of this post, you’ll understand exactly where you are on this journey and what specific skills you need to reach the next level.
Many aspiring ML Engineers get stuck building the wrong types of projects that never actually land them jobs. I’ll show you exactly what level of project you need for different roles — from entry-level positions to research teams at top AI companies.
LEVEL 1: ENTRY-LEVEL DATA ANALYSIS
Let’s start at the beginning. Level 1 is where every journey begins — working with clean, structured datasets in a Jupyter notebook on your laptop.
At this level, you’re downloading pre-cleaned datasets from sources like Kaggle. You’ll import libraries such as pandas for data manipulation, use matplotlib or seaborn — and maybe even Plotly for interactive visualizations — and experiment with scikit-learn to train basic models like linear regression or logistic regression.
A typical project might look like this:
Load a CSV file into a DataFrame.
Spend time on exploratory data analysis (EDA) with simple visualizations.
Handle missing values by dropping them or filling them with means.
Encode categorical features using one-hot encoding.
Train a model using default parameters.
Evaluate with basic metrics like accuracy.
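The steps above can be sketched in a few lines of pandas and scikit-learn. This is a minimal illustration, not a recommended template: the toy dataset and its column names (`age`, `plan`, `monthly_spend`, `churned`) are made up for the example, and the CSV is read from an in-memory string so the snippet runs on its own.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for a downloaded CSV file.
CSV_TEXT = """age,plan,monthly_spend,churned
34,basic,20.0,0
51,premium,,0
23,basic,18.5,1
45,premium,55.0,0
31,basic,,1
62,premium,60.0,0
29,basic,22.0,1
38,basic,19.0,1
55,premium,58.0,0
41,basic,21.0,1
27,premium,52.0,1
48,basic,25.0,0
"""


def run_level1_project(csv_source):
    """Load, clean, encode, train, and evaluate, Level 1 style."""
    # 1. Load the CSV into a DataFrame.
    df = pd.read_csv(csv_source)

    # 2. Quick EDA: summary statistics and the class balance.
    print(df.describe())
    print(df["churned"].value_counts())

    # 3. Fill missing numeric values with column means.
    #    (Computed on the full dataset, which is the typical Level 1 shortcut.)
    numeric_cols = ["age", "monthly_spend"]
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

    # 4. One-hot encode the categorical feature.
    df = pd.get_dummies(df, columns=["plan"], dtype=float)

    # 5. Train a model with default parameters on a simple split.
    X = df.drop(columns=["churned"])
    y = df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # 6. Evaluate with a basic metric.
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return df, accuracy


cleaned, accuracy = run_level1_project(io.StringIO(CSV_TEXT))
print(f"Test accuracy: {accuracy:.2f}")
```

With only a dozen rows, the accuracy number here means very little. The point is the shape of the workflow, not the result.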
All of this happens in notebooks where you mix code, comments, and visualizations — which is perfect for learning and getting immediate feedback.
But, as we all know, these little projects are a far cry from real-world ML applications. Your pristine Kaggle datasets rarely have the messy issues of real data, and you’re not yet thinking about data leakage, sophisticated data imputation, scalability, or literally dozens of other considerations.
When you start feeling limited by these boundaries, it’s time to move on to Level 2.
LEVEL 2: STRUCTURED ML PROJECTS
At Level 2, things get more interesting and a little more challenging. You’re now working with messier, more realistic data, and you’re structuring your projects like a professional data scientist instead of running loose experiments in notebooks.
Your tools and workflow have evolved in the following ways:
You’re moving from a single notebook to a well-organized Python project with separate modules for data processing, feature engineering, model training, and evaluation.
You use Git for version control, and you’re creating configuration files to keep experiments reproducible.
Instead of a single random shuffle, you’re using proper train/validation/test splits, often with approaches like walk-forward validation for time-series data, where shuffling would let the model train on the future.
You’re tackling issues like class imbalance with techniques like SMOTE or adjusted class weights, and you’re applying modern feature engineering tools.
You might be using more interesting models like LightGBM, simple neural networks, or even AI APIs.
You’re tuning hyperparameters and maybe even experimenting with more advanced options like Bayesian optimization.
And perhaps you’re making a simple pipeline with tools like Prefect.
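To make the walk-forward idea from the list above concrete, here is a minimal sketch in plain Python. The function name `walk_forward_splits` and its parameters are hypothetical, chosen for this illustration. It assumes your rows are already sorted by time and uses an expanding training window, so each fold trains only on data that comes before its test block.

```python
def walk_forward_splits(n_samples, n_folds, min_train_size):
    """Yield (train_indices, test_indices) pairs with an expanding window.

    Assumes rows are ordered by time, oldest first. Every test block
    comes strictly after the data used to train for that fold.
    """
    if n_folds < 1 or min_train_size < 1:
        raise ValueError("n_folds and min_train_size must be at least 1")

    # Split whatever remains after the initial training window evenly.
    test_size = (n_samples - min_train_size) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested folds")

    for fold in range(n_folds):
        train_end = min_train_size + fold * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Example: 10 time-ordered rows, 3 folds, at least 4 rows to train on.
for train_idx, test_idx in walk_forward_splits(10, 3, 4):
    print(f"train on rows 0..{train_idx[-1]}, test on rows {test_idx}")
```

In a real project you would more likely reach for a ready-made splitter such as scikit-learn's `TimeSeriesSplit`, which follows the same expanding-window idea.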
Imagine a typical Level 2 project. This could be something like:
Building a customer churn prediction model using data from multiple sources like transaction records, support interactions, and usage logs.
Handling imbalanced classes and performing feature selection to identify the most predictive variables.
And evaluating your model using precision-recall curves, ROC curves, and business-specific metrics.
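Two pieces of this churn project are easy to sketch without any libraries: weighting the rare class more heavily, and checking precision and recall at different decision thresholds. The labels and scores below are made up for illustration, and the helper names are hypothetical. The weighting formula is the common "balanced" heuristic (samples divided by classes times class count). Returning a precision of 1.0 when nothing is predicted positive is a convention assumed here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


def precision_recall_at(y_true, y_score, threshold):
    """Return (precision, recall) when scores >= threshold count as churn."""
    predicted = [score >= threshold for score in y_score]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and not p)
    # Assumed convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (1 = churned) and model scores for ten customers.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.70, 0.90]

print(balanced_class_weights(y_true))
for threshold in (0.4, 0.5, 0.8):
    precision, recall = precision_recall_at(y_true, y_score, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Sweeping the threshold like this is what a precision-recall curve plots. Lowering it catches more churners at the cost of more false alarms, and which trade-off is right depends on your business-specific costs.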
This is the stage where your work becomes structured and robust. But when your manager or client says, “Great model! When can we use this?” you quickly realize there’s a whole world of production challenges waiting for you. That’s when it’s time for Level 3.
LEVEL 3: PRODUCTION-READY ML
Level 3 is the transformation from pure data science to the world of machine learning engineering — where your models have to work in production, serve real users, and drive business outcomes.