11 Docker Container Images for Generative AI & ML Projects

Youssef Hosni
Apr 06, 2025
Docker containers offer significant advantages for machine learning by ensuring consistent, portable, and reproducible environments across different systems.

By encapsulating all dependencies, libraries, and configurations in a container, Docker eliminates compatibility issues and the “it works on my machine” problem.

This makes it easier to move ML projects between development, cloud, or production environments without worrying about differences in setup. Additionally, Docker enables scalability and isolation, allowing machine learning workflows to be easily scaled using tools like Kubernetes, and ensuring that dependencies do not conflict between different projects.

In this article, we will explore 11 Docker container images for Generative AI and machine learning projects. These include tools for development environments, deep learning frameworks, machine learning lifecycle management, workflow orchestration, and large language models.

Table of Contents:

I. Machine Learning & Data Science

  1. Python

  2. Jupyter Notebook data science stack

II. Generative AI & Deep Learning

  1. Hugging Face Transformers

  2. NVIDIA CUDA deep learning runtime

  3. TensorFlow

  4. PyTorch

  5. Ollama

  6. Qdrant

III. Workflow Orchestration & ML Lifecycle Management

  1. Airflow

  2. MLflow

  3. Kubeflow Notebooks



I. Machine Learning & Data Science

1. Python

This is as simple and practical as it gets for starting a machine learning or data processing project.

FROM python:3.8
RUN pip install --no-cache-dir numpy pandas

Here’s what each line does:

  • FROM python:3.8: This line sets the base image to Python 3.8. That means your container will come pre-installed with Python 3.8 and a minimal Linux OS underneath (usually Debian or Alpine, depending on the tag). It’s a good choice when you want to control exactly what gets installed next.

  • RUN pip install --no-cache-dir numpy pandas: This line installs two essential Python libraries:
    1. numpy: for numerical computations (arrays, matrices, etc.)
    2. pandas: for handling tabular data and time series

  • The --no-cache-dir flag is a small optimization: it prevents pip from saving the downloaded .whl files in the container, which helps keep the image size smaller.

Use it when you’re writing lightweight scripts, preprocessing data, or building simple ML pipelines. It’s fast to build, easy to extend, and perfect for tasks that don’t need heavy frameworks like TensorFlow or PyTorch.
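In practice you would usually extend this base with your own code. Here is a minimal sketch of what that might look like (the script name and paths are illustrative, not from the article):

```dockerfile
FROM python:3.8
RUN pip install --no-cache-dir numpy pandas

# Copy your script into the image and run it when the container starts
WORKDIR /app
COPY process.py .
CMD ["python", "process.py"]
```

Build it with `docker build -t my-ml-script .` and run it with `docker run --rm my-ml-script`.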

2. Jupyter Notebook data science stack

This command spins up a ready-to-use JupyterLab environment loaded with popular data science tools — all inside a Docker container.

docker run -it --rm -p 8888:8888 jupyter/datascience-notebook

Here’s a breakdown of what’s happening:

  • docker run: Starts a new container from the image you specify.

  • -it: Combines -i (keep STDIN open, i.e., interactive) and -t (allocate a pseudo-terminal). This keeps the container attached to your terminal session, so you can interact with it if needed.

  • --rm: Automatically deletes the container once it’s stopped. Super handy for one-off sessions where you don’t need to persist anything after you’re done.

  • -p 8888:8888: Maps port 8888 on your local machine to port 8888 inside the container—this is how you’ll access the Jupyter interface from your browser.

  • jupyter/datascience-notebook: This is the official image from the Jupyter project. It includes:
    1. JupyterLab interface
    2. Python 3
    3. Libraries like numpy, pandas, scikit-learn, matplotlib, seaborn, and even statsmodels and bokeh.

This will provide you with a full-featured data science workbench in your browser with zero setup. You don’t need to worry about installing dependencies or conflicting Python versions — it all just works inside the container.

When to use this image:

  • When you’re exploring datasets

  • Teaching or learning data science

  • Prototyping ML models

  • Collaborating on notebooks without polluting your local Python environment.
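One practical addition (not in the command above): because of --rm, nothing you create inside the container survives once it stops. Mounting a local folder fixes that; /home/jovyan/work is the working-directory convention used by the Jupyter Docker Stacks images:

```shell
docker run -it --rm -p 8888:8888 \
  -v "$PWD":/home/jovyan/work \
  jupyter/datascience-notebook
```

Notebooks saved under work/ in JupyterLab then land in your current directory on the host.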

II. Generative AI & Deep Learning

Generative AI and deep learning frameworks perform best in optimized environments. These container images include all essential dependencies pre-installed, eliminating time-consuming setup processes. As large language models continue to gain popularity, purpose-built Docker containers provide efficient deployment and scaling solutions.

3. Hugging Face Transformers

Hugging Face Transformers is a popular library used for everything from working with large language models to creating image generation systems. It’s built on top of major deep learning frameworks like PyTorch and TensorFlow, so you can easily load models, fine-tune them, monitor performance, and save your progress directly to Hugging Face.

FROM huggingface/transformers-pytorch-gpu
COPY main.py .
RUN python main.py

Here’s what this simple Dockerfile does:

  • FROM huggingface/transformers-pytorch-gpu: This base image is maintained by Hugging Face and comes preinstalled with:
    1. The transformers library
    2. datasets, tokenizers, and other related tools
    3. PyTorch with GPU (CUDA) support

It’s specifically designed to support training and inference of transformer models like BERT, GPT-2, T5, and many more.

  • RUN python main.py: This executes your Python script (main.py) at image build time, not when the container starts. If you want the script to run on container startup instead, use CMD ["python", "main.py"]. Either way, that script can load a pretrained model, fine-tune it on your data, or serve it via an API; that part is entirely up to you.

This image saves you a ton of setup time. No need to manually install PyTorch, transformers, or manage CUDA compatibility — everything is pre-configured and GPU-optimized.

When to use this image:

  • Fine-tuning LLMs with PyTorch

  • Running inference pipelines (e.g., summarization, Q&A, classification)

  • Building model-serving apps using Hugging Face pipelines

  • Prototyping Gen AI apps using Transformers

You can also use other variants if your project is based on TensorFlow instead.

FROM huggingface/transformers-tensorflow-gpu
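To actually reach the GPU from either image, the container must be started with GPU access, which assumes the NVIDIA Container Toolkit is installed on the host. A quick sanity check might look like:

```shell
docker run --rm --gpus all \
  huggingface/transformers-pytorch-gpu \
  python3 -c "import torch; print(torch.cuda.is_available())"
```

If this prints True, PyTorch inside the container can see your GPU.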

4. NVIDIA CUDA deep learning runtime

The NVIDIA CUDA deep learning runtime is key to speeding up deep learning tasks on GPUs. By adding it directly to your Dockerfile, you can skip the hassle of setting up CUDA manually and get GPU-accelerated machine learning workflows up and running more easily.
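As a rough sketch of the idea (the tag below is illustrative; check Docker Hub for a current one), a Dockerfile built on the CUDA runtime might start like this:

```dockerfile
# CUDA 12.1 runtime with cuDNN on Ubuntu 22.04
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

# Add Python and a GPU-enabled framework on top of the CUDA base
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir torch
```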

This post is for paid subscribers.

© 2025 Youssef Hosni