Building Local OCR Application SmolDocling: A Step-by-Step Guide [Part 2]

Youssef Hosni

Mar 26, 2025

∙ Paid

Get 50% off for 1 year

If you want to build a Local OCR application so that you do not need to send your documents to APIs. You can use 𝐒𝐦𝐨𝐥𝐃𝐨𝐜𝐥𝐢𝐧𝐠, which is an ultra-compact vision-language model for end-to-end multi-modal document conversion that can be used for LocalOCR.

𝗦𝗺𝗼𝗹𝗗𝗼𝗰𝗹𝗶𝗻𝗴 is a compact 256M open-source vision language model designed for OCR. It offers end-to-end document conversion without complex pipelines, allowing a single small model to handle everything.

It’s fast and efficient, processing a page in just 0.35 seconds on a consumer GPU with less than 500MB VRAM. Despite its small size, it delivers high accuracy, outperforming models 27× larger in full-page transcription, layout detection, and code recognition.

In this two-part hands-on tutorial, we’ll build a local OCR application using 𝗦𝗺𝗼𝗹𝗗𝗼𝗰𝗹𝗶𝗻𝗴. In the second part, we’ll develop the OCR web application and we’ll integrate everything from the first part and create the application interface using streamlit.

My New E-Book: LLM Roadmap from Beginner to Advanced Level

Youssef Hosni

June 18, 2024

I am pleased to announce that I have published my new ebook LLM Roadmap from Beginner to Advanced Level. This ebook will provide all the resources you need to start your journey towards mastering LLMs.

Read full story

1. Set up the Working Environment

The first step is to set up our working environment. First, we will create a virtual environment and then activate it so that we will have a separate environment for our project.

# Create a virtual environment
python -m venv myenv

# Activate the virtual environment
# On Windows:
myenv\Scripts\activate

# On macOS/Linux:
source myenv/bin/activate

Once you have activated the virtual environment, we will create the requirements.txt file with the required packages and dependencies for our application.

streamlit
torch
transformers
pillow
docling

You can install these packages with one command

# Install packages from requirements.txt
pip install -r requirements.txt

Now we are ready to import our packages that we will be using in our project.

import streamlit as st
import torch
from PIL import Image
import io
from transformers import AutoProcessor, AutoModelForVision2Seq
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

2. Build the OCR Pipeline with SmolDocling

The second step will be defining the ocr_pipeline we created in the first article. We have already explained step by step in the first article, so I will not explain again. You can refer to the first article for further explanation.

To Data & Beyond

Paid episode

Building Local OCR Application SmolDocling: A Step-by-Step Guide [Part 2]

Table of contents:

My New E-Book: LLM Roadmap from Beginner to Advanced Level

1. Set up the Working Environment

2. Build the OCR Pipeline with SmolDocling

This post is for paid subscribers