If you want to build a local OCR application so that your documents never have to be sent to external APIs, you can use SmolDocling, an ultra-compact vision-language model for end-to-end multi-modal document conversion.
SmolDocling is an open-source vision-language model with 256M parameters designed for OCR. It converts documents end to end without a complex pipeline, so a single small model handles everything.
It’s fast and efficient: the reported average is about 0.35 seconds per page on a GPU, using less than 500MB of VRAM. Despite its small size, it is reported to deliver high accuracy, competing with models up to 27× larger in full-page transcription, layout detection, and code recognition.
This is the second part of a two-part hands-on tutorial on building a local OCR application with SmolDocling. In this part, we’ll develop the OCR web application: we’ll integrate everything from the first part and create the application interface using Streamlit.
Table of contents:
Set up the Working Environment
Build the OCR Pipeline with SmolDocling
Build the Streamlit Application
Run the OCR Application
You can find the code and data used in this article in this GitHub Repo.
1. Set up the Working Environment
The first step is to set up our working environment. We will create a virtual environment and activate it so that the project's dependencies stay isolated from the rest of the system.
# Create a virtual environment
python -m venv myenv
# Activate the virtual environment
# On Windows:
myenv\Scripts\activate
# On macOS/Linux:
source myenv/bin/activate
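To confirm that the activation above actually took effect, you can ask the interpreter itself. This is a minimal sketch, not part of the original project: `in_virtualenv` is a hypothetical helper name, and it relies on the standard-library behavior that `sys.prefix` and `sys.base_prefix` differ inside an environment created with `python -m venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Virtual environment active:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```

If this prints `False` after activation, the shell is most likely still resolving `python` to the system interpreter.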
Once you have activated the virtual environment, create a requirements.txt file with the packages and dependencies required by our application:
streamlit
torch
transformers
pillow
docling
You can install these packages with one command:
# Install packages from requirements.txt
pip install -r requirements.txt
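After the install finishes, a quick check can tell you whether every package is visible to the active interpreter. The sketch below is illustrative only: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list uses import names rather than pip names (`pillow` is imported as `PIL`; `docling_core` is assumed to be installed as a dependency of `docling`, which is worth verifying on your machine).

```python
from importlib.util import find_spec

# Import names (not pip names) for the packages listed in requirements.txt.
# Assumption: installing "docling" also provides the "docling_core" module.
REQUIRED_MODULES = ["streamlit", "torch", "transformers", "PIL", "docling_core"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when a top-level module is not installed,
    # and it does not execute the module's code.
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Using `find_spec` instead of a plain `import` keeps the check fast, since heavy packages such as `torch` are located but not loaded.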
Now we are ready to import the packages we will use in our project.
import streamlit as st
import torch
from PIL import Image
import io
from transformers import AutoProcessor, AutoModelForVision2Seq
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument
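Because the model can run on a GPU or fall back to the CPU, it can help to decide on a device once, right after the imports. The following is a minimal sketch rather than the article's own code: `pick_device` is a hypothetical helper name, and the guard around `import torch` is only there so the snippet still runs on a machine where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch we simply report "cpu".
        return "cpu"
    # torch.cuda.is_available() is True only when a CUDA device and a
    # CUDA-enabled PyTorch build are both present.
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Running on:", device)
```

The returned string can later be passed to calls such as `model.to(device)` when the model is loaded.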
2. Build the OCR Pipeline with SmolDocling
The second step is to define the ocr_pipeline we created in the first article. It was explained step by step there, so I will not repeat the explanation here. You can refer to the first article for further details.