To Data & Beyond

Building Text Translation System using Meta NLLB Open-Source Model

Youssef Hosni
May 04, 2024

Hugging Face is a platform that hosts a treasure trove of open-source models, making it a goldmine for anyone diving into natural language processing.

In this guide, we will explore how to use Meta's NLLB open-source model through the Hugging Face Transformers package for machine translation tasks, and we will try it on different Arabic accents to see how it performs.

Table of Contents:

  1. Setting Up Working Environment 

  2. Build a Translator Pipeline using HuggingFace Transformers

  3. Translating from English to Arabic with Different Accents



1. Setting Up Working Environment

In this article, we will use the Transformers library, particularly its pipeline function. If you have not installed the packages yet, you can do so with the commands below:

    !pip install transformers 
    !pip install torch

Next, we import the pipeline function from the Transformers library, along with torch:

    from transformers import pipeline 
    import torch

Now we have everything we need to create our machine translation system.

2. Build a Translator Pipeline using HuggingFace Transformers

The second step is building a translator pipeline using Hugging Face Transformers. We will be using Meta's open-source machine translation model, No Language Left Behind (NLLB).

No Language Left Behind-200 (NLLB-200) is a machine translation model primarily intended for research in machine translation, especially for low-resource languages. It allows for single-sentence translation among 200 languages. NLLB-200 is a research model and has not been released for production deployment. It is trained on general-domain text data and is not intended to be used with domain-specific texts, such as medical or legal documents.

The model is also not intended for document translation. It was trained with input lengths not exceeding 512 tokens, so translating longer sequences may degrade quality. NLLB-200 translations cannot be used as certified translations.
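One practical detail worth knowing before we build the pipeline: NLLB-200 identifies its 200 languages with FLORES-200 codes rather than plain ISO codes. Since we will later translate into several Arabic varieties, here is a small sketch mapping a few of them to their codes. The codes below are taken from my reading of the NLLB-200 language list; verify them against the official model card before relying on them:

```python
# FLORES-200 language codes for English and several Arabic varieties,
# as listed (to the best of my knowledge) in the NLLB-200 model card.
flores_codes = {
    "English": "eng_Latn",
    "Modern Standard Arabic": "arb_Arab",
    "Egyptian Arabic": "arz_Arab",
    "Moroccan Arabic": "ary_Arab",
    "North Levantine Arabic": "apc_Arab",
    "Najdi Arabic": "ars_Arab",
}

for language, code in flores_codes.items():
    print(f"{language}: {code}")
```

Each code combines a language identifier with the script it is written in (Latin for English, Arabic script for the Arabic varieties), which is how NLLB distinguishes varieties that share a script from languages written in several scripts.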

Let's define the translator pipeline. We will use NLLB-200's distilled 600M variant:
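The remainder of the article is behind the paywall, but a minimal sketch of such a pipeline, assuming the standard Hugging Face pipeline API and the `facebook/nllb-200-distilled-600M` checkpoint name from the Hub, might look like this:

```python
from transformers import pipeline
import torch

# Minimal sketch: load the distilled 600M NLLB-200 checkpoint.
# torch_dtype=torch.bfloat16 halves memory use; drop it on hardware
# without bfloat16 support.
translator = pipeline(
    task="translation",
    model="facebook/nllb-200-distilled-600M",
    torch_dtype=torch.bfloat16,
)

# The translation pipeline takes FLORES-200 source and target codes.
text = "Hugging Face hosts a treasure trove of open-source models."
output = translator(text, src_lang="eng_Latn", tgt_lang="arb_Arab")
print(output[0]["translation_text"])
```

Swapping `tgt_lang` for another FLORES-200 code (for example `arz_Arab` for Egyptian Arabic) is all it takes to target a different variety, which is exactly what the accent comparison in section 3 exercises.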

This post is for paid subscribers
