To Data & Beyond

Hands-On LangChain for LLM Applications Development: Documents Splitting [Part 2]

Youssef Hosni
Dec 01, 2023

Once you’ve loaded documents, you’ll often want to transform them to better suit your application. The simplest example is splitting a long document into smaller chunks that fit into your model’s context window.
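
To make the idea concrete, here is a minimal sketch of the most naive strategy: cutting text into fixed-size windows with a small overlap between neighbours. It is plain Python, not LangChain code; the function name `split_fixed` is hypothetical, and it counts characters as a stand-in for tokens, whereas real context windows are measured in tokens.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so a little context
    carries over from one chunk to the next.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
print(split_fixed(text, chunk_size=10, chunk_overlap=2))
# ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The cut points ignore word and sentence boundaries entirely, which is why smarter splitters exist.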

When working with long pieces of text, you need to split them into chunks. Simple as this sounds, there is a lot of potential complexity here: ideally, you want to keep semantically related pieces of text together, so that, for example, a sentence is not cut in half across two chunks.
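
One common way to respect semantic boundaries is recursive splitting: try the coarsest separator first (blank lines between paragraphs) and fall back to finer ones (single newlines, then spaces) only for pieces that are still too long, with a hard character cut as the last resort. The sketch below is a simplified, standard-library illustration of that idea, assuming the paragraph, line, word ordering that LangChain's `RecursiveCharacterTextSplitter` uses by default. It is not LangChain's implementation: it drops separators at chunk boundaries and has no overlap, and the name `recursive_split` is hypothetical.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split text into chunks of at most chunk_size characters, preferring
    to break at the coarsest separator that occurs in the text."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Pick the coarsest separator that actually occurs in the text.
    for i, sep in enumerate(separators):
        if sep and sep in text:
            finer = separators[i + 1:]
            break
    else:
        # No separator applies: fall back to hard cuts of chunk_size characters.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            # Piece is too big on its own: flush the buffer, then recurse
            # on the piece with the finer separators only.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
        elif not current:
            current = piece
        elif len(current) + len(sep) + len(piece) <= chunk_size:
            current += sep + piece  # merge small neighbouring pieces
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph here.\n\nSecond paragraph is a bit longer."
print(recursive_split(text, chunk_size=30))
# ['First paragraph here.', 'Second paragraph is a bit', 'longer.']
```

The first paragraph survives intact because it fits, while only the over-long second paragraph is broken down further, at word boundaries.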

LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents. In this two-part practical article, we will explore why document splitting matters, survey the text splitters LangChain offers, and explore four of them in depth.
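
Document transformers operate on documents rather than raw strings, so each chunk should carry its source’s metadata along. The sketch below assumes a hypothetical `Doc` dataclass standing in for LangChain’s `Document` (which likewise pairs `page_content` with a `metadata` dict) and chains two hypothetical transformers: a naive splitter that tags each piece with a `chunk` index, and a filter that drops near-empty chunks. It illustrates the pattern only and is not LangChain’s API.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for LangChain's Document: text plus metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_docs(docs: list[Doc], chunk_size: int) -> list[Doc]:
    """Cut each document into fixed-size pieces, copying its metadata and
    recording the piece's position under a hypothetical 'chunk' key."""
    out = []
    for doc in docs:
        text = doc.page_content
        for n, start in enumerate(range(0, len(text), chunk_size)):
            meta = {**doc.metadata, "chunk": n}  # copy, don't share the dict
            out.append(Doc(text[start:start + chunk_size], meta))
    return out


def filter_docs(docs: list[Doc], min_chars: int) -> list[Doc]:
    """Drop documents whose stripped text is shorter than min_chars."""
    return [d for d in docs if len(d.page_content.strip()) >= min_chars]


docs = [Doc("abcdefghij", {"source": "a.txt"}), Doc("xy", {"source": "b.txt"})]
chunks = filter_docs(split_docs(docs, chunk_size=4), min_chars=3)
for c in chunks:
    print(c.page_content, c.metadata)
# abcd {'source': 'a.txt', 'chunk': 0}
# efgh {'source': 'a.txt', 'chunk': 1}
```

Because each transformer takes and returns a list of documents, they compose freely, and every surviving chunk can still be traced back to its source.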

Table of Contents:

  1. Why do we need document splitting? [Covered in Part 1]

  2. Different types of LangChain splitters [Covered in Part 1]

  3. Introduction to the recursive character text splitter & the character text splitter [Covered in Part 1]

  4. Diving deep into recursive splitting [Covered in Part 1]

  5. PDF loading & splitting 

  6. Token splitting 

  7. Context-aware splitting 

This post is for paid subscribers

© 2025 Youssef Hosni