To Data & Beyond

Hands-On LangChain for LLM Applications Development: Documents Splitting [Part 1]

Youssef Hosni
Nov 25, 2023

Once you’ve loaded documents, you’ll often want to transform them to better suit your application. The simplest example is splitting a long document into smaller chunks that fit into your model’s context window.
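To make the idea concrete, here is a minimal sketch of the most naive approach: cutting a text into fixed-size pieces with a small overlap. It is plain Python, not LangChain's API. The function name `split_fixed` and its parameters are illustrative assumptions, and character counts stand in for tokens, which is only a rough proxy for what actually fits in a context window.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into pieces of at most chunk_size characters.

    Hypothetical helper for illustration; consecutive chunks share
    chunk_overlap characters so context is not lost at the cut points.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for chunk in split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(repr(chunk))
```

A splitter like this ignores sentence and paragraph boundaries entirely, so it can cut a word or a thought in half.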

When you work with long pieces of text, you need to split that text into chunks. As simple as this sounds, there is a lot of potential complexity here. Ideally, you want to keep semantically related pieces of text together.
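One common way to keep related text together is to split on the largest natural boundary first (paragraphs), and fall back to finer boundaries (lines, then words, then raw characters) only when a piece is still too long. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not LangChain's implementation: `split_recursive`, its default separator list, and the merging rule are all assumptions made for this example, and lengths are measured in characters.

```python
from typing import List, Sequence


def split_recursive(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring coarse separators (paragraphs) over fine ones (words)."""
    if len(text) <= chunk_size:
        return [text] if text else []

    # Last resort: no separators left, so cut at fixed character offsets.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Merge small neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = (
        "First paragraph here.\n\n"
        "Second paragraph is a bit longer than the first.\n\n"
        "Third."
    )
    for chunk in split_recursive(sample, chunk_size=40):
        print(repr(chunk))
```

In this sketch the short first paragraph stays intact as its own chunk, and only the paragraph that exceeds the limit is broken at word boundaries.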

LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents. In this two-part practical article, we will look at why document splitting matters, survey the available LangChain text splitters, and explore four of them in depth.

Table of Contents:

  1. Why do we need document splitting?

  2. Different types of LangChain splitters

  3. Introduction to the recursive character text splitter & the character text splitter

  4. Diving deep into recursive splitting

  5. PDF loading & splitting [Covered in Part 2]

  6. Token splitting [Covered in Part 2]

  7. Context-aware splitting [Covered in Part 2]

© 2025 Youssef Hosni