Chunker

Chunker, an AI Primitive by Langbase, allows you to split text into smaller, manageable pieces. This is especially useful for RAG pipelines or when you need to work with only specific sections of a document.

It is especially useful when building RAG for AI agents. Chunking improves the performance of your AI applications by letting you focus on the relevant sections of a text, making information easier to analyze and process. This is particularly beneficial for large documents where you may not need the entire content.
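Before diving in, it helps to see what the two parameters you will set later (chunkMaxLength and chunkOverlap) actually control. Here is a minimal, hypothetical sketch of overlap-based chunking; it is not Langbase's implementation, just a naive character-based illustration.

Conceptual sketch

// Naive fixed-size chunking with overlap. NOT the Langbase chunker,
// only an illustration of the idea: each chunk holds at most
// `maxLength` characters, and adjacent chunks share `overlap`
// characters of context. Assumes overlap < maxLength.
function naiveChunk(text: string, maxLength: number, overlap: number): string[] {
	const chunks: string[] = [];
	let start = 0;
	while (start < text.length) {
		chunks.push(text.slice(start, start + maxLength));
		// Step forward by maxLength minus overlap, so chunks share context
		start += maxLength - overlap;
	}
	return chunks;
}

// naiveChunk(document, 1024, 256) advances 768 characters per step,
// so every adjacent pair of chunks shares 256 characters.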


Quickstart: Splitting Text into Chunks


Let's get started

In this guide, we'll use the Langbase SDK to interact with the Chunk API:


Step #1: Generate a Langbase API key

Every request you send to Langbase needs an API key. This guide assumes you already have one. If not, see the Langbase documentation on generating an API key.


Step #2: Set up your project

Create a new directory for your project and navigate to it.

Project setup

mkdir document-chunker && cd document-chunker

Initialize the project

Create a new Node.js project.

Initialize project

npm init -y

Install dependencies

You will use the Langbase SDK to work with Chunk and dotenv to manage environment variables.

Install dependencies

npm i langbase dotenv

Create an env file

Create a .env file in the root of your project and add your Langbase API key:

.env

LANGBASE_API_KEY=your_api_key_here

Step #3: Chunk the text

Now let's create a simple script to demonstrate how to split the text content into chunks:

chunk-text.ts

import 'dotenv/config';
import { Langbase } from 'langbase';

// Initialize the Langbase client
const langbase = new Langbase({
	apiKey: process.env.LANGBASE_API_KEY!,
});

async function main(content: string) {
	// Chunk the document: each chunk is at most 1024 characters,
	// and consecutive chunks overlap by 256 characters
	const chunks = await langbase.chunker({
		content,
		chunkMaxLength: 1024,
		chunkOverlap: 256
	});

	console.log(`Total chunks created: ${chunks.length}`);
	console.log('Chunks:', chunks);
}

// Document content to chunk
const content = `Composable AI
The Developer Friendly Future of AI Infrastructure
In software engineering, composition is a powerful concept. It allows for building complex systems from simple, interchangeable parts. Think Legos, Docker containers, React components. Langbase extends this concept to AI infrastructure with our Composable AI stack using Pipes and Memory.

Why Composable AI?
Composable and personalized AI: With Langbase, you can compose multiple models together into pipelines. It's easier to think about, easier to develop for, and each pipe lets you choose which model to use for each task. You can see the cost of every step. And you can allow your customers to hyper-personalize.

Effortlessly zero-config AI infra: Maybe you want to use a smaller, domain-specific model for one task, and a larger general-purpose model for another task. Langbase makes it easy to use the right primitives and tools for each part of the job and provides developers with a zero-config composable AI infrastructure.

That's a nice way of saying, you get a unicorn-scale API in minutes, not months.

The most common problem I hear about in the Gen AI space is that my AI agents are too complex and I can't scale them; too much AI talking to AI. I don't have control, I don't understand the cost, or the impact of this change vs. that. Time from new model to prod is too long. It feels static; my customers can't personalize it. ⌘ Langbase fixes all this. — AA

Interactive Example: Composable AI Email Agent
But how does Composable AI work?

Here's an interactive example of a composable AI Email Agent: Classifies, summarizes, responds. Click to send a spam or valid email and check how composable it is: Swap any pipes, any LLM, hyper-personalize (you or your users), observe costs. Everything is composable.

I'm stuck and frustrated because the billing API isn't working and the API documentation is outdated.
Congratulations! You have been selected as the winner of a $100 million lottery!
Send a demo request to understand composable AI
⌘ Langbase Email Agent reference architecture
I have built an AI email agent that can read my emails, understand the sentiment, summarize, and respond to them. Let's break down how it works. Hint: several pipes working together to make smart, personalized decisions.

I created a pipe: email-sentiment — this one reads my emails to understand the sentiment
email-summarizer pipe — it summarizes my emails so I can quickly understand them
email-decision-maker pipe — should I respond? is it urgent? is it a newsletter?
If email-decision-maker pipe says yes, then I need to respond. This invokes the final pipe
email-writer pipe — writes a draft response to my emails with one of the eight formats I have
Why is Composable AI powerful?
Ah, the power of composition. I can swap out any of these pipes with a new one.

Flexibility: Swap components without rewriting everything
Reusability: Build complex systems from simple, tested parts
Scalability: Optimize at the component level for better performance
Observability: Monitor and debug each step of your AI pipeline
Control flow
Maybe I want to use a different sentiment analysis model
Or maybe I want to use a different summarizer when I'm on vacation
I can choose a different LLM (small or large) based on the task
BTW I definitely use a different decision-maker pipe on a busy day.
Extensibility
Add more when needed: I can also add more pipes to this pipeline. Maybe I want to add a pipe that checks my calendar or the weather before I respond to an email. You get the idea. Always bet on composition.
Eight Formats to write emails: And I have several formats. Because Pipes are composable, I have eight different versions of email-writer pipe. I have a pipe email-pick-writer that picks the correct pipe to draft a response with. Why? I talk to my friends differently than my investors, reports, managers, vendors — you name it.
Long-term memory and context awareness
By the way, I have all my emails in an emails-store memory, which any of these pipes can refer to if needed. That's managed semantic RAG over all the emails I have ever received.
And yes, my emails-smart-spam memory knows all the pesky smart spam emails that I don't want to see in my inbox.
Cost & Observability
Because each intent and action is mapped to a Pipe, which is an excellent primitive for using LLMs, I can see everything related to cost, usage, and effectiveness of each pipe. I can see how many emails were processed, how many were responded to, how many were marked as spam, etc.
I can switch LLMs for any of these actions, fork a pipe, and see how it performs. I can version my pipes and see how the new version performs against the old one.
And we're just getting started …
Why Developers Love It
Modular: Build, test, and deploy pipes x memorysets independently
Extensible: API-first, no dependency on a single language
Version Control Friendly: Track changes at the pipe level
Cost-Effective: Optimize resource usage for each AI task
Stakeholder Friendly: Collaborate with your team on each pipe and memory. All your R&D team, engineering, product, GTM (marketing, sales), and even stakeholders can collaborate on the same pipe. It's like a Google Doc x GitHub for AI. That's what makes it so powerful.
Each pipe and memory is like a Docker container. You can have any number of pipes and memorysets.

Can't wait to share more exciting examples of composable AI. We're cookin!!

We'll share more on this soon. Follow us on Twitter and LinkedIn for updates.

`;

// Chunk the content.
main(content);

Run the script to chunk your document:

Run the script

npx tsx chunk-text.ts

You should see output similar to this, with 9 chunks created from the document:

Total chunks created: 9
Chunks: [
  'Composable AI\n' +
    'The Developer Friendly Future of AI Infrastructure\n' +
    'In software engineering, composition is a powerful concept. It allows for building complex systems from simple, interchangeable parts. Think Legos, Docker containers, React compon
  ...
  ...
  ...
]
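The chunker returns a plain array of strings, so you can post-process it like any other array. A minimal sketch, reusing the chunks variable from the script above, that logs each chunk's index, length, and a short preview:

Inspect the chunks

// Log index, character count, and the first 60 characters of each chunk.
chunks.forEach((chunk, i) => {
	console.log(`#${i}: ${chunk.length} chars | ${chunk.slice(0, 60)}...`);
});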

Next Steps

  • Use the Langbase SDK to integrate the chunker primitive into your AI agents and apps
  • Check out the Chunk API reference for more details on the parameters and options available
  • Experiment with different chunk sizes and overlaps to find the optimal settings for your use case (see the sketch after this list)
  • Integrate document chunking into your RAG (Retrieval-Augmented Generation) pipeline
  • Combine with other Langbase primitives like Parse to process various document formats
  • Join our Discord community for feedback, requests, and support
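To follow up on the chunk-size experiment suggested above, here is a minimal sketch that compares a few settings side by side. It assumes it runs inside an async function with the langbase client and content from the script above; the specific values are examples, not recommendations.

Compare chunk settings

// Hypothetical settings to compare; tune these for your own documents.
const settings = [
	{ chunkMaxLength: 1024, chunkOverlap: 256 },
	{ chunkMaxLength: 2048, chunkOverlap: 256 },
	{ chunkMaxLength: 1024, chunkOverlap: 512 },
];

for (const { chunkMaxLength, chunkOverlap } of settings) {
	// Re-chunk the same content with each configuration
	const chunks = await langbase.chunker({ content, chunkMaxLength, chunkOverlap });
	console.log(`max=${chunkMaxLength}, overlap=${chunkOverlap} -> ${chunks.length} chunks`);
}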