Parser
Parser, an AI Primitive by Langbase, allows you to extract text content from various document formats. This is particularly useful when you need to process documents before using them in your AI applications.
Parser can handle a variety of formats, including PDFs, CSVs, and more. By converting these documents into plain text, you can easily analyze, search, or manipulate the content as needed.
Quickstart: Extracting Text from Documents
Let's get started
In this guide, we'll use the Langbase SDK to interact with the Parser API:
Step #1Generate Langbase API key
Every request you send to Langbase needs an API key. This guide assumes you already have one. If not, please check the instructions below.
Step #2Setup your project
Create a new directory for your project and navigate to it.
Project setup
mkdir document-parser && cd document-parser
Initialize the project
Create a new Node.js project.
Initialize project
npm init -y
Install dependencies
You will use the Langbase SDK to work with Parser and dotenv
to manage environment variables.
Install dependencies
npm i langbase dotenv
Create an env file
Create a .env
file in the root of your project and add your Langbase API key:
.env
LANGBASE_API_KEY=your_api_key_here
Step #3Parse a PDF document
Now let's create a file named parse-pdf.ts
to demonstrate how to parse the document. You can download a sample PDF document from the below.
Move the downloaded PDF document to your project directory.
parse-pdf.ts
import 'dotenv/config';
import { Langbase } from 'langbase';
import { readFile } from 'fs/promises';
const langbase = new Langbase({
apiKey: process.env.LANGBASE_API_KEY!,
});
async function main() {
try {
// Read the PDF document
const buffer = await readFile('composable-ai.pdf');
// Parse the PDF document
const result = await langbase.parser({
document: buffer,
documentName: 'composable-ai.pdf',
contentType: 'application/pdf',
});
console.log('Parsed document name:', result.documentName);
console.log('Parse document content:', result.content);
} catch (error) {
console.error('Error parsing PDF:', error);
}
}
main();
Run the script to parse your document:
Run the script
npx tsx parse-pdf.ts
You should see output similar to this:
Parsed document name: composable-ai.pdf
Parse document content: Composable AI
In software engineering, composition is a powerful concept. It allows for building
complex systems from simple, interchangeable parts. Think Legos, Docker
containers, React components. Langbase extends this concept to AI infrastructure
with our
Composable AI
stack using Pipes and Memory.
Composable and personalized AI
: With Langbase, you can compose multiple
models together into pipelines. It's easier to think about, easier to develop for, and
each pipe lets you choose which model to use for each task. You can see cost of
every step. And allow your customers to hyper-personalize.
Effortlessly zero-config AI infra
: Maybe you want to use a smaller, domain-specific
model for one task, and a larger general-purpose model for another task. Langbase
makes it easy to use the right primitives and tools for each part of the job and
provides developers with a zero-config composable AI infrastructure.
That's a nice way of saying,
you get a unicorn-scale API in minutes, not months
.
The most common problem
I hear about in Gen AI space is that my AI agents
are too complex and I can't scale them, too much AI talking to AI. I don't have
control, I don't understand the cost, and the impact of this change vs that. Time
from new model to prod is too long. Feels static, my customers can't personalize
The Developer Friendly Future of AI Infrastructure
Why Composable AI?
Chai.new
Launching soon in limited beta
Join the waitlist
Langbase
it.
Langbase fixes all this. AA
I have built an AI email agent that can read my emails, understand the sentiment,
summarize, and respond to them. Let's break it down to how it works, hint several
pipes working together to make smart personalized decisions.
1.
I created a pipe:
email-sentiment
this one reads my emails to understand the
sentiment
2.
email-summarizer
pipe it summarizes my emails so I can quickly understand
Example: Composable AI Email Agent
Langbase Email Agent reference architecture
them
3.
email-decision-maker
pipe should I respond? is it urgent? is it a newsletter?
4.
If
email-decision-maker
pipe says
yes
, then I need to respond. This invokes the
final pipe
5.
email-writer
pipe writes a draft response to my emails with one of the eight
formats I have
Ah, the power of composition. I can swap out any of these pipes with a new one.
Flexibility
: Swap components without rewriting everything
Reusability
: Build complex systems from simple, tested parts
Scalability
: Optimize at the component level for better performance
Observability
: Monitor and debug each step of your AI pipeline
Control flow
Maybe I want to use a different sentiment analysis model
Or maybe I want to use a different summarizer when I'm on vacation
I can chose a different LLM (small or large) based on the task
BTW I definitely use a different
decision-maker
pipe on a busy day.
Extensibility
Add more when needed
: I can also add more pipes to this pipeline. Maybe I
want to add a pipe that checks my calendar or the weather before I respond to an
email. You get the idea. Always bet on composition.
Eight Formats to write emails
: And I have several formats. Because Pipes are
composable, I have eight different versions of
email-writer
pipe. I have a pipe
email-pick-writer
that picks the correct pipe to draft a response with. Why? I talk
to my friends differently than my investors, reports, managers, vendors you
name it.
Why Composable AI is powerful?
Long-term memory and context awareness
By the way, I have all my emails in an
emails-store
memory, which any of these
pipes can refer to if needed. That's managed semantic RAG over all the emails I
have ever received.
And yes, my
emails-smart-spam
memory knows all the pesky smart spam emails
that I don't want to see in my inbox.
Cost & Observability
Because each intent and action is mapped out Pipe which is an excellent
primitive for using LLMs, I can see everything related to cost, usage, and
effectiveness of each pipe. I can see how many emails were processed, how
many were responded to, how many were marked as spam, etc.
I can switch LLMs for any of these actions, fork a pipe, and see how it performs. I
can version my pipes and see how the new version performs against the old one.
And we're just getting started
Why Developers Love It
Modular
: Build, test, and deploy pipes x memorysets independently
Extensible
: API-first no dependency on a single language
Version Control Friendly
: Track changes at the pipe level
Cost-Effective
: Optimize resource usage for each AI task
Stakeholder Friendly
: Collaborate with your team on each pipe and memory. All
your R&D team, engineering, product, GTM (marketing, sales), and even
stakeholders can collaborate on the same pipe. It's like a Google Doc x GitHub
for AI. That's what makes it so powerful.
Each pipe and memory are like a docker container. You can have any number of
pipes and memorysets.
Can't wait to share more exciting examples of composable AI. We're cookin!!
We'll share more on this soon. Follow us on Twitter and LinkedIn for updates.
Previous
Introduction
Next
API Reference
Langbase, Inc. © Copyright 2025. All rights reserved.
Next Steps
- Try parsing different file formats using the Langbase Parse API
- Integrate the parsed content with other Langbase features
- Build something cool with Langbase SDK and APIs.
- Join our Discord community for feedback, requests, and support