Parser API v1
The parser API endpoint lets you extract text content from various document formats. This is particularly useful when you need to process documents before using them in your AI applications.
Limitations
- Maximum file size: 10 MB
- Supported file formats:
  - Text files (.txt)
  - Markdown (.md)
  - PDF documents (.pdf)
  - CSV files (.csv)
  - Excel spreadsheets (.xlsx, .xls)
  - Common programming language files (.js, .py, .java, etc.)
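The limits above can be checked on the client before a document is uploaded. The following is a minimal sketch, not part of the SDK: `preflightCheck`, `MAX_FILE_SIZE_BYTES`, and `KNOWN_EXTENSIONS` are hypothetical names, and it assumes that 10 MB means 10 × 1024 × 1024 bytes. Because the list ends in "etc.", the extension set only covers the formats named explicitly; the service may accept other programming language files.

```typescript
// Hypothetical client-side check based on the documented limitations.
// Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;

// Only the extensions named explicitly in the list above. The "etc." there
// means other programming language extensions may also be accepted.
const KNOWN_EXTENSIONS = new Set([
  'txt', 'md', 'pdf', 'csv', 'xlsx', 'xls', 'js', 'py', 'java',
]);

type PreflightResult = { ok: true } | { ok: false; reason: string };

function preflightCheck(fileName: string, sizeInBytes: number): PreflightResult {
  if (sizeInBytes > MAX_FILE_SIZE_BYTES) {
    return { ok: false, reason: `File exceeds 10 MB (${sizeInBytes} bytes)` };
  }

  // Take everything after the last dot and compare case-insensitively.
  const dot = fileName.lastIndexOf('.');
  const extension = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();
  if (!KNOWN_EXTENSIONS.has(extension)) {
    return { ok: false, reason: `Extension ".${extension}" is not in the documented list` };
  }

  return { ok: true };
}
```

Running a check like this first avoids sending a request that the limits would rule out anyway.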
Generate a User/Org API key
You will need to generate an API key to authenticate your requests. For more information, visit the User/Org API key documentation.
Parse documents
Parse documents by sending them to the parser API endpoint.
Headers
- Name: Content-Type
  - Type: string
  - Required: yes
  - Description: Request content type. Must be multipart/form-data.
- Name: Authorization
  - Type: string
  - Required: yes
  - Description: Replace <YOUR_API_KEY> with your user/org API key.
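For a raw HTTP call made without the SDK, the headers could be assembled roughly as follows. This is a sketch under stated assumptions: `buildParserHeaders` is a hypothetical helper, and the `Bearer` scheme is an assumption, since this section only says to substitute `<YOUR_API_KEY>`. Confirm the exact format in the User/Org API key documentation.

```typescript
// Hypothetical helper that builds the headers for a raw call to the parser endpoint.
// Assumption: the key is sent using the "Bearer" scheme; this is not stated above.
function buildParserHeaders(apiKey: string): Record<string, string> {
  if (apiKey.trim() === '') {
    throw new Error('An API key is required');
  }

  return {
    Authorization: `Bearer ${apiKey}`,
    // Content-Type is deliberately not set here. When the request body is a
    // FormData object, fetch generates "multipart/form-data" together with the
    // required boundary parameter; setting the header by hand would omit it.
  };
}
```

Leaving `Content-Type` to the runtime still satisfies the multipart/form-data requirement, because the generated header carries that media type plus the boundary.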
Request Body
- Name: document
  - Type: File
  - Required: yes
  - Description: The input document to be parsed. Must be one of the supported file formats and under 10 MB in size.
- Name: documentName
  - Type: string
  - Required: yes
  - Description: The name of the document including its extension (e.g., document.pdf).
- Name: contentType
  - Type: string
  - Required: yes
  - Description: The MIME type of the document. Supported MIME types based on file format:
    - Text file: text/plain
    - Markdown: text/markdown
    - PDF documents: application/pdf
    - CSV files: text/csv
    - Excel spreadsheets: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-excel
    - Programming language files: text/plain
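Since contentType must match the file format, it can be derived from documentName instead of being typed by hand. The sketch below is illustrative only: `contentTypeFor` and `MIME_BY_EXTENSION` are hypothetical names, not SDK exports. The pairing of .xlsx and .xls with the two Excel MIME types follows the standard registrations, and falling back to text/plain for any unlisted extension is an assumption based on programming language files being sent as text/plain.

```typescript
// MIME types from the list above, keyed by lowercase file extension.
const MIME_BY_EXTENSION: Record<string, string> = {
  txt: 'text/plain',
  md: 'text/markdown',
  pdf: 'application/pdf',
  csv: 'text/csv',
  xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  xls: 'application/vnd.ms-excel',
};

// Hypothetical helper: picks the contentType value for a given documentName.
function contentTypeFor(documentName: string): string {
  const dot = documentName.lastIndexOf('.');
  const extension = dot === -1 ? '' : documentName.slice(dot + 1).toLowerCase();

  // Assumption: anything not listed is treated as a source file and sent as
  // text/plain, matching the "Programming language files" entry.
  return MIME_BY_EXTENSION[extension] ?? 'text/plain';
}
```

For example, `contentTypeFor('document.pdf')` yields the value to pass as contentType alongside `documentName: 'document.pdf'`.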
Usage example
Install the SDK
npm i langbase
Environment variables
.env file
LANGBASE_API_KEY="<USER/ORG-API-KEY>"
Parse a document
Parse document
import { Langbase } from 'langbase';

// Create the client with the API key from the .env file.
const langbase = new Langbase({
  apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
  // Build an in-memory text file to send to the parser.
  const document = new File(['Your document content'], 'document.txt', {
    type: 'text/plain'
  });

  // documentName and contentType must match the file being sent.
  const result = await langbase.parser({
    document: document,
    documentName: 'document.txt',
    contentType: 'text/plain'
  });

  console.log('Parsed content:', result);
}

main();
Response
- Name: Response
  - Type: ParserResponse
  - Description: The response is a JSON object with the following structure:
Parser API Response
interface ParserResponse {
  documentName: string;
  content: string;
}
- Name: documentName
  - Type: string
  - Description: The name of the parsed document.
- Name: content
  - Type: string
  - Description: The extracted text content from the document.
API Response
{
  "documentName": "document.pdf",
  "content": "Extracted text content from the document..."
}
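When the response arrives as untyped JSON, for example from a raw HTTP call, a small runtime check can confirm it has the documented shape before the content is used. This is a minimal sketch: `isParserResponse` is a hypothetical helper, and the interface is repeated only to keep the block self-contained.

```typescript
// Shape of the response as documented above.
interface ParserResponse {
  documentName: string;
  content: string;
}

// Hypothetical type guard: true only if both documented fields are strings.
function isParserResponse(value: unknown): value is ParserResponse {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.documentName === 'string' &&
    typeof candidate.content === 'string'
  );
}

// Usage with the example payload shown above.
const raw =
  '{"documentName":"document.pdf","content":"Extracted text content from the document..."}';
const parsed: unknown = JSON.parse(raw);

if (isParserResponse(parsed)) {
  // Inside this branch, parsed is typed as ParserResponse.
  console.log(`${parsed.documentName}: ${parsed.content.length} characters extracted`);
} else {
  console.error('Unexpected response shape');
}
```

The guard narrows `unknown` to `ParserResponse`, so later code can read `content` without casts.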