Parser API v1

The parser API endpoint extracts text content from documents in a range of formats. Use it when you need to process documents before passing them to your AI applications.

Limitations

Maximum file size: 10 MB
Supported file formats:
- Text files (.txt)
- Markdown (.md)
- PDF documents (.pdf)
- CSV files (.csv)
- Excel spreadsheets (.xlsx, .xls)
- Common programming language files (.js, .py, .java, etc.)
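
The limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: `preCheckDocument`, `PreCheckResult`, and the constants are hypothetical names, "10 MB" is assumed to mean 10 * 1024 * 1024 bytes, and only the extensions named explicitly in the list are included (the "etc." for programming language files is not expanded).

```typescript
// Sketch only: a client-side pre-check against the limits listed above.
// Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes; the API may count differently.
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;

// Only the extensions named explicitly in the list above. Other programming
// language extensions may be accepted, but they are not enumerated here.
const LISTED_EXTENSIONS = new Set([
  ".txt", ".md", ".pdf", ".csv", ".xlsx", ".xls", ".js", ".py", ".java",
]);

interface PreCheckResult {
  ok: boolean;
  reason?: string;
}

// Hypothetical helper: not part of the Langbase SDK.
function preCheckDocument(fileName: string, sizeBytes: number): PreCheckResult {
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot).toLowerCase();

  if (!LISTED_EXTENSIONS.has(extension)) {
    return { ok: false, reason: `Extension "${extension}" is not in the listed formats` };
  }
  // Files larger than the assumed limit are rejected.
  if (sizeBytes > MAX_FILE_SIZE_BYTES) {
    return { ok: false, reason: `File is ${sizeBytes} bytes, above the 10 MB limit` };
  }
  return { ok: true };
}
```

A check like this only saves a round trip; the server remains the authority on what it accepts.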

Generate a User/Org API key

You will need to generate an API key to authenticate your requests. For more information, visit the User/Org API key documentation.

POST /v1/parser

Parse documents

Parse documents by sending them to the parser API endpoint.

Headers

- Content-Type (string, required): Request content type. Must be multipart/form-data.
- Authorization (string, required): Bearer <YOUR_API_KEY>. Replace <YOUR_API_KEY> with your user/org API key.

Request Body

- document (File, required): The input document to be parsed. Must be one of the supported file formats and under 10 MB in size.
- documentName (string, required): The name of the document including its extension (e.g., document.pdf).
- contentType (string, required): The MIME type of the document.

Supported MIME types by file format:
- Text file: text/plain
- Markdown: text/markdown
- PDF documents: application/pdf
- CSV files: text/csv
- Excel spreadsheets:
  - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  - application/vnd.ms-excel
- Programming language files: text/plain
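
The contentType value can be derived from the file extension. The sketch below is one way to do that, assuming the list above is complete for your inputs. `MIME_BY_EXTENSION` and `contentTypeFor` are hypothetical names, not part of the Langbase SDK, and the pairing of .xlsx and .xls with the two Excel types follows the standard MIME registrations rather than anything stated in the list.

```typescript
// Sketch only: maps a file name to the contentType values listed above.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".txt": "text/plain",
  ".md": "text/markdown",
  ".pdf": "application/pdf",
  ".csv": "text/csv",
  ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  ".xls": "application/vnd.ms-excel",
  // Programming language files are sent as text/plain.
  ".js": "text/plain",
  ".py": "text/plain",
  ".java": "text/plain",
};

// Hypothetical helper: returns undefined when the extension is not in the map,
// so the caller can decide how to handle unlisted formats.
function contentTypeFor(documentName: string): string | undefined {
  const dot = documentName.lastIndexOf(".");
  if (dot === -1) return undefined;
  return MIME_BY_EXTENSION[documentName.slice(dot).toLowerCase()];
}
```

Returning undefined instead of a default keeps unlisted formats visible to the caller rather than silently sending a guessed MIME type.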

Usage example

Install the SDK

npm i langbase

Environment variables

.env file

LANGBASE_API_KEY="<YOUR_API_KEY>"

Parse a document

curl https://api.langbase.com/v1/parser \
-X POST \
-H 'Authorization: Bearer <YOUR_API_KEY>' \
-F 'document=@/path/to/document.pdf' \
-F 'documentName=document.pdf' \
-F 'contentType=application/pdf'
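
The same request can be sketched in TypeScript with the standard fetch, FormData, and Blob globals (available in Node.js 18+ and modern browsers). This does not use the langbase SDK; `buildParserForm` and `parseDocument` are hypothetical helper names, and the error handling is an assumption, since the error format is not described here.

```typescript
// Sketch only: mirrors the curl command above using web-standard globals.
const PARSER_URL = "https://api.langbase.com/v1/parser";

// Hypothetical helper: builds the three multipart fields the endpoint expects.
function buildParserForm(
  file: Blob,
  documentName: string,
  contentType: string,
): FormData {
  const form = new FormData();
  form.append("document", file, documentName);
  form.append("documentName", documentName);
  form.append("contentType", contentType);
  return form;
}

// Hypothetical helper: sends the form and returns the parsed JSON body.
async function parseDocument(
  apiKey: string,
  file: Blob,
  documentName: string,
  contentType: string,
): Promise<{ documentName: string; content: string }> {
  const response = await fetch(PARSER_URL, {
    method: "POST",
    // Content-Type is left unset on purpose: fetch adds
    // multipart/form-data with the correct boundary for a FormData body.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildParserForm(file, documentName, contentType),
  });
  if (!response.ok) {
    // Assumption: any non-2xx status is treated as a failure.
    throw new Error(`Parser request failed with status ${response.status}`);
  }
  return (await response.json()) as { documentName: string; content: string };
}
```

In Node.js, the file contents could be read from disk and wrapped in a Blob before calling `parseDocument`, with the key taken from the LANGBASE_API_KEY environment variable.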

Response

The response is a JSON object of type ParserResponse:

interface ParserResponse {
  documentName: string;
  content: string;
}

- documentName (string): The name of the parsed document.
- content (string): The extracted text content from the document.

API Response

{
  "documentName": "document.pdf",
  "content": "Extracted text content from the document..."
}
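
Because the response arrives as untyped JSON, it can help to check its shape before use. The sketch below shows one way to do that with a type guard. `isParserResponse` is a hypothetical helper, and the sample string simply repeats the example response above.

```typescript
interface ParserResponse {
  documentName: string;
  content: string;
}

// Hypothetical helper: narrows an unknown JSON value to ParserResponse
// by checking that both documented fields are present and are strings.
function isParserResponse(value: unknown): value is ParserResponse {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate["documentName"] === "string" &&
    typeof candidate["content"] === "string"
  );
}

const sample: unknown = JSON.parse(
  '{"documentName": "document.pdf", "content": "Extracted text content from the document..."}',
);

if (isParserResponse(sample)) {
  // Inside this branch, sample is typed as ParserResponse.
  console.log(`${sample.documentName}: ${sample.content.length} characters`);
}
```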
