Parser langbase.parser()

Use the parser() function to extract text content from documents in a range of formats. This is particularly useful when you need to pre-process documents before passing them to your AI applications.


Generate a User/Org API key

You need an API key to authenticate your requests. For more information, see the User/Org API key documentation.


Limitations

  • Maximum file size: 10 MB
  • Supported file formats:
    • Text files (.txt)
    • Markdown (.md)
    • PDF documents (.pdf)
    • CSV files (.csv)
    • Excel spreadsheets (.xlsx, .xls)
    • Common programming language files (.js, .py, .java, etc.)
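Checking these limits locally before calling the parser can avoid a wasted request. The following is a minimal sketch, not part of the SDK: the names `checkParsable`, `extensionOf`, `LISTED_EXTENSIONS`, and `MAX_FILE_SIZE_BYTES` are hypothetical, and it assumes that 10 MB means 10 × 1024 × 1024 bytes and that a file of exactly that size is accepted. The extension set only covers the formats named above; the "etc." for programming language files means the real list is longer.

```typescript
// Hypothetical pre-flight check based on the documented limits.
// Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;

// Only the extensions explicitly named in the list above; extend as needed.
const LISTED_EXTENSIONS: ReadonlySet<string> = new Set([
	'.txt', '.md', '.pdf', '.csv', '.xlsx', '.xls', '.js', '.py', '.java',
]);

// Returns the lower-cased extension including the dot, or '' if there is none.
function extensionOf(fileName: string): string {
	const dot = fileName.lastIndexOf('.');
	return dot <= 0 ? '' : fileName.slice(dot).toLowerCase();
}

// Returns null when the file looks acceptable, otherwise a short reason.
function checkParsable(fileName: string, sizeInBytes: number): string | null {
	if (sizeInBytes > MAX_FILE_SIZE_BYTES) {
		return `${fileName} is ${sizeInBytes} bytes; the limit is ${MAX_FILE_SIZE_BYTES} bytes`;
	}
	const ext = extensionOf(fileName);
	if (!LISTED_EXTENSIONS.has(ext)) {
		return `${fileName} has an extension (${ext || 'none'}) that is not in the listed formats`;
	}
	return null;
}
```

For example, `checkParsable('report.pdf', 2048)` returns `null`, while a 11 MB PDF or a `.png` file returns a reason string that can be shown to the user.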

API reference

langbase.parser(options)

Parse documents by running the langbase.parser() function.

Function Signature

langbase.parser(options);

// with types
langbase.parser(options: ParserOptions);

options

  • Name
    options
    Type
    ParserOptions
    Description

    ParserOptions Object

    interface ParserOptions {
    	document: Buffer | File | FormData | ReadableStream;
    	documentName: string;
    	contentType: ContentType;
    }
    

    The options object has the following properties.


document

  • Name
    document
    Type
    Buffer | File | FormData | ReadableStream
    Required
    Required
    Description

    The input document to be parsed. Must be one of the supported file formats and under 10 MB in size.


documentName

  • Name
    documentName
    Type
    string
    Required
    Required
    Description

    The name of the document including its extension (e.g., document.pdf).


contentType

  • Name
    contentType
    Type
    ContentType
    Required
    Required
    Description

    The MIME type of the document. Supported MIME types based on file format:

    • Text files (.txt): text/plain
    • Markdown (.md): text/markdown
    • PDF documents (.pdf): application/pdf
    • CSV files (.csv): text/csv
    • Excel spreadsheets:
      • .xlsx: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
      • .xls: application/vnd.ms-excel
    • Programming language files (all use text/plain):
      • .js: text/plain
      • .py: text/plain
      • .java: text/plain
      • .cpp: text/plain
      • .cs: text/plain
      • Other code files: text/plain
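Because documentName and contentType must agree, it can help to derive the MIME type from the file name. The sketch below encodes the mapping listed above as a lookup table. `MIME_BY_EXTENSION` and `contentTypeFor` are hypothetical helper names, not SDK exports, and the table covers only the extensions this page names explicitly.

```typescript
// Hypothetical lookup table built from the MIME types documented above.
const MIME_BY_EXTENSION: Record<string, string> = {
	'.txt': 'text/plain',
	'.md': 'text/markdown',
	'.pdf': 'application/pdf',
	'.csv': 'text/csv',
	'.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
	'.xls': 'application/vnd.ms-excel',
	// Programming language files all use text/plain.
	'.js': 'text/plain',
	'.py': 'text/plain',
	'.java': 'text/plain',
	'.cpp': 'text/plain',
	'.cs': 'text/plain',
};

// Returns the MIME type for a document name, or undefined when the
// extension is missing or not in the table.
function contentTypeFor(documentName: string): string | undefined {
	const dot = documentName.lastIndexOf('.');
	if (dot <= 0) return undefined;
	return MIME_BY_EXTENSION[documentName.slice(dot).toLowerCase()];
}
```

Returning `undefined` for unknown extensions, rather than falling back to text/plain, is a deliberate choice in this sketch: it forces the caller to decide whether an unlisted file really is a plain-text code file.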

Install the SDK

npm i langbase

Environment variables

.env file

LANGBASE_API_KEY="<USER/ORG-API-KEY>"

langbase.parser() examples

langbase.parser()

import { Langbase } from 'langbase';

const langbase = new Langbase({
	apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
	const document = new File(['Your document content'], 'document.txt', {
		type: 'text/plain'
	});

	const result = await langbase.parser({
		document: document,
		documentName: 'document.txt',
		contentType: 'text/plain'
	});

	console.log('Parsed content:', result);
}

// Log any rejection instead of leaving the promise unhandled.
main().catch(console.error);

Response

langbase.parser() returns a Promise<ParserResponse>.

ParserResponse Type

interface ParserResponse {
	documentName: string;
	content: string;
}

  • Name
    documentName
    Type
    string
    Description

    The name of the parsed document.

  • Name
    content
    Type
    string
    Description

    The extracted text content from the document.

ParserResponse Example

{
	"documentName": "document.pdf",
	"content": "Extracted text content from the document..."
}
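If the response is stored or passed through an untyped boundary (for example, serialized to JSON and read back), a runtime check can confirm that it still has the documented shape. The following is a minimal sketch: the `ParserResponse` interface is redeclared locally to keep the example self-contained, and `isParserResponse` is a hypothetical helper, not an SDK function.

```typescript
// Local copy of the documented response shape, redeclared for this sketch.
interface ParserResponse {
	documentName: string;
	content: string;
}

// Hypothetical type guard: true only when both documented fields are strings.
function isParserResponse(value: unknown): value is ParserResponse {
	if (typeof value !== 'object' || value === null) return false;
	const candidate = value as Record<string, unknown>;
	return (
		typeof candidate['documentName'] === 'string' &&
		typeof candidate['content'] === 'string'
	);
}
```

After `isParserResponse(value)` returns true, TypeScript narrows `value` to `ParserResponse`, so `value.content` can be used without a cast.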