Crawler: langbase.tools.crawl()

You can use the langbase.tools.crawl() function to extract content from web pages. It is useful when your AI applications need information gathered from websites.

The crawling functionality is powered by the following services:

  • Spider.cloud
  • Firecrawl

Prerequisites

  1. Langbase API Key: Generate your API key from the User/Org API key documentation.
  2. Crawl API Key: Sign up at Spider.cloud or Firecrawl to get an API key for the crawl service you plan to use.


API reference

langbase.tools.crawl(options)

Crawl web pages by running the langbase.tools.crawl() function.

Function Signature

langbase.tools.crawl(options);

// with types
langbase.tools.crawl(options: ToolCrawlOptions);

options

  • Name
    options
    Type
    ToolCrawlOptions
    Description

    ToolCrawlOptions Object

    interface ToolCrawlOptions {
    	url: string[];
    	apiKey: string;
    	maxPages?: number;
    	service?: 'spider' | 'firecrawl';
    }
    

    The options object has the following properties.


url

  • Name
    url
    Type
    string[]
    Required
    Yes
    Description

    An array of URLs to crawl. Each URL should be a valid web address.
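Since every entry in `url` must be a valid web address, it can help to check the array before calling the API. The following is a minimal sketch; `findInvalidUrls` is a hypothetical helper, not part of the Langbase SDK, and it assumes that only `http:` and `https:` URLs are acceptable.

```typescript
// Hypothetical helper (not part of the Langbase SDK): returns every entry in
// `urls` that cannot be parsed as an absolute http(s) URL.
function findInvalidUrls(urls: string[]): string[] {
	return urls.filter((candidate) => {
		try {
			const parsed = new URL(candidate);
			// Assumption: only http and https pages are crawlable.
			return parsed.protocol !== 'http:' && parsed.protocol !== 'https:';
		} catch {
			// new URL() throws for strings it cannot parse, e.g. a missing scheme.
			return true;
		}
	});
}

const invalid = findInvalidUrls(['https://example.com', 'example.com', 'ftp://example.com']);
console.log(invalid); // [ 'example.com', 'ftp://example.com' ]
```

Rejecting bad entries locally gives a clearer error than waiting for the crawl request to fail.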


apiKey

  • Name
    apiKey
    Type
    string
    Required
    Yes
    Description

    Your API key for the crawl service selected by service: a Spider.cloud key for spider (the default) or a Firecrawl key for firecrawl.
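Because the key has to belong to the service doing the crawl, a project that switches between services may want to pick the key from the service name. This is a minimal sketch; `crawlKeyFor` and the variable names `SPIDER_KEY` and `FIRECRAWL_KEY` are hypothetical and are not read by the Langbase SDK on its own.

```typescript
type CrawlService = 'spider' | 'firecrawl';

// Hypothetical helper: picks the API key matching the chosen crawl service.
// SPIDER_KEY and FIRECRAWL_KEY are illustrative variable names only.
function crawlKeyFor(
	service: CrawlService,
	env: Record<string, string | undefined>,
): string {
	const name = service === 'firecrawl' ? 'FIRECRAWL_KEY' : 'SPIDER_KEY';
	const key = env[name];
	if (!key) {
		throw new Error(`Missing ${name} for the ${service} crawl service`);
	}
	return key;
}

const key = crawlKeyFor('firecrawl', { FIRECRAWL_KEY: 'fc-test' });
console.log(key); // fc-test
```

In Node you would pass `process.env` as the second argument.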


maxPages

  • Name
    maxPages
    Type
    number
    Description

    Maximum number of pages to crawl. Use it to cap how much of a site a single call visits.


service

  • Name
    service
    Type
    'spider' | 'firecrawl'
    Description

    The crawling service to use: spider (Spider.cloud) or firecrawl (Firecrawl). Defaults to spider. The apiKey must belong to the service you choose.
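The options above can be assembled and sanity-checked in one place before the call. This is a minimal sketch: `buildCrawlOptions` is a hypothetical helper, the interface is a local copy of the documented `ToolCrawlOptions`, and treating `maxPages` as a positive whole number is an assumption rather than a documented rule.

```typescript
// Local copy of the documented ToolCrawlOptions shape.
interface ToolCrawlOptions {
	url: string[];
	apiKey: string;
	maxPages?: number;
	service?: 'spider' | 'firecrawl';
}

// Hypothetical helper: fills in the documented default service ('spider')
// and rejects option values that cannot be sent as-is.
function buildCrawlOptions(options: ToolCrawlOptions): ToolCrawlOptions {
	if (options.url.length === 0) {
		throw new Error('url must contain at least one address');
	}
	if (options.apiKey.trim() === '') {
		throw new Error('apiKey must not be empty');
	}
	// Assumption: maxPages is meaningful only as a positive whole number.
	if (
		options.maxPages !== undefined &&
		(!Number.isInteger(options.maxPages) || options.maxPages < 1)
	) {
		throw new Error('maxPages must be a positive integer');
	}
	return { ...options, service: options.service ?? 'spider' };
}

const opts = buildCrawlOptions({ url: ['https://example.com'], apiKey: 'key', maxPages: 5 });
console.log(opts.service); // spider
```

Setting `service` explicitly is redundant with the documented default, but it makes the chosen service visible when the options are logged.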

Install the SDK

npm i langbase

Environment variables

.env file

LANGBASE_API_KEY="<USER/ORG-API-KEY>"
CRAWL_KEY="<CRAWL-API-KEY>"
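The example below reads both variables with non-null assertions (`!`), which pass `undefined` along silently if a variable is unset. A minimal sketch of failing fast instead follows; `loadCrawlEnv` is a hypothetical helper, and it assumes the .env file has already been loaded however your project normally does that.

```typescript
// Hypothetical helper: reads the two variables from the .env file above and
// throws a clear error instead of returning undefined values.
interface CrawlEnv {
	langbaseApiKey: string;
	crawlKey: string;
}

function loadCrawlEnv(env: Record<string, string | undefined>): CrawlEnv {
	// An empty string counts as missing, since it cannot be a valid key.
	const missing = ['LANGBASE_API_KEY', 'CRAWL_KEY'].filter((name) => !env[name]);
	if (missing.length > 0) {
		throw new Error(`Missing environment variables: ${missing.join(', ')}`);
	}
	return {
		langbaseApiKey: env['LANGBASE_API_KEY'] as string,
		crawlKey: env['CRAWL_KEY'] as string,
	};
}
```

In Node, call it as `const { langbaseApiKey, crawlKey } = loadCrawlEnv(process.env);` and pass the two values to `new Langbase()` and `langbase.tools.crawl()`.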

langbase.tools.crawl() examples

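import { Langbase } from 'langbase';

// Authenticate with your Langbase User/Org API key.
const langbase = new Langbase({
	apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
	// Crawl example.com, visiting at most 5 pages, with the default service (spider).
	const results = await langbase.tools.crawl({
		url: ['https://example.com'],
		apiKey: process.env.CRAWL_KEY!, // API key for the crawl service
		maxPages: 5,
	});

	// Each result holds a page's url and its extracted content.
	console.log('Crawled content:', results);
}

// Report failures (e.g. an invalid key) instead of leaving an unhandled rejection.
main().catch(console.error);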

Response

  • Name
    ToolCrawlResponse[]
    Type
    Array<object>
    Description

    The langbase.tools.crawl() function returns an array of objects, one per crawled page, each containing the page URL and its extracted content.

    ToolCrawlResponse Type

    interface ToolCrawlResponse {
    	url: string;
    	content: string;
    }
    
    • Name
      url
      Type
      string
      Description

      The URL of the crawled page.

    • Name
      content
      Type
      string
      Description

      The extracted content from the crawled page.
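If crawl results are stored and read back later (for example from a cache or a file), their shape is no longer guaranteed by the compiler. A minimal sketch of a runtime check against the documented `ToolCrawlResponse` shape follows; `isToolCrawlResponseArray` is a hypothetical type guard, and the interface is a local copy of the one above.

```typescript
// Local copy of the documented response shape.
interface ToolCrawlResponse {
	url: string;
	content: string;
}

// Hypothetical type guard: checks at runtime that a value matches
// ToolCrawlResponse[] before the rest of the program relies on it.
function isToolCrawlResponseArray(value: unknown): value is ToolCrawlResponse[] {
	return (
		Array.isArray(value) &&
		value.every(
			(item) =>
				typeof item === 'object' &&
				item !== null &&
				typeof (item as Record<string, unknown>).url === 'string' &&
				typeof (item as Record<string, unknown>).content === 'string',
		)
	);
}

console.log(isToolCrawlResponseArray([{ url: 'https://example.com', content: 'Hi' }])); // true
console.log(isToolCrawlResponseArray([{ url: 'https://example.com' }])); // false
```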

ToolCrawlResponse Example

[
	{
		"url": "https://example.com/page1",
		"content": "Extracted content from the webpage..."
	},
	{
		"url": "https://example.com/page2",
		"content": "More extracted content..."
	}
]
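A common next step is to turn the crawled pages into text an AI application can consume, such as context for a prompt. The following is a minimal sketch using the example response above; `toContext` is a hypothetical helper, and the "Source:" labelling format is an arbitrary choice, not something the Langbase API prescribes.

```typescript
// Local copy of the documented response shape.
interface ToolCrawlResponse {
	url: string;
	content: string;
}

// Hypothetical helper: merges crawled pages into one text block, labelling
// each page with its source URL and skipping pages with no extracted text.
function toContext(pages: ToolCrawlResponse[]): string {
	return pages
		.filter((page) => page.content.trim().length > 0)
		.map((page) => `Source: ${page.url}\n${page.content.trim()}`)
		.join('\n\n');
}

// Same data as the example response above.
const pages: ToolCrawlResponse[] = [
	{ url: 'https://example.com/page1', content: 'Extracted content from the webpage...' },
	{ url: 'https://example.com/page2', content: 'More extracted content...' },
];

console.log(toContext(pages));
```

Keeping the source URL next to each page's text lets a downstream model or reader attribute statements to the page they came from.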