
Crawler langbase.tools.crawl()

You can use the tools.crawl() function to extract content from web pages. This is particularly useful when you need to gather information from websites for your AI applications. The crawling functionality is powered by Spider.cloud, and you'll need to obtain an API key from them to use this feature.


Pre-requisites

  1. Langbase API Key: Generate your API key from the User/Org API key documentation.
  2. Spider.cloud API Key: Sign up at Spider.cloud to get your crawler API key.


API reference

langbase.tools.crawl(options)

Crawl web pages by running the langbase.tools.crawl() function.

Function Signature

langbase.tools.crawl(options);

// with types
langbase.tools.crawl(options: ToolCrawlOptions);

options

  • Name
    options
    Type
    ToolCrawlOptions
    Description

    ToolCrawlOptions Object

    interface ToolCrawlOptions {
    	url: string[];
    	apiKey: string;
    	maxPages?: number;
    }
    

    Following are the properties of the options object.


url

  • Name
    url
    Type
    string[]
    Required
    Description

    An array of URLs to crawl. Each URL should be a valid web address.


apiKey

  • Name
    apiKey
    Type
    string
    Required
    Description

    Your Spider.cloud API key – get one from Spider.cloud.


maxPages

  • Name
    maxPages
    Type
    number
    Description

    The maximum number of pages to crawl. This caps how many pages the crawler will fetch in a single operation.

Install the SDK

npm i langbase

Environment variables

.env file

LANGBASE_API_KEY="<USER/ORG-API-KEY>"
CRAWL_KEY="<SPIDER-CLOUD-API-KEY>"
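Since a missing key only surfaces as an API error at call time, it can help to fail fast at startup. A minimal sketch, assuming the two variable names from the .env file above; assertEnv is an illustrative helper, not part of the Langbase SDK:

```typescript
// Illustrative helper: throw early if any required env var is unset.
function assertEnv(names: string[]): void {
	const missing = names.filter((name) => !process.env[name]);
	if (missing.length > 0) {
		throw new Error(`Missing environment variables: ${missing.join(', ')}`);
	}
}
```

For example, call `assertEnv(['LANGBASE_API_KEY', 'CRAWL_KEY'])` before constructing the Langbase client.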

langbase.tools.crawl() examples

langbase.tools.crawl()

import { Langbase } from 'langbase';

const langbase = new Langbase({
	apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
	const results = await langbase.tools.crawl({
		url: ['https://example.com'],
		apiKey: process.env.CRAWL_KEY!, // Spider.cloud API key
		maxPages: 5
	});

	console.log('Crawled content:', results);
}

main();
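Crawls over the network can fail transiently (timeouts, rate limits). A hedged sketch of a generic retry wrapper you could put around the call above; withRetry is an illustrative helper, not part of the Langbase SDK:

```typescript
// Illustrative helper: retry an async operation with linear backoff.
async function withRetry<T>(
	fn: () => Promise<T>,
	attempts = 3,
	delayMs = 500
): Promise<T> {
	let lastError: unknown;
	for (let i = 0; i < attempts; i++) {
		try {
			return await fn();
		} catch (err) {
			lastError = err;
			// Wait a little longer before each subsequent attempt.
			if (i < attempts - 1) {
				await new Promise((resolve) => setTimeout(resolve, delayMs * (i + 1)));
			}
		}
	}
	throw lastError;
}
```

You would wrap the crawl call as `await withRetry(() => langbase.tools.crawl({ ... }))`, keeping the options object unchanged.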

Response

  • Name
    ToolCrawlResponse[]
    Type
    Array<object>
    Description

    An array of objects containing the URL and the extracted content returned by the langbase.tools.crawl() function.

    ToolCrawlResponse Type

    interface ToolCrawlResponse {
    	url: string;
    	content: string;
    }
    
    • Name
      url
      Type
      string
      Description

      The URL of the crawled page.

    • Name
      content
      Type
      string
      Description

      The extracted content from the crawled page.

ToolCrawlResponse Example

[
	{
		"url": "https://example.com/page1",
		"content": "Extracted content from the webpage..."
	},
	{
		"url": "https://example.com/page2",
		"content": "More extracted content..."
	}
]
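A common next step is flattening this array into a single context string for a downstream prompt. A minimal sketch using the ToolCrawlResponse shape above; toContext and its maxChars cap are illustrative choices, not part of the SDK:

```typescript
interface ToolCrawlResponse {
	url: string;
	content: string;
}

// Illustrative helper: join crawled pages into one labeled context string,
// truncating to a character budget before sending it to a model.
function toContext(pages: ToolCrawlResponse[], maxChars = 4000): string {
	const combined = pages
		.map((page) => `Source: ${page.url}\n${page.content}`)
		.join('\n\n');
	return combined.slice(0, maxChars);
}
```

For example, `toContext(results)` on the response above yields a single string with each page's content prefixed by its source URL.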