Tools: Crawl v1

The crawl API endpoint allows you to extract content from web pages. This is particularly useful when you need to gather information from websites for your AI applications.

The crawling functionality is powered by Spider.cloud.


Limitations

  • Maximum number of URLs per request: 100
  • Maximum crawl depth (pages): 50
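Since a single request accepts at most 100 URLs, larger lists need to be split client-side into multiple requests. A minimal sketch (the `batchUrls` helper is illustrative, not part of the SDK):

```typescript
// Split a URL list into batches that respect the 100-URL request limit.
const MAX_URLS_PER_REQUEST = 100;

function batchUrls(urls: string[], size = MAX_URLS_PER_REQUEST): string[][] {
	const batches: string[][] = [];
	for (let i = 0; i < urls.length; i += size) {
		batches.push(urls.slice(i, i + size));
	}
	return batches;
}

console.log(batchUrls(['a', 'b', 'c'], 2)); // two batches: ['a', 'b'] and ['c']
```

Each batch can then be passed as the `url` array of a separate crawl request.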

Prerequisites

  1. Langbase API Key: Generate your API key from the User/Org API key documentation.
  2. Spider.cloud API Key: Sign up at Spider.cloud to get your crawler API key.


POST /v1/tools/crawl

Crawl web pages

Extract content from web pages by sending URLs to the crawl API endpoint.

Headers

  • Content-Type (string, required)

    Request content type. Needs to be application/json.

  • Authorization (string, required)

    Replace <YOUR_API_KEY> with your user/org API key.

  • LB-CRAWL-KEY (string, required)

    Your Spider.cloud API key. Sign up at Spider.cloud to obtain one, then replace YOUR_SPIDER_CLOUD_API_KEY with your key.


Request Body

  • url (string[], required)

    An array of URLs to crawl. Each URL must be a valid web address. Maximum 100 URLs per request.

  • maxPages (number)

    The maximum number of pages to crawl. This limits the depth of the crawl operation. Maximum value: 50.
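For clients not using the SDK, the same call can be made over raw HTTP with the headers and body documented above. A minimal sketch with `fetch` — the base URL and Bearer authorization scheme are assumptions, not taken from this page:

```typescript
// Hypothetical raw-HTTP version of the crawl call.
// Assumptions: base URL https://api.langbase.com and Bearer auth.
const endpoint = 'https://api.langbase.com/v1/tools/crawl';

const requestInit = {
	method: 'POST',
	headers: {
		'Content-Type': 'application/json',
		Authorization: `Bearer ${process.env.LANGBASE_API_KEY}`,
		'LB-CRAWL-KEY': process.env.CRAWL_KEY ?? '',
	},
	// Request body fields documented above: url (string[]) and maxPages (number).
	body: JSON.stringify({
		url: ['https://example.com'],
		maxPages: 5,
	}),
};

// Only hit the network when both keys are configured.
if (process.env.LANGBASE_API_KEY && process.env.CRAWL_KEY) {
	fetch(endpoint, requestInit)
		.then(res => res.json())
		.then(data => console.log('Crawled content:', data));
}
```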

Usage example

Install the SDK

npm i langbase

Environment variables

.env file

LANGBASE_API_KEY="<USER/ORG-API-KEY>"
CRAWL_KEY="<SPIDER-CLOUD-API-KEY>"

Crawl web pages

POST /v1/tools/crawl
import { Langbase } from 'langbase';

const langbase = new Langbase({
	apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
	const results = await langbase.tools.crawl({
		url: ['https://example.com'],
		apiKey: process.env.CRAWL_KEY!, // Spider.cloud API key
		maxPages: 5
	});

	console.log('Crawled content:', results);
}

main();

Response

  • ToolCrawlResponse[] (Array<object>)

    An array of objects containing the URL and the extracted content returned by the crawl operation.

    Crawl API Response

    interface ToolCrawlResponse {
      url: string;
      content: string;
    }
    
    type CrawlResponse = ToolCrawlResponse[];
    
    • url (string)

      The URL of the crawled page.

    • content (string)

      The extracted content from the crawled page.

API Response

[
  {
    "url": "https://example.com/page1",
    "content": "Extracted content from the webpage..."
  },
  {
    "url": "https://example.com/page2",
    "content": "More extracted content..."
  }
]
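Given the ToolCrawlResponse shape above, the returned array can be indexed by URL for quick lookup. A small sketch using sample data shaped like the response above:

```typescript
interface ToolCrawlResponse {
	url: string;
	content: string;
}

// Sample data shaped like the API response above.
const results: ToolCrawlResponse[] = [
	{ url: 'https://example.com/page1', content: 'Extracted content from the webpage...' },
	{ url: 'https://example.com/page2', content: 'More extracted content...' },
];

// Index crawled content by URL for direct lookup.
const contentByUrl = new Map(results.map(r => [r.url, r.content]));

console.log(contentByUrl.get('https://example.com/page1'));
// → Extracted content from the webpage...
```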