Crawler langbase.tools.crawl()
You can use the tools.crawl() function to extract content from web pages. This is particularly useful when you need to gather information from websites for your AI applications.
The crawling functionality is powered by one of the following services: Spider.cloud or Firecrawl.
Pre-requisites
- Langbase API Key: Generate your API key from the User/Org API key documentation.
- Crawl API Key: Sign up at Spider.cloud OR Firecrawl to get your crawl API key.
API reference
langbase.tools.crawl(options)
Crawl web pages by running the langbase.tools.crawl() function.
Function Signature
langbase.tools.crawl(options);
// with types
langbase.tools.crawl(options: ToolCrawlOptions);
options
- Name
options
- Type
- ToolCrawlOptions
- Description
ToolCrawlOptions Object
interface ToolCrawlOptions {
	url: string[];
	apiKey: string;
	maxPages?: number;
	service?: 'spider' | 'firecrawl';
}
Following are the properties of the options object.
url
- Name
url
- Type
- string[]
- Required
- Required
- Description
An array of URLs to crawl. Each URL should be a valid web address.
apiKey
- Name
apiKey
- Type
- string
- Required
- Required
- Description
Your crawl API key for the selected service. Get one from Spider.cloud or Firecrawl.
maxPages
- Name
maxPages
- Type
- number
- Description
Maximum number of pages to crawl. Limits crawl depth.
service
- Name
service
- Type
- string
- Description
The crawling service to use. Options are spider or firecrawl. Default is spider.
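The options described above can be assembled and checked locally before any request is sent. The following is a minimal sketch, not part of the Langbase SDK: buildCrawlOptions is a hypothetical helper, the API key string is a placeholder, and the rule that maxPages must be a positive integer is an assumption made here for illustration.

```typescript
// Mirrors the ToolCrawlOptions interface documented above.
interface ToolCrawlOptions {
	url: string[];
	apiKey: string;
	maxPages?: number;
	service?: 'spider' | 'firecrawl';
}

// Hypothetical helper (not part of the Langbase SDK): validates input
// locally and returns an options object shaped for langbase.tools.crawl().
function buildCrawlOptions(
	urls: string[],
	apiKey: string,
	overrides: Pick<ToolCrawlOptions, 'maxPages' | 'service'> = {},
): ToolCrawlOptions {
	if (urls.length === 0) throw new Error('Provide at least one URL to crawl.');
	if (!apiKey) throw new Error('A crawl API key is required.');

	for (const candidate of urls) {
		// The URL constructor throws a TypeError on malformed addresses.
		const parsed = new URL(candidate);
		if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
			throw new Error(`Unsupported protocol in URL: ${candidate}`);
		}
	}

	// Assumption: treat anything other than a positive integer as a mistake.
	const { maxPages } = overrides;
	if (maxPages !== undefined && (!Number.isInteger(maxPages) || maxPages < 1)) {
		throw new Error('maxPages must be a positive integer.');
	}

	return { url: urls, apiKey, ...overrides };
}

// Placeholder key; in a real app this would come from your environment.
const options = buildCrawlOptions(['https://example.com'], 'crawl-key-placeholder', {
	maxPages: 10,
	service: 'firecrawl',
});
// Pass the result to the SDK: await langbase.tools.crawl(options);
```

Omitting service from the overrides leaves the field unset, so the documented default (spider) applies.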
Install the SDK
npm i langbase
Environment variables
.env file
LANGBASE_API_KEY="<USER/ORG-API-KEY>"
CRAWL_KEY="<CRAWL-API-KEY>"
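Node.js does not read a .env file on its own, so these values need to be loaded first (for example with the dotenv package or Node's --env-file flag). Once loaded, it can help to fail early when a key is missing. The sketch below assumes a hypothetical requireEnv helper that is not part of the Langbase SDK. It takes the environment as a parameter so it can run anywhere, and the literal object stands in for process.env.

```typescript
// A plain record so the helper works with process.env or a test double.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the named variable or throws a clear error.
function requireEnv(env: Env, name: string): string {
	const value = env[name]?.trim();
	if (!value) {
		throw new Error(`Missing environment variable: ${name}`);
	}
	return value;
}

// In a Node.js app you would pass process.env; placeholder values stand in here.
const env: Env = {
	LANGBASE_API_KEY: 'user-or-org-key-placeholder',
	CRAWL_KEY: 'crawl-key-placeholder',
};

const langbaseApiKey = requireEnv(env, 'LANGBASE_API_KEY');
const crawlApiKey = requireEnv(env, 'CRAWL_KEY');
```

Checking both keys at startup avoids relying on the non-null assertion (!) and surfaces a missing key before any crawl request is made.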
langbase.tools.crawl() examples
import { Langbase } from 'langbase';

// Create the SDK client with your Langbase user/org API key.
const langbase = new Langbase({
	apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
	// Crawl at most 5 pages using the default service (spider).
	const results = await langbase.tools.crawl({
		url: ['https://example.com'],
		apiKey: process.env.CRAWL_KEY!,
		maxPages: 5,
	});

	console.log('Crawled content:', results);
}

main();
Response
- Name
ToolCrawlResponse[]
- Type
- Array<object>
- Description
An array of objects containing the URL and the extracted content returned by the langbase.tools.crawl() function.
ToolCrawlResponse Type
interface ToolCrawlResponse {
	url: string;
	content: string;
}
- Name
url
- Type
- string
- Description
The URL of the crawled page.
- Name
content
- Type
- string
- Description
The extracted content from the crawled page.
ToolCrawlResponse Example
[
	{
		"url": "https://example.com/page1",
		"content": "Extracted content from the webpage..."
	},
	{
		"url": "https://example.com/page2",
		"content": "More extracted content..."
	}
]
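A response shaped like the example above can be post-processed before it is handed to the rest of an application. The following is a rough sketch under stated assumptions: indexByUrl is a hypothetical helper, not an SDK function, and the third sample entry with blank content is invented here to show the filtering.

```typescript
// Mirrors the ToolCrawlResponse interface documented above.
interface ToolCrawlResponse {
	url: string;
	content: string;
}

// Hypothetical helper: drops pages whose content is empty or whitespace
// and indexes the remaining pages by URL for quick lookup.
function indexByUrl(results: ToolCrawlResponse[]): Map<string, string> {
	const pages = new Map<string, string>();
	for (const { url, content } of results) {
		const text = content.trim();
		if (text.length > 0) pages.set(url, text);
	}
	return pages;
}

// Sample data in the documented response shape; the last entry is illustrative.
const sample: ToolCrawlResponse[] = [
	{ url: 'https://example.com/page1', content: 'Extracted content from the webpage...' },
	{ url: 'https://example.com/page2', content: 'More extracted content...' },
	{ url: 'https://example.com/empty', content: '   ' },
];

const pages = indexByUrl(sample);
console.log(pages.size); // 2
```

In a real application, the array returned by langbase.tools.crawl() would take the place of sample.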