Crawler langbase.tools.crawl()
Use the langbase.tools.crawl() function to extract content from web pages. This is useful when you need to gather information from websites for your AI applications. Crawling is powered by Spider.cloud, so you need a Spider.cloud API key to use this feature.
Pre-requisites
- Langbase API Key: Generate your API key from the User/Org API key documentation.
- Spider.cloud API Key: Sign up at Spider.cloud to get your crawler API key.
API reference
langbase.tools.crawl(options)
Crawl web pages by running the langbase.tools.crawl() function.
Function Signature
langbase.tools.crawl(options);
// with types
langbase.tools.crawl(options: ToolCrawlOptions);
options
- Name
options
- Type
- ToolCrawlOptions
- Description
ToolCrawlOptions Object
interface ToolCrawlOptions {
  url: string[];
  apiKey: string;
  maxPages?: number;
}
Following are the properties of the options object.
url
- Name
url
- Type
- string[]
- Required
- Required
- Description
An array of URLs to crawl. Each URL should be a valid web address.
apiKey
- Name
apiKey
- Type
- string
- Required
- Required
- Description
Your Spider.cloud API key – get one from Spider.cloud.
maxPages
- Name
maxPages
- Type
- number
- Description
The maximum number of pages to crawl. Optional; it caps how many pages the crawl operation visits.
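The property constraints above (an array of valid web addresses, a required key, an optional page cap) can be checked before a request is sent. The following is a minimal sketch, not part of the Langbase SDK: buildCrawlOptions and tryParseUrl are hypothetical helpers, and the rules that only http/https URLs are accepted and that maxPages must be a positive integer are assumptions rather than documented behavior.

```typescript
// Shape copied from the ToolCrawlOptions interface documented above.
interface ToolCrawlOptions {
  url: string[];
  apiKey: string;
  maxPages?: number;
}

// Hypothetical helper: returns a parsed URL, or null if the input is malformed.
function tryParseUrl(raw: string): URL | null {
  try {
    return new URL(raw); // the URL constructor throws on invalid input
  } catch {
    return null;
  }
}

// Hypothetical helper (not part of the SDK): validates the inputs and returns
// an options object shaped for langbase.tools.crawl().
function buildCrawlOptions(
  urls: string[],
  apiKey: string,
  maxPages?: number,
): ToolCrawlOptions {
  if (urls.length === 0) {
    throw new Error('At least one URL is required.');
  }
  for (const raw of urls) {
    const parsed = tryParseUrl(raw);
    // Assumption: only http(s) addresses count as valid web addresses.
    if (parsed === null || (parsed.protocol !== 'http:' && parsed.protocol !== 'https:')) {
      throw new Error(`Invalid URL: ${raw}`);
    }
  }
  if (apiKey.trim() === '') {
    throw new Error('A Spider.cloud API key is required.');
  }
  // Assumption: a page cap only makes sense as a positive integer.
  if (maxPages !== undefined && (!Number.isInteger(maxPages) || maxPages < 1)) {
    throw new Error('maxPages must be a positive integer.');
  }
  return maxPages === undefined
    ? { url: urls, apiKey }
    : { url: urls, apiKey, maxPages };
}
```

The returned object can be passed straight to langbase.tools.crawl(). Failing early keeps malformed input from turning into a network call.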
Install the SDK
npm i langbase
Environment variables
.env file
LANGBASE_API_KEY="<USER/ORG-API-KEY>"
CRAWL_KEY="<SPIDER-CLOUD-API-KEY>"
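Because both keys are required, it can help to fail fast when either variable is missing instead of sending an undefined key. The sketch below assumes a hypothetical requireEnv helper (not part of the SDK). It takes the environment as a parameter so it can be run without a real .env file.

```typescript
// Hypothetical helper: reads a required variable from an environment-like
// object and throws a descriptive error when it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Node.js application you would pass process.env, for example:
//   const langbaseKey = requireEnv('LANGBASE_API_KEY', process.env);
//   const crawlKey = requireEnv('CRAWL_KEY', process.env);
// A plain object with placeholder values is used here so the sketch is self-contained.
const exampleEnv: Record<string, string | undefined> = {
  LANGBASE_API_KEY: 'placeholder-langbase-key',
  CRAWL_KEY: 'placeholder-spider-key',
};

const exampleLangbaseKey = requireEnv('LANGBASE_API_KEY', exampleEnv);
const exampleCrawlKey = requireEnv('CRAWL_KEY', exampleEnv);
console.log(exampleLangbaseKey.length > 0 && exampleCrawlKey.length > 0);
```

With this in place, the non-null assertions (!) on process.env values become unnecessary, since requireEnv always returns a string or throws.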
langbase.tools.crawl() examples
import { Langbase } from 'langbase';

const langbase = new Langbase({
  apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
  const results = await langbase.tools.crawl({
    url: ['https://example.com'],
    apiKey: process.env.CRAWL_KEY!, // Spider.cloud API key
    maxPages: 5
  });

  console.log('Crawled content:', results);
}

main();
Response
- Name
ToolCrawlResponse[]
- Type
- Array<object>
- Description
An array of objects containing the URL and the extracted content returned by the langbase.tools.crawl() function.
ToolCrawlResponse Type
interface ToolCrawlResponse {
  url: string;
  content: string;
}
- Name
url
- Type
- string
- Description
The URL of the crawled page.
- Name
content
- Type
- string
- Description
The extracted content from the crawled page.
ToolCrawlResponse Example
[
  {
    "url": "https://example.com/page1",
    "content": "Extracted content from the webpage..."
  },
  {
    "url": "https://example.com/page2",
    "content": "More extracted content..."
  }
]
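Once the array comes back, a common next step is to look up content by page. The following is a minimal sketch of one way to do that. It assumes a hypothetical indexByUrl helper (not part of the SDK) and uses the sample response above in place of a live crawl. Skipping pages whose content is empty is a choice of this sketch, not documented behavior.

```typescript
// Shape copied from the ToolCrawlResponse interface documented above.
interface ToolCrawlResponse {
  url: string;
  content: string;
}

// Hypothetical helper: drops pages with no extracted text and indexes the
// remaining pages by URL so their content can be looked up directly.
function indexByUrl(results: ToolCrawlResponse[]): Map<string, string> {
  const pages = new Map<string, string>();
  for (const { url, content } of results) {
    if (content.trim() === '') continue; // skip empty extractions
    pages.set(url, content);
  }
  return pages;
}

// Sample data mirroring the ToolCrawlResponse example; in real code this
// would be the value returned by langbase.tools.crawl().
const sampleResults: ToolCrawlResponse[] = [
  { url: 'https://example.com/page1', content: 'Extracted content from the webpage...' },
  { url: 'https://example.com/page2', content: 'More extracted content...' },
];

const pagesByUrl = indexByUrl(sampleResults);
console.log(pagesByUrl.get('https://example.com/page1'));
```

If the same URL appears more than once in the results, the later entry overwrites the earlier one in the map.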