Run Agent with Vision
This example demonstrates how to run a Langbase agent that analyzes visual content from image inputs. The agent uses a vision-capable model to understand and respond to user prompts that include images, such as extracting the text from a photo of handwriting.
Run Agent with Vision Example
import 'dotenv/config';
import { Langbase } from 'langbase';

// Authenticate the SDK with your Langbase API key.
const langbase = new Langbase({
  apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
  const response = await langbase.agent.run({
    // A vision-capable model is required for image inputs.
    model: 'openai:gpt-4o-mini',
    input: [
      {
        role: 'user',
        // A multimodal message: a text prompt plus an image URL.
        content: [
          {
            type: 'text',
            text: 'Extract the text from this image',
          },
          {
            type: 'image_url',
            image_url: {
              url: 'https://upload.wikimedia.org/wikipedia/commons/a/a7/Handwriting.png',
            },
          },
        ],
      },
    ],
    // Key for the underlying LLM provider (OpenAI, given the model above).
    apiKey: process.env.LLM_API_KEY!,
    stream: false,
  });

  console.log('Vision Agent Response:', response);
}

main();
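Note that two environment variables are in play: LANGBASE_API_KEY authenticates the Langbase SDK itself, while LLM_API_KEY is passed through to the underlying model provider (OpenAI in this case, since the model is openai:gpt-4o-mini).

If your image lives on disk instead of at a public URL, one option is to inline it as a base64 data URL. The sketch below is a minimal variant of the example above, assuming the image_url content type accepts data: URLs (as in the OpenAI-compatible message format); the local path ./handwriting.png is hypothetical.

import 'dotenv/config';
import { readFile } from 'node:fs/promises';
import { Langbase } from 'langbase';

const langbase = new Langbase({
  apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
  // Hypothetical local file; replace with your own image path.
  const imageBuffer = await readFile('./handwriting.png');
  // Encode the image as a base64 data URL.
  const dataUrl = `data:image/png;base64,${imageBuffer.toString('base64')}`;

  const response = await langbase.agent.run({
    model: 'openai:gpt-4o-mini',
    input: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Extract the text from this image' },
          // Assumption: image_url accepts data: URLs, mirroring the
          // OpenAI-compatible multimodal message format.
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ],
    apiKey: process.env.LLM_API_KEY!,
    stream: false,
  });

  console.log('Vision Agent Response:', response);
}

main();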