Run Agent with Vision

This example shows how to run a Langbase agent that analyzes visual content from image inputs. The agent uses a vision-capable model to understand and respond to prompts that include images, such as extracting the text from a photo.
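The script reads two environment variables via `dotenv`: `LANGBASE_API_KEY` (your Langbase API key) and `LLM_API_KEY` (the key for the underlying LLM provider, OpenAI in this example). A minimal `.env` file, with placeholder values, would look like:

```
LANGBASE_API_KEY=your-langbase-api-key
LLM_API_KEY=your-openai-api-key
```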


Run Agent with Vision Example

```ts
import 'dotenv/config';
import { Langbase } from 'langbase';

// Initialize the Langbase SDK with your Langbase API key.
const langbase = new Langbase({
	apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
	const response = await langbase.agent.run({
		model: 'openai:gpt-4o-mini',
		// A single user message with two content parts:
		// a text prompt and an image URL.
		input: [
			{
				role: 'user',
				content: [
					{
						type: 'text',
						text: 'Extract the text from this image',
					},
					{
						type: 'image_url',
						image_url: {
							url: 'https://upload.wikimedia.org/wikipedia/commons/a/a7/Handwriting.png',
						},
					},
				],
			},
		],
		// API key for the underlying LLM provider (OpenAI here).
		apiKey: process.env.LLM_API_KEY!,
		stream: false,
	});

	console.log('Vision Agent Response:', response);
}

main();
```
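If the image is a local file rather than a public URL, vision models that follow the OpenAI content format generally also accept base64-encoded data URLs in the `image_url` field. The sketch below assumes that behavior; `./receipt.png` is a hypothetical file path standing in for your own image.

```ts
import 'dotenv/config';
import { readFileSync } from 'node:fs';
import { Langbase } from 'langbase';

const langbase = new Langbase({
	apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
	// Read a local image and encode it as a base64 data URL.
	// Assumes the provider accepts data URLs in image_url.
	const imageBase64 = readFileSync('./receipt.png').toString('base64');
	const dataUrl = `data:image/png;base64,${imageBase64}`;

	const response = await langbase.agent.run({
		model: 'openai:gpt-4o-mini',
		input: [
			{
				role: 'user',
				content: [
					{ type: 'text', text: 'Extract the text from this image' },
					{ type: 'image_url', image_url: { url: dataUrl } },
				],
			},
		],
		apiKey: process.env.LLM_API_KEY!,
		stream: false,
	});

	console.log('Vision Agent Response:', response);
}

main();
```

Either variant can be run with a TypeScript runner such as `npx tsx`, assuming the script is saved locally and the `.env` file described above is in place.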