Run Agent with Vision

This example demonstrates how to run a Langbase agent that analyzes visual content from image inputs. The agent uses a vision-capable model to understand and respond to prompts that include images, such as extracting text from a photo.


Run Agent with Vision Example

import 'dotenv/config';
import { Langbase } from 'langbase';

// Initialize the Langbase SDK with your Langbase API key.
const langbase = new Langbase({
    apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
    const response = await langbase.agent.run({
        // Any vision-capable model can be used here.
        model: 'openai:gpt-4o-mini',
        input: [
            {
                role: 'user',
                // A multimodal message combines text and image parts.
                content: [
                    {
                        type: 'text',
                        text: 'Extract the text from this image',
                    },
                    {
                        type: 'image_url',
                        image_url: {
                            url: 'https://upload.wikimedia.org/wikipedia/commons/a/a7/Handwriting.png',
                        },
                    },
                ],
            },
        ],
        // API key for the underlying LLM provider (here, OpenAI).
        apiKey: process.env.LLM_API_KEY!,
        stream: false,
    });

    // Logs the full response object; the generated text is available
    // on `response.output`.
    console.log('Vision Agent Response:', response);
}

main();
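
If the image lives on disk rather than at a public URL, one common approach is to inline it as a base64 data URL in the same image_url field. This is a minimal sketch, not part of the example above: the local path ./receipt.png is hypothetical, and data-URL support follows the OpenAI-style vision format, which individual providers may or may not accept.

import 'dotenv/config';
import { readFileSync } from 'node:fs';
import { Langbase } from 'langbase';

const langbase = new Langbase({
    apiKey: process.env.LANGBASE_API_KEY!,
});

async function main() {
    // Hypothetical local file; replace with your own image path.
    const imageBase64 = readFileSync('./receipt.png').toString('base64');

    const response = await langbase.agent.run({
        model: 'openai:gpt-4o-mini',
        input: [
            {
                role: 'user',
                content: [
                    { type: 'text', text: 'Extract the text from this image' },
                    {
                        type: 'image_url',
                        // Data URL in place of a remote URL; provider support
                        // for this format is an assumption to verify.
                        image_url: { url: `data:image/png;base64,${imageBase64}` },
                    },
                ],
            },
        ],
        apiKey: process.env.LLM_API_KEY!,
        stream: false,
    });

    console.log('Vision Agent Response:', response);
}

main();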