Guide: How to use Vision
A step-by-step guide to using the vision capabilities of a Vision model with Langbase Pipes.
In this guide, we will learn how to send an image to a Vision model in a Langbase Pipe and get it to answer questions about it.
What is Vision?
LLMs with vision capabilities can take images as input, understand them, and generate text-based answers about them. Vision models can answer questions about images, generate captions, or describe visual content. Vision is also used for tasks such as OCR, image classification, and object detection.
| LLM Type | Input | Output |
|---|---|---|
| Unimodal without Vision | Text | Text |
| Multimodal with Vision | Text + Image | Text |
How to use Vision in Langbase Pipes?
Vision is supported in Langbase Pipes across different LLM providers, including OpenAI, Anthropic, Google, and more. Using Vision in Langbase Pipes is simple. You can send images in the API request and get text answers about them.
Sending Images to Pipe for Vision
First, select a Vision model that supports image input in your Langbase Pipe. You can choose from a variety of Vision models from different LLM providers, for example OpenAI's gpt-4o or Anthropic's claude-3.5-sonnet.
The Pipe Run API matches the OpenAI spec for Vision requests. When running the pipe, provide the image in a message inside the messages array.
Here is what your messages will look like for vision requests:
// Pipe Run API
"messages": [
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "What is in this image?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "...xyz" // base64 encoded image
        }
      }
    ]
  }
]
In the above example, we are sending an image URL (base64 encoded image) as input to the vision model pipe, which will process the image and give a text response.
Follow the Run Pipe API spec for detailed request types.
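The message shape above can be built programmatically. The following is a minimal Python sketch, not an official Langbase client: the helper names `to_data_url` and `build_vision_message` are hypothetical, and the `data:<mime>;base64,<data>` URL form is an assumption borrowed from the OpenAI spec that the Pipe Run API is said to match.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (assumed format)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_message(question: str, image_url: str) -> dict:
    """Return a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Placeholder bytes stand in for a real file, e.g. open("photo.png", "rb").read()
message = build_vision_message("What is in this image?", to_data_url(b"\x89PNG"))
```

In practice you would read the bytes from an image file and pass the matching MIME type, such as `image/jpeg`.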
Image Input Guidelines for Vision
Here are some considerations when using vision in Langbase Pipes:
- Message Format
  - Images can be passed in user role messages.
  - Message content must be an array of content parts (for text and images) in vision requests, while in text-only requests the message content is a string.
- Image URL
  - The image_url field is used to pass the image URL, which can be:
    - Base64 encoded images: Supported by all providers.
    - Public URLs: Supported only by OpenAI.
- Provider-specific limits
  - Different LLM providers may impose varying restrictions on image size, format, and the number of images per request.
  - Refer to the specific provider’s documentation for precise limits.
  - Langbase imposes no additional restrictions.
- Image Quality Settings (OpenAI only)
  - OpenAI models support an optional detail field in the image_url object for controlling image quality.
  - The detail field can be set to low, medium, or high to control the quality of the image sent to the model.
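The message-format rules above can be checked on the client before a request is sent. This is a rough sketch under stated assumptions: `validate_vision_messages` is a hypothetical helper, and it covers only the role and content-shape rules listed here, not provider-specific limits on size, format, or image count.

```python
def validate_vision_messages(messages: list) -> list:
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if isinstance(content, str):
            continue  # text-only message: content is a plain string
        if not isinstance(content, list):
            problems.append(f"message {i}: content must be a string or an array of parts")
            continue
        for part in content:
            if part.get("type") != "image_url":
                continue  # only image parts need the checks below
            if msg.get("role") != "user":
                problems.append(f"message {i}: images belong in user role messages")
            if not part.get("image_url", {}).get("url"):
                problems.append(f"message {i}: image_url.url is missing or empty")
    return problems
```

An empty result only means these particular checks passed; the provider may still reject the image for other reasons.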
Examples
Here are some example Pipe Run requests utilizing Vision models in Langbase Pipes.
Example 1: Sending a Base64 Image
Here is an example of sending a base64 image in a Pipe Run API request.
curl https://api.langbase.com/v1/pipes/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <YOUR_PIPE_API_KEY>' \
  -d '{
    "stream": false,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": ""
            }
          }
        ]
      }
    ]
  }'
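For readers calling the API from code rather than curl, the same request can be assembled with Python's standard library. This is a sketch only: `build_run_request` is a hypothetical helper, the endpoint and headers are copied from the curl example, and the request is built but not sent.

```python
import json
import urllib.request

PIPE_RUN_URL = "https://api.langbase.com/v1/pipes/run"  # endpoint from the curl example


def build_run_request(api_key: str, messages: list, stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps({"stream": stream, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        PIPE_RUN_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_run_request(
    "<YOUR_PIPE_API_KEY>",
    [{"role": "user", "content": [{"type": "text", "text": "Describe this image."}]}],
)
# To send it, pass the object to urllib.request.urlopen(request) and read the response.
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call happens.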
Example 2: Sending Image as a Public Image URL (supported by OpenAI only)
Public image URLs are only supported by OpenAI, so make sure you are using an OpenAI model.
curl https://api.langbase.com/v1/pipes/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <YOUR_PIPE_API_KEY>' \
  -d '{
    "stream": false,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/b/b5/Iridescent.green.sweat.bee1.jpg"
            }
          }
        ]
      }
    ]
  }'
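Because public URLs work only with OpenAI models, a client may want to catch a mismatch before sending. The sketch below is illustrative: `image_url_kind` and `check_image_url` are hypothetical helpers, and the `provider` string is an assumed label that your own code would supply, not a field of the Pipe Run API.

```python
from urllib.parse import urlparse


def image_url_kind(url: str) -> str:
    """Classify an image_url value as a public URL or base64 data."""
    scheme = urlparse(url).scheme.lower()
    return "public" if scheme in ("http", "https") else "base64"


def check_image_url(url: str, provider: str) -> None:
    """Raise ValueError when a public URL is paired with a non-OpenAI provider."""
    if image_url_kind(url) == "public" and provider.lower() != "openai":
        raise ValueError(f"public image URLs are only supported by OpenAI, not {provider}")


# Passes silently: a public URL paired with an OpenAI model.
check_image_url(
    "https://upload.wikimedia.org/wikipedia/commons/b/b5/Iridescent.green.sweat.bee1.jpg",
    "openai",
)
```

For other providers, download the image yourself and send it base64 encoded instead.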
Example 3: Sending multiple images
You can also send multiple images attached to the same message.
curl https://api.langbase.com/v1/pipes/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <YOUR_PIPE_API_KEY>' \
  -d '{
    "stream": false,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "How are these images different?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "<image_1_base64>"
            }
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "<image_2_base64>"
            }
          }
        ]
      }
    ]
  }'
Replace <image_1_base64> and <image_2_base64> with the base64 encoded images you want to send.
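When the number of images varies, the content array can be generated from a list. A minimal sketch, assuming a hypothetical helper `build_multi_image_message` that receives already-encoded image strings:

```python
def build_multi_image_message(question: str, image_urls: list) -> dict:
    """Return one user message: a text part followed by one image part per URL."""
    parts = [{"type": "text", "text": question}]
    parts += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {"role": "user", "content": parts}


# The placeholders mirror the curl example; substitute real base64 encoded images.
message = build_multi_image_message(
    "How are these images different?",
    ["<image_1_base64>", "<image_2_base64>"],
)
```

The images keep the order of the input list, which matters when the question refers to "the first" or "the second" image.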
Example 4: Sending multiple images in conversation turns (chat)
You can also send multiple images in different messages across conversation turns.
Let's say you start the conversation with the first image:
curl https://api.langbase.com/v1/pipes/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <YOUR_PIPE_API_KEY>' \
  -d '{
    "stream": false,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "<image_1_base64>"
            }
          }
        ]
      }
    ]
  }'
Then, in the next turn, you can send the second image:
curl https://api.langbase.com/v1/pipes/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <YOUR_PIPE_API_KEY>' \
  -d '{
    "stream": false,
    "thread_id": "<thread_id>",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Is this image different from the previous one?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "<image_2_base64>"
            }
          }
        ]
      }
    ]
  }'
By including the thread_id returned from the first request in the second request, your Langbase Pipe automatically continues the conversation from the previous turn.
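The two turns differ only in the thread_id field, so one function can produce both request bodies. This is a sketch under assumptions: `build_turn_body` is a hypothetical helper, and how your client reads the thread_id from the first response is left out because this guide does not specify it.

```python
import json


def build_turn_body(messages: list, thread_id=None) -> str:
    """Serialize a request body, adding thread_id only for follow-up turns."""
    body = {"stream": False, "messages": messages}
    if thread_id is not None:
        body["thread_id"] = thread_id  # continues the earlier conversation
    return json.dumps(body)


first_turn = build_turn_body([{"role": "user", "content": "Describe this image."}])
# "<thread_id>" is a placeholder for the value returned from the first request.
second_turn = build_turn_body(
    [{"role": "user", "content": "Is this image different from the previous one?"}],
    thread_id="<thread_id>",
)
```

Omitting the key on the first turn, rather than sending a null value, keeps that request identical to the single-turn examples.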
FAQs
- Make sure to use the correct Vision model that supports image input.
- Langbase currently supports Vision models from OpenAI, Anthropic and Google. More providers will be supported soon.
- Vision support is live on Langbase API. Vision in Studio playground is coming soon.
- Langbase currently does not store images sent to Vision models.