Vision
How Vision works
Setting up how your worker interprets images
Read an image and act on it
Adding Vision manually
Last updated
Vision allows your worker to read and understand images, as well as capture and interpret the contents of webpages. Toolhouse adds Vision automatically if your worker needs it to complete its task. You can also add Vision manually.
Some workers need to see and interpret visual content. For example, you might create a customer support worker that monitors your website, detects changes in your UI, and summarizes what it sees on each page.
In this case, Toolhouse detects that the worker needs to read images and adds Vision to it automatically.
To specify how the worker should describe or analyze images, edit your worker in Agent Editor and tell the editor something like: "I want my worker to always describe images in detail, focusing on any text, charts, or key visual elements it finds." You can write this in natural language, and Toolhouse will translate it into prompt instructions for you.
Your worker can use Vision alongside other integrations. For example, you can pair it with Image Generation to first read an existing image and then produce a modified or inspired version of it.
Go to Agents in your Toolhouse
Click on your worker to edit it
Select Integrations, then click Add Integration
Choose Describe Image to read the contents of an image
Choose Page Screenshot to take a screenshot of a webpage and read its contents
Click Save changes