# Vision

Vision lets your worker read and understand images, and capture and interpret the contents of webpages. Toolhouse adds these capabilities automatically if your worker needs them to complete its task. You can also add them manually.

### How Vision works

Suppose you create a worker that needs to see and interpret visual content. For example, a customer support worker that monitors your website should detect changes in your UI and summarize what it sees on each page.

In this case, Toolhouse detects that the worker needs to read images and adds Vision to it automatically.

### Setting up how your worker interprets images

To specify how the worker should describe or analyze images, open your worker in Agent Editor and tell the editor something like: "I want my worker to always describe images in detail, focusing on any text, charts, or key visual elements it finds." You can use natural language, and Toolhouse will translate it into prompt instructions for you.

### Read an image and act on it

Your worker can use Vision alongside other integrations. For example, you can pair it with Image Generation to first read an existing image and then produce a modified version of it, or a new image inspired by it.

### Adding Vision manually

* Go to **Agents** in Toolhouse
* Click on your worker to edit it
* Select **Integrations**, then click **Add Integration**
* Choose **Describe Image** to read the contents of an image
* Choose **Page Screenshot** to take a screenshot of a webpage and read its contents
* Click **Save changes**


