Document Parser
Document Parser lets your AI workers fetch a file from a URL and read it as clean, structured Markdown. It works like File Download, but instead of handing the raw file to your worker, Toolhouse converts it to Markdown first, which makes the contents far easier for your worker to interpret and act on. Toolhouse adds Document Parser automatically when your worker needs it, or you can add it manually.
How Document Parser works
When your worker needs to read a document — whether you provided the URL or the worker found it through Web Search — it passes the URL to Document Parser. Toolhouse fetches the file and converts its contents to Markdown on a best-effort basis before passing the result to your worker.
This means your worker receives clean, readable text with structure preserved — headings, lists, tables, and paragraphs — rather than raw bytes or noisy file output.
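To make "structure preserved" concrete, here is a minimal sketch of HTML-to-Markdown conversion using Python's standard-library `html.parser`. It is not Toolhouse's implementation. The class and function names are hypothetical, and the sketch handles only headings, paragraphs, and list items while skipping `script`, `style`, and `nav` elements. Real converters also handle tables, links, and nested structures.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical, minimal HTML-to-Markdown converter (illustration only)."""

    SKIP = {"script", "style", "nav"}
    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished Markdown blocks
        self.current = None  # (prefix, text parts) for the block being built
        self.skip_depth = 0  # > 0 while inside an element we ignore

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            if tag in self.HEADINGS:
                self.current = ("#" * self.HEADINGS[tag] + " ", [])
            elif tag == "li":
                self.current = ("- ", [])
            elif tag == "p":
                self.current = ("", [])

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.current and (tag in self.HEADINGS or tag in ("li", "p")):
            prefix, parts = self.current
            text = " ".join("".join(parts).split())  # collapse runs of whitespace
            if text:
                self.blocks.append(prefix + text)
            self.current = None

    def handle_data(self, data):
        # Keep text only when it belongs to a block we are building.
        if self.skip_depth == 0 and self.current is not None:
            self.current[1].append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)  # blocks separated by blank lines
```

For example, `html_to_markdown("<h1>Spec</h1><p>Intro</p>")` returns `"# Spec\n\nIntro"`: the heading level survives as `#`, which is the kind of structure a worker can reason about.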
Example prompt for your worker:
"Parse the product spec sheet at this URL and list all technical requirements in a table."
When to use Document Parser vs. File Download
Document Parser is the right choice whenever the file you need to read has structure or formatting that matters — or whenever the file format is not natively readable as plain text.
PDF reports, whitepapers, or manuals: Document Parser
Scanned or image-heavy documents: Document Parser (results vary; scans without an embedded text layer may convert poorly)
HTML pages with complex layout: Document Parser
Plain text, CSV, JSON, XML: File Download
Binary files, archives, media: Neither; consider Virtual Computer
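The guidance above can be encoded as a simple lookup. The sketch below is a hypothetical helper, not part of Toolhouse: it decides from the URL's file extension alone, and the extension lists are illustrative assumptions. In practice, the server's Content-Type header is a more reliable signal than the extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative extension lists (assumptions), mirroring the guidance above.
PARSER_EXTS = {".pdf", ".doc", ".docx", ".html", ".htm"}
DOWNLOAD_EXTS = {".txt", ".csv", ".json", ".xml"}
BINARY_EXTS = {".zip", ".tar", ".gz", ".exe", ".png", ".jpg", ".mp3", ".mp4"}


def recommend_tool(url: str) -> str:
    """Hypothetical helper: suggest a tool based on the file extension in a URL."""
    # Look only at the path, so query strings like ?page=2 are ignored.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in PARSER_EXTS:
        return "Document Parser"
    if suffix in DOWNLOAD_EXTS:
        return "File Download"
    if suffix in BINARY_EXTS:
        return "Virtual Computer"  # neither reader tool fits binary content
    # Unknown or missing extension: fall back to "if unsure, prefer Document Parser".
    return "Document Parser"
```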
If you're unsure, prefer Document Parser. The Markdown conversion step is low-cost and makes the output significantly more useful to your worker.
Best-effort conversion
Document Parser converts files to Markdown on a best-effort basis. Most well-structured documents — PDFs with selectable text, HTML pages, Word documents — convert reliably. However, some files may not convert cleanly:
Scanned documents with no embedded text layer may produce poor output or none at all
Complex layouts such as multi-column documents or heavily formatted slides may lose some structure during conversion
Non-document files such as images or executables cannot be meaningfully converted
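Empty or near-empty output is the most common symptom of the failures listed above. If you post-process converted output yourself, a few cheap heuristics can flag it before a worker relies on it. The sketch below is hypothetical: the function name and the 200-character threshold are assumptions, and an empty warning list does not prove the conversion is complete.

```python
def conversion_warnings(markdown: str, min_chars: int = 200) -> list[str]:
    """Hypothetical heuristics that flag suspicious Markdown conversion output."""
    warnings = []
    text = markdown.strip()
    # Scans without a text layer often convert to little or no text.
    if len(text) < min_chars:
        warnings.append("very little text; the source may be a scan without a text layer")
    # U+FFFD replacement characters suggest encoding damage during extraction.
    if "\ufffd" in text:
        warnings.append("contains replacement characters; possible encoding damage")
    # Complex layouts can lose structure; no headings at all is a weak warning sign.
    if text and not any(line.lstrip().startswith("#") for line in text.splitlines()):
        warnings.append("no Markdown headings found; structure may have been lost")
    return warnings
```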
When conversion quality is critical, test with a sample document first. You can prompt your worker with:
"Parse this document and tell me if the output looks complete and well-structured before proceeding."
Context window limits
Like File Download, Document Parser limits the size of the converted output to fit within your worker's context window. Because conversion drops markup and formatting overhead, the Markdown output is often smaller than the raw file, so Document Parser can sometimes surface more usable content than File Download for the same file.
If the document is very long and gets truncated, consider splitting the task — for example, asking your worker to parse and summarize one section at a time.
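Splitting by section can also be done mechanically on converted output. The sketch below is a hypothetical helper, not a Toolhouse feature: it splits Markdown at `#`-style headings and groups the sections into chunks under a character budget. Characters are only a rough stand-in for the tokens a context window actually counts, so `max_chars` is an assumption you would tune.

```python
import re


def split_by_headings(markdown: str, max_chars: int = 4000) -> list[str]:
    """Hypothetical helper: group Markdown sections into chunks of at most max_chars.

    Sections start at ATX headings (#, ##, ...). A single section longer than
    max_chars is hard-split so that no chunk exceeds the budget.
    """
    # Split just before each heading line, keeping each heading with its section.
    sections = [s for s in re.split(r"(?m)^(?=#{1,6} )", markdown) if s.strip()]
    chunks, current = [], ""
    for section in sections:
        # Hard-split oversized sections into max_chars-sized pieces.
        pieces = [section[i:i + max_chars] for i in range(0, len(section), max_chars)]
        for piece in pieces:
            # Start a new chunk when adding this piece would exceed the budget.
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be handed to the worker in turn, for example with a prompt to summarize that section before moving on.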
Using Document Parser with Virtual Computer
Document Parser and Virtual Computer complement each other well for document-heavy workflows. Document Parser converts the file into readable Markdown; Virtual Computer can then process that content programmatically — extracting structured data, running calculations, or transforming the output.
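As an illustration of that programmatic step, here is the kind of code Virtual Computer could run on Document Parser's Markdown output to compute year-over-year growth from a table. It is a hypothetical sketch: the function names and the sample table are made up, it assumes a single pipe table with one row per consecutive year, and growth is computed as (current - previous) / previous.

```python
def parse_pipe_table(markdown: str) -> list[dict[str, str]]:
    """Parse a Markdown pipe table into row dicts (assumes one table in the input)."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []

    def cells(line: str) -> list[str]:
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(lines[0])
    # lines[1] is the |---|---| delimiter row; data rows follow it.
    return [dict(zip(header, cells(ln))) for ln in lines[2:]]


def yoy_growth(rows: list[dict[str, str]], year_col: str, value_col: str) -> dict[int, float]:
    """Map each year to (value - previous value) / previous value.

    Assumes one row per consecutive year; thousands separators are stripped.
    """
    series = sorted(
        (int(row[year_col]), float(row[value_col].replace(",", ""))) for row in rows
    )
    return {
        year: (value - prev) / prev
        for (_, prev), (year, value) in zip(series, series[1:])
        if prev != 0  # growth from zero is undefined
    }


# Made-up sample table, standing in for Document Parser output.
report = """
| Year | Revenue |
|------|---------|
| 2022 | 1,000 |
| 2023 | 1,200 |
| 2024 | 1,500 |
"""
print(yoy_growth(parse_pipe_table(report), "Year", "Revenue"))  # {2023: 0.2, 2024: 0.25}
```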
To set this up, describe the workflow in your worker's prompt in Agent Editor, for example:
"Parse the financial report at this URL using Document Parser, then use the virtual computer to extract all numeric values from the tables and compute year-over-year growth."
Adding Document Parser manually
Go to Agents in your Toolhouse dashboard
Click on your worker to edit it
Select Integrations, then click Add Integration
Choose Document Parser
Click Save changes
Limitations and gotchas
Best-effort conversion
Markdown conversion is not guaranteed for all file types. Scanned documents, image-heavy files, and unusual formats may not convert well.
Output size cap
Converted output is truncated to fit within your worker's context window. Very long documents may be cut off.
No binary support
Files that cannot be meaningfully converted to text — images, executables, archives — are not suited for Document Parser.
URL must be accessible
The file URL must be publicly reachable. Files behind authentication or private networks cannot be fetched.
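Before handing a URL to a worker, you can rule out the most obvious cases of unreachable files locally. The sketch below is a hypothetical helper, not a Toolhouse API. It inspects only the URL text, so it catches non-HTTP schemes, local hostnames, and private IP addresses, but it cannot detect pages that require a login.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse


def obvious_fetch_problem(url: str) -> Optional[str]:
    """Hypothetical pre-check: return a reason if a URL clearly isn't publicly fetchable.

    Only obvious cases are caught. None does NOT prove the URL is public; an
    ordinary hostname can still require a login or resolve to a private network.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "not an http(s) URL"
    host = parsed.hostname  # lowercased, without port or IPv6 brackets
    if not host:
        return "missing host"
    if host == "localhost" or host.endswith(".local"):
        return "local hostname"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return None  # an ordinary hostname; only a real fetch can tell
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return "private or loopback IP address"
    return None
```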
Frequently asked questions
How is Document Parser different from File Download? File Download gives your worker the raw file contents as-is. Document Parser adds a conversion step that transforms the file into Markdown, making it far easier for your worker to read and reason about structured documents like PDFs or HTML pages.
Does Document Parser work on web pages, not just files? Yes. If the URL points to an HTML page, Document Parser will convert the page content to Markdown — stripping navigation, ads, and other boilerplate to surface the core content.
What if the conversion produces poor output? You can prompt your worker to flag quality issues before proceeding, or fall back to File Download for simple text files. For scanned documents with no text layer, Document Parser may not be the right tool — consider pre-processing the file with an OCR service before passing it to your worker.
Can my worker use Document Parser on a file it found itself? Yes. If your worker discovers a document URL through Web Search or another integration, it can pass that URL to Document Parser without any additional input from you.