Vision MCP (TAU)
TAU exposes built-in Vision MCP tools through the builtin MCP server, so images and web pages can be analyzed visually without extra setup beyond a provider key.
Available Tools
understand_image: Analyze a single image
understand_images: Analyze multiple images (e.g., for comparison)
vision_describe: Alias for understand_image
Requirements
To use vision capabilities, you must configure at least one vision-capable provider via environment variables:
ANTHROPIC_API_KEY (Claude 3.5 Sonnet / Opus)
OPENAI_API_KEY (GPT-4o)
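Since a missing key is an easy thing to overlook, a small pre-flight check can help before calling any vision tool. The sketch below only tests whether one of the two variables named above is non-empty; the function name `have_vision_provider` is made up for illustration and is not part of TAU, and the check says nothing about whether the key itself is valid.

```shell
# Hypothetical helper (not part of TAU): succeeds if at least one
# vision provider key from the list above is set and non-empty.
have_vision_provider() {
  [ -n "${ANTHROPIC_API_KEY:-}" ] || [ -n "${OPENAI_API_KEY:-}" ]
}

if have_vision_provider; then
  echo "vision provider configured"
else
  echo "set ANTHROPIC_API_KEY or OPENAI_API_KEY before calling the vision tools" >&2
fi
```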
Examples
Describe a browser screenshot
Combine the browser and vision tools to analyze web pages visually:
tau mcp install browser
tau mcp call builtin browser_open --args '{"url":"https://example.com"}'
tau mcp call builtin browser_screenshot_describe --args '{"prompt":"What is on this page?"}'
Describe a local image file
The CLI handles file reading automatically:
tau mcp call builtin file_describe_image --args '{"path":"./image.png","prompt":"What is shown here?"}'
Describe a single image (Base64)
For manual API usage or piping:
IMG_B64="$(base64 < image.png | tr -d '\n')"
tau mcp call builtin understand_image \
  --args "{\"image_base64\":\"$IMG_B64\",\"mime_type\":\"image/png\",\"prompt\":\"What is shown here?\"}"
Compare multiple images
IMG1="$(base64 < a.png | tr -d '\n')"
IMG2="$(base64 < b.png | tr -d '\n')"
tau mcp call builtin understand_images \
  --args "{\"images\":[{\"image_base64\":\"$IMG1\",\"mime_type\":\"image/png\"},{\"image_base64\":\"$IMG2\",\"mime_type\":\"image/png\"}],\"prompt\":\"Compare these two screenshots\"}"
Parameters
Optional overrides for provider, model, and output length:
provider: "anthropic" or "openai"
model: Provider-specific model ID (e.g., claude-3-5-sonnet-20241022)
max_output_tokens: Integer cap for output length
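Hand-escaping these overrides into the `--args` string gets error-prone quickly, so a small helper may be useful. The sketch below assumes the overrides are passed as extra keys in the same JSON object as the other tool arguments, which this document does not state explicitly. `with_overrides` is a hypothetical name, not a TAU command, and it inserts the model ID verbatim, so the value must not contain double quotes or backslashes.

```shell
# Hypothetical helper (not part of TAU): append optional overrides to an args JSON object.
# Usage: with_overrides BASE_JSON [PROVIDER] [MODEL] [MAX_OUTPUT_TOKENS]
with_overrides() {
  base=$1 provider=${2:-} model=${3:-} max_tokens=${4:-}
  extra=""

  # Only the two provider values listed above are accepted.
  case "$provider" in
    "") ;;
    anthropic|openai) extra="$extra,\"provider\":\"$provider\"" ;;
    *) echo "unknown provider: $provider" >&2; return 1 ;;
  esac

  if [ -n "$model" ]; then
    extra="$extra,\"model\":\"$model\""
  fi

  # max_output_tokens is emitted as a bare JSON number, so reject non-digits.
  case "$max_tokens" in
    "") ;;
    *[!0-9]*) echo "max_output_tokens must be an integer" >&2; return 1 ;;
    *) extra="$extra,\"max_output_tokens\":$max_tokens" ;;
  esac

  # Drop the closing brace of BASE_JSON, append the overrides, then close the object.
  # Assumes BASE_JSON is a non-empty object ending in "}".
  printf '%s%s}\n' "${base%?}" "$extra"
}

with_overrides '{"prompt":"What is shown here?"}' anthropic claude-3-5-sonnet-20241022 512
# prints {"prompt":"What is shown here?","provider":"anthropic","model":"claude-3-5-sonnet-20241022","max_output_tokens":512}
```

Under the same assumption, the result could be passed along as `--args "$(with_overrides "$BASE" openai "" 256)"`, where `$BASE` holds the JSON from one of the examples above; empty strings skip an override.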