
Vision

TAU's Vision MCP gives any model image understanding — even those without native vision support.

How It Works

Vision MCP provides a bridge between models and visual content:

┌─────────────────────────────────────────────────────────────────┐
│                       Vision MCP Flow                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   User/Agent                                                    │
│       │                                                         │
│       ▼                                                         │
│   understand_image(base64_image)                                │
│       │                                                         │
│       ├── MLX VLM (macOS, local, free)                         │
│       │   └── Qwen3-VL on Apple Silicon                        │
│       │                                                         │
│       └── Cloud fallback (if MLX unavailable)                  │
│           ├── Claude 4.5 Vision                                │
│           ├── GPT-5 Vision                                     │
│           └── Gemini 3 Pro Vision                              │
│                                                                 │
│       ▼                                                         │
│   Text description returned to agent                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
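
Any MCP client can drive this flow directly. Below is a minimal Python sketch using the official MCP client SDK; the stdio launch command (tau mcp serve) and the image parameter name are assumptions, so adjust them to your install:

import asyncio
import base64

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Assumption: the Vision MCP server is exposed over stdio via a command like this.
    server = StdioServerParameters(command="tau", args=["mcp", "serve"])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # understand_image expects the image as a base64 string.
            with open("screenshot.png", "rb") as f:
                image_b64 = base64.b64encode(f.read()).decode()

            result = await session.call_tool(
                "understand_image",
                arguments={"image": image_b64},  # parameter name is an assumption
            )

            # The tool returns a text description for the calling agent.
            for item in result.content:
                if getattr(item, "text", None):
                    print(item.text)

asyncio.run(main())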

Vision Tools

Tool                          Description
understand_image              Analyze a single image (base64 encoded)
understand_images             Compare and analyze multiple images
file_describe_image           Describe image from file path
browser_screenshot_describe   Take browser screenshot and analyze in one call

MLX Local Vision (macOS)

On Apple Silicon Macs, TAU uses a local Vision-Language Model powered by MLX:

  • Model: Qwen3-VL (quantized for MLX)
  • Performance: ~500ms per image on M3 Max
  • Privacy: Images never leave your machine
  • Cost: Free (local inference)

Cloud Fallback

When MLX is unavailable (non-macOS systems, or Macs without Apple Silicon), TAU falls back to cloud vision:

# Configure cloud vision provider
export TAU_VISION_PROVIDER=anthropic    # or openai, google

# Fallback chain:
# 1. MLX (macOS Apple Silicon)
# 2. Anthropic Claude 4.5 Vision
# 3. OpenAI GPT-5 Vision
# 4. Google Gemini 3 Pro Vision
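
The sketch below shows how such a chain could be resolved at runtime. It is illustrative logic, not TAU's actual source; the API-key variable names are the providers' conventional ones and may not match what TAU checks:

import os
import platform

def pick_vision_provider() -> str:
    """Illustrative provider selection that mirrors the fallback chain above."""
    explicit = os.environ.get("TAU_VISION_PROVIDER")
    if explicit and explicit != "mlx":
        return explicit  # user pinned a specific cloud provider

    # MLX is only usable on Apple Silicon Macs.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"

    # Otherwise walk the cloud chain in order, taking the first provider
    # that has credentials configured.
    for provider, key_var in [
        ("anthropic", "ANTHROPIC_API_KEY"),
        ("openai", "OPENAI_API_KEY"),
        ("google", "GOOGLE_API_KEY"),
    ]:
        if os.environ.get(key_var):
            return provider

    raise RuntimeError("no vision provider available")

print(pick_vision_provider())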

Usage Examples

Analyze Screenshot

# In TAU conversation:
> Take a screenshot of the login page and tell me what's wrong

# TAU automatically:
# 1. Opens browser
# 2. Takes screenshot
# 3. Sends to vision model
# 4. Returns analysis

Analyze Local Image

# Drag and drop image into TUI, or:
> Describe this image: @screenshot.png

# Or via CLI:
tau mcp call file_describe_image --image_path ./screenshot.png

Compare UI States

# Compare before/after screenshots:
> Compare these two images and tell me what changed:
> @before.png @after.png
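
Programmatically, the same comparison maps to the understand_images tool. A minimal Python sketch using the MCP client SDK, with assumed launch command and parameter names:

import asyncio
import base64

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

def encode(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

async def main() -> None:
    # Launch command is an assumption; see the single-image example above.
    server = StdioServerParameters(command="tau", args=["mcp", "serve"])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "understand_images",
                arguments={
                    # Both argument names are assumptions.
                    "images": [encode("before.png"), encode("after.png")],
                    "prompt": "What changed between these two screenshots?",
                },
            )
            print(result.content)

asyncio.run(main())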

Browser Integration

Vision works seamlessly with the browser MCP:

# browser_screenshot_describe combines screenshot + analysis:
> Open https://my-app.com and describe what you see

# This internally:
# 1. browser_open("https://my-app.com")
# 2. browser_screenshot()
# 3. understand_image(screenshot_base64)
# 4. Returns visual description

# Useful for:
# - Visual regression testing
# - Accessibility audits
# - UI debugging
# - Layout verification
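
The same chain can be written out as explicit MCP tool calls. The sketch below is illustrative only; the launch command, the argument names, and the shape of the screenshot result are assumptions:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch command is an assumption.
    server = StdioServerParameters(command="tau", args=["mcp", "serve"])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Open the page.
            await session.call_tool("browser_open", arguments={"url": "https://my-app.com"})

            # 2. Capture a screenshot; assumed to come back as base64 text content.
            shot = await session.call_tool("browser_screenshot", arguments={})
            screenshot_b64 = shot.content[0].text

            # 3. Describe it.
            description = await session.call_tool(
                "understand_image",
                arguments={"image": screenshot_b64},  # parameter name is an assumption
            )
            print(description.content)

            # Or let the combined tool do all three steps in one call.
            combined = await session.call_tool(
                "browser_screenshot_describe",
                arguments={"url": "https://my-app.com"},  # parameter name is an assumption
            )
            print(combined.content)

asyncio.run(main())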

Configuration

# Environment variables
export TAU_VISION_PROVIDER=mlx           # mlx, anthropic, openai, google
export TAU_VISION_MODEL=qwen3-vl         # For MLX
export TAU_VISION_MAX_TOKENS=1024        # Max response tokens

# In config.toml
[vision]
provider = "mlx"
fallback_provider = "anthropic"
max_image_size_mb = 10
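
A small sketch of how the two sources might be merged, with environment variables assumed to take precedence over config.toml (the precedence order and defaults here are illustrative, not confirmed behavior):

import os
import tomllib  # Python 3.11+

DEFAULTS = {
    "provider": "mlx",
    "fallback_provider": "anthropic",
    "max_image_size_mb": 10,
    "max_tokens": 1024,
}

def load_vision_config(path: str = "config.toml") -> dict:
    """Merge defaults, the [vision] table from config.toml, and TAU_VISION_* env vars."""
    cfg = dict(DEFAULTS)

    try:
        with open(path, "rb") as f:
            cfg.update(tomllib.load(f).get("vision", {}))
    except FileNotFoundError:
        pass  # the config file is optional

    # Environment variables override the file (assumed precedence).
    if "TAU_VISION_PROVIDER" in os.environ:
        cfg["provider"] = os.environ["TAU_VISION_PROVIDER"]
    if "TAU_VISION_MAX_TOKENS" in os.environ:
        cfg["max_tokens"] = int(os.environ["TAU_VISION_MAX_TOKENS"])

    return cfg

print(load_vision_config())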

TUI Indicator

When vision tools are active, the TUI status bar shows:

[B] [W] [V] [T] — [V] = Vision active

Best Practices

  • Use specific prompts — "Describe the navigation menu" vs "Describe the image"
  • Optimize image size — Large images slow down analysis (see the resizing sketch at the end of this section)
  • Prefer browser_screenshot_describe — Combines actions efficiently
  • Use MLX when possible — Faster and free on macOS
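
For the image-size tip above, downscaling screenshots before encoding keeps payloads under max_image_size_mb and speeds up analysis. A small Pillow sketch; the 1600-pixel limit is illustrative:

import base64
import io

from PIL import Image  # pip install pillow

def image_to_b64(path: str, max_side: int = 1600) -> str:
    """Downscale so the longest side is at most max_side, return base64-encoded PNG."""
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # preserves aspect ratio, never upscales
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

# Pass the result to understand_image instead of the raw, full-size file.
print(len(image_to_b64("screenshot.png")))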