mcp_vision_analyzer
An interface to vision models (e.g., Claude 3.5 Sonnet, Claude 3 Opus) through the OpenRouter API, wrapping the underlying HTTP calls in a simple tool.
Author: catalystneuro
MCP Visual Data Interpretation Utility
This module provides an MCP server that analyzes images using vision models available through OpenRouter. It wraps model invocation for models such as Anthropic's Claude 3.5 Sonnet and Claude 3 Opus behind a single tool interface.
Setup Procedure
Install the package with npm:
npm install @catalystneuro/mcp_read_images
Configuration Prerequisites
An OpenRouter API key is required. Keys can be obtained from the OpenRouter Key Portal.
Add this server to your MCP settings file (for the Cline extension in VS Code on macOS, typically ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json). OPENROUTER_MODEL is optional and sets the default model; JSON does not allow comments, so keep the file free of them:
{
  "mcpServers": {
    "vision_service": {
      "command": "read_images",
      "env": {
        "OPENROUTER_API_KEY": "your-secure-api-key",
        "OPENROUTER_MODEL": "anthropic/claude-3.5-sonnet"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
Operational Guide
The server exposes a single tool, interpret_visual_data, for asking questions about an image:
// Basic call using the configured default model
use_mcp_tool({
  server_name: "vision_service",
  tool_name: "interpret_visual_data",
  arguments: {
    image_path: "/absolute/path/to/visual_asset.png",
    question: "Provide a detailed analysis of the visual elements present."
  }
});
// Overriding the model for a single call
use_mcp_tool({
  server_name: "vision_service",
  tool_name: "interpret_visual_data",
  arguments: {
    image_path: "/absolute/path/to/visual_asset.png",
    question: "What is the primary action depicted?",
    model: "anthropic/claude-3-opus-20240229" // Overrides OPENROUTER_MODEL and the built-in default
  }
});
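To give a sense of what a call like the ones above could turn into on the wire, here is a minimal sketch of a request body for OpenRouter's OpenAI-compatible chat completions endpoint. The function name buildVisionRequest and the VisionRequestBody type are hypothetical, and the README does not confirm that the server builds its requests this way; the sketch only assumes the image has already been converted to base64-encoded JPEG.

```typescript
// Hypothetical request shape; not taken from the server's source code.
interface VisionRequestBody {
  model: string;
  messages: Array<{
    role: "user";
    content: Array<
      | { type: "text"; text: string }
      | { type: "image_url"; image_url: { url: string } }
    >;
  }>;
}

// Builds a single-turn message carrying the question and the image.
// Assumption: the image is passed inline as a base64 JPEG data URL.
function buildVisionRequest(
  model: string,
  question: string,
  jpegBase64: string
): VisionRequestBody {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${jpegBase64}` },
          },
        ],
      },
    ],
  };
}
```

A body like this would be sent as JSON in a POST request with the API key in an Authorization: Bearer header.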
Model Resolution Hierarchy
The model is chosen in the following order of precedence, highest first:
1. Explicit model specification within the tool invocation parameters.
2. The OPENROUTER_MODEL environment setting defined in the MCP configuration.
3. The built-in fallback model: anthropic/claude-3.5-sonnet.
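The precedence above can be sketched as a small pure function. The name resolveModel is hypothetical, and treating empty or whitespace-only strings as "not set" is an assumption of this sketch rather than documented behavior.

```typescript
// Fallback model named in the list above.
const FALLBACK_MODEL = "anthropic/claude-3.5-sonnet";

// Hypothetical helper: picks the model by precedence.
// 1. explicit tool argument, 2. OPENROUTER_MODEL, 3. built-in fallback.
function resolveModel(
  toolArg: string | undefined,
  env: Record<string, string | undefined>
): string {
  // Assumption: blank strings count as unset.
  if (toolArg && toolArg.trim() !== "") {
    return toolArg;
  }
  const fromEnv = env["OPENROUTER_MODEL"];
  if (fromEnv && fromEnv.trim() !== "") {
    return fromEnv;
  }
  return FALLBACK_MODEL;
}
```

Passing the environment in as a parameter, rather than reading it globally, keeps the function easy to test.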
Validated Visual Engines
The server has been tested with the following OpenRouter model identifiers:
- anthropic/claude-3.5-sonnet
- anthropic/claude-3-opus-20240229
Key Capabilities
- Automatic image resizing and compression before upload.
- Configurable vision model.
- Custom questions about each image.
- Detailed error messages.
- JPEG conversion with quality control.
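As an illustration of the resizing step, the following sketch computes target dimensions that fit within a maximum side length while preserving the aspect ratio. The function fitWithin and the 1024-pixel limit used in the example are hypothetical; the README does not state the limits or the image library the server actually uses.

```typescript
// Hypothetical helper: scale (width, height) down so the longest side
// is at most maxSide, preserving aspect ratio. Never upscales.
function fitWithin(
  width: number,
  height: number,
  maxSide: number
): { width: number; height: number } {
  if (width <= 0 || height <= 0 || maxSide <= 0) {
    throw new RangeError("dimensions must be positive");
  }
  const longest = Math.max(width, height);
  if (longest <= maxSide) {
    return { width, height }; // already small enough
  }
  const scale = maxSide / longest;
  return {
    // Clamp to 1 so extreme aspect ratios never round to zero.
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}
```

With an assumed limit of 1024, a 4000x3000 image would be reduced to 1024x768, while an 800x600 image would be left unchanged.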
Troubleshooting and Error Reporting
The server reports a descriptive error for each of the following common failures:
- Invalid file system references for images.
- Missing or unauthenticated API credentials.
- Network connectivity disruptions.
- Requests targeting unsupported or misconfigured models.
- Internal issues during image manipulation.
Each error message is intended to identify the cause and point toward a fix.
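Some of the failures listed above can be detected before any network call is made. The sketch below shows one way such a pre-flight check might look. The names AnalysisRequest and preflightErrors, the message wording, and the POSIX-only path test are all assumptions for illustration, not the server's actual behavior; network and image-processing errors would only surface at runtime.

```typescript
// Hypothetical pre-flight check; names and messages are illustrative only.
interface AnalysisRequest {
  imagePath: string;
  apiKey: string | undefined;
  model: string;
}

// Returns one message per detected problem; an empty array means the
// request passed these basic checks.
function preflightErrors(
  req: AnalysisRequest,
  testedModels: string[]
): string[] {
  const errors: string[] = [];
  if (!req.apiKey || req.apiKey.trim() === "") {
    errors.push("OPENROUTER_API_KEY is missing; set it in the MCP configuration.");
  }
  // Assumption: POSIX-style absolute paths, as in the usage examples.
  if (req.imagePath.charAt(0) !== "/") {
    errors.push("image_path must be an absolute path: " + req.imagePath);
  }
  if (testedModels.indexOf(req.model) === -1) {
    errors.push("Model has not been tested with this server: " + req.model);
  }
  return errors;
}
```

Collecting all problems in one pass, instead of stopping at the first, lets a caller fix several configuration mistakes at once.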
Building from Source
To compile the module from its repository:
git clone https://github.com/catalystneuro/mcp_read_images.git
cd mcp_read_images
npm install
npm run build
Legal Information
Distributed under the terms of the MIT License. Full details are available in the LICENSE file.
