mcp-visual-analysis-service
Uses vision models to interpret images and generate text descriptions of them. Supports multiple image formats and offers optional optical character recognition (OCR).
Author: mario-andreschak
MCP Visual Interpretation Engine
This MCP server provides image recognition using vision models from Anthropic and OpenAI. Current version: 0.1.2.
Core Functionalities
- Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
- Support for common image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Input as Base64-encoded data or as a file path
- Optional text extraction from images with Tesseract OCR
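The primary/fallback behaviour can be pictured with a short Python sketch. It is not the server's actual code: the provider callables, their `(image_b64, mime_type)` signature and the name `describe_with_fallback` are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical provider signature: (Base64 image data, MIME type) -> description.
Describer = Callable[[str, str], str]


def describe_with_fallback(
    providers: Dict[str, Describer],
    primary: str,
    fallback: Optional[str],
    image_b64: str,
    mime_type: str,
) -> str:
    """Ask the primary provider; if it raises, retry once with the fallback provider."""
    primary_fn = providers[primary]  # an unknown primary name fails fast with KeyError
    try:
        return primary_fn(image_b64, mime_type)
    except Exception:
        if fallback is None or fallback == primary:
            raise  # nothing else to try: surface the original error
        return providers[fallback](image_b64, mime_type)


# Usage with stand-in providers (no real API calls are made):
def flaky_provider(image_b64: str, mime_type: str) -> str:
    raise RuntimeError("simulated provider outage")


def backup_provider(image_b64: str, mime_type: str) -> str:
    return f"backup description ({mime_type})"


result = describe_with_fallback(
    {"anthropic": flaky_provider, "openai": backup_provider},
    primary="anthropic",
    fallback="openai",
    image_b64="aGk=",
    mime_type="image/png",
)
print(result)  # backup description (image/png)
```

Retrying exactly once keeps failure behaviour predictable: if both providers fail, the fallback's error propagates to the caller.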
Prerequisites
- Python 3.8 or later
- Tesseract OCR (optional; needed only for text extraction):
  - Windows: download the installer from UB-Mannheim/tesseract
  - Linux (Debian/Ubuntu): `sudo apt-get install tesseract-ocr`
  - macOS (Homebrew): `brew install tesseract`
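To confirm these prerequisites before installing, a small check script can help. This is a hypothetical helper, not part of the project; passing a custom path mirrors what `TESSERACT_CMD` is for, which is an assumption about how you might test that setting.

```python
import shutil
import sys
from typing import Dict, Optional, Union


def check_prerequisites(tesseract_cmd: Optional[str] = None) -> Dict[str, Union[bool, Optional[str]]]:
    """Report whether Python is new enough and where (if anywhere) Tesseract is found."""
    return {
        "python_ok": sys.version_info >= (3, 8),
        # shutil.which searches PATH (or checks a given path); None means not found.
        "tesseract": shutil.which(tesseract_cmd or "tesseract"),
    }


if __name__ == "__main__":
    report = check_prerequisites()
    print("Python 3.8+:", "yes" if report["python_ok"] else "no")
    print("Tesseract:", report["tesseract"] or "not found (only needed for OCR)")
```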
Installation
- Clone the repository:
```bash
git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition
```
- Create the configuration file from the template:
```bash
cp .env.example .env
```
Edit `.env` to set your API keys and other settings.
- Run the build script:
```bash
build.bat
```
Usage
Running the Server
Start the server with Python:
```bash
python -m image_recognition_server.server
```
Or use the batch script:
```bash
run.bat server
```
To debug, start the server in inspector mode with the MCP Inspector:
```bash
run.bat debug
```
Available Tools
- `describe_image`
  - Input: Base64-encoded image data and its MIME type
  - Output: A detailed description of the image
- `describe_image_from_file`
  - Input: Path to the image file (absolute or relative)
  - Output: A detailed description of the image
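`describe_image` expects the caller to supply the Base64 data and MIME type. A minimal client-side sketch for preparing that input, assuming the MIME type can be inferred from the file extension; the helper names are hypothetical and not part of the server.

```python
import base64
from pathlib import Path
from typing import Tuple

# MIME types for the formats this README lists as supported.
SUPPORTED_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def mime_type_for(path: str) -> str:
    """Map a file extension to its MIME type, rejecting unsupported formats."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported image format: {suffix or '(none)'}")
    return SUPPORTED_TYPES[suffix]


def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes into an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def prepare_describe_image_args(path: str) -> Tuple[str, str]:
    """Return (base64_data, mime_type) for a local image file."""
    mime_type = mime_type_for(path)  # validate the format before reading the file
    return encode_image_bytes(Path(path).read_bytes()), mime_type
```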
Environment Variables
- `ANTHROPIC_API_KEY`: Your Anthropic API key.
- `OPENAI_API_KEY`: Your OpenAI API key.
- `VISION_PROVIDER`: Primary vision provider (`anthropic` or `openai`).
- `FALLBACK_PROVIDER`: Provider to use if the primary provider fails.
- `LOG_LEVEL`: Logging level (e.g., DEBUG, INFO, WARNING, ERROR).
- `ENABLE_OCR`: Enable Tesseract OCR text extraction (`true` or `false`).
- `TESSERACT_CMD`: Optional custom path to the Tesseract executable.
- `OPENAI_MODEL`: OpenAI model to use (default: `gpt-4o-mini`). Also accepts OpenRouter model names (e.g., `anthropic/claude-3.5-sonnet:beta`).
- `OPENAI_BASE_URL`: Optional custom base URL for OpenAI-compatible APIs. Set it to `https://openrouter.ai/api/v1` to use OpenRouter.
- `OPENAI_TIMEOUT`: Optional timeout for OpenAI API requests, in seconds.
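To show how these variables fit together, here is a hypothetical loader. Only the variable names and the `gpt-4o-mini` default come from this README; the validation rules (requiring the API key of every configured provider, treating `ENABLE_OCR` as false unless it is `true`, defaulting `LOG_LEVEL` to `INFO`) are assumptions, and the server's own parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"anthropic", "openai"}


@dataclass
class VisionConfig:
    provider: str
    fallback: Optional[str]
    enable_ocr: bool
    openai_model: str
    openai_base_url: Optional[str]
    openai_timeout: Optional[float]
    tesseract_cmd: Optional[str]
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> VisionConfig:
    """Build a VisionConfig from environment variables (assumed rules, see lead-in)."""
    provider = env.get("VISION_PROVIDER", "").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError("VISION_PROVIDER must be 'anthropic' or 'openai'")
    fallback = env.get("FALLBACK_PROVIDER", "").strip().lower() or None
    if fallback is not None and fallback not in VALID_PROVIDERS:
        raise ValueError("FALLBACK_PROVIDER must be 'anthropic' or 'openai'")
    # Assumption: every provider that may be called needs its API key.
    for name in sorted({provider, fallback} - {None}):
        key_var = f"{name.upper()}_API_KEY"
        if not env.get(key_var):
            raise ValueError(f"{key_var} is required when {name} is configured")
    timeout = env.get("OPENAI_TIMEOUT")
    return VisionConfig(
        provider=provider,
        fallback=fallback,
        enable_ocr=env.get("ENABLE_OCR", "false").strip().lower() == "true",
        openai_model=env.get("OPENAI_MODEL") or "gpt-4o-mini",  # README default
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
        openai_timeout=float(timeout) if timeout else None,
        tesseract_cmd=env.get("TESSERACT_CMD") or None,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # assumed default
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.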
Using OpenRouter
OpenRouter gives access to a wide range of models through the OpenAI API format. To use it:
- Get an API key from OpenRouter.
- Set `OPENAI_API_KEY` in your `.env` file to the OpenRouter key.
- Set `OPENAI_BASE_URL` to `https://openrouter.ai/api/v1`.
- Set `OPENAI_MODEL` to the desired model using its OpenRouter name (e.g., `anthropic/claude-3.5-sonnet:beta`).
- Make sure `VISION_PROVIDER` is set to `openai`.
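A quick way to double-check the OpenRouter steps above is a small validator. This is a hypothetical helper, not part of the project; the test that the model name contains a `/` is a heuristic based on the `vendor/model` naming in the example.

```python
from typing import List, Mapping

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_problems(env: Mapping[str, str]) -> List[str]:
    """List the OpenRouter setup steps that the given settings do not satisfy."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY must hold your OpenRouter key")
    if env.get("OPENAI_BASE_URL", "").rstrip("/") != OPENROUTER_BASE_URL:
        problems.append(f"OPENAI_BASE_URL must be {OPENROUTER_BASE_URL}")
    # Heuristic: OpenRouter names look like vendor/model, e.g. anthropic/claude-3.5-sonnet:beta
    if "/" not in env.get("OPENAI_MODEL", ""):
        problems.append("OPENAI_MODEL should use an OpenRouter vendor/model name")
    if env.get("VISION_PROVIDER", "").lower() != "openai":
        problems.append("VISION_PROVIDER must be set to openai")
    return problems
```

An empty list means all four settings are consistent with the OpenRouter setup.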
Default Models
- Anthropic: `claude-3.5-sonnet-beta`
- OpenAI: `gpt-4o-mini`
- OpenRouter: use the `anthropic/claude-3.5-sonnet:beta` format in `OPENAI_MODEL`
Development
Running Tests
Run all tests:
```bash
run.bat test
```
Run a specific test suite:
```bash
run.bat test server
run.bat test anthropic
run.bat test openai
```
Docker Support
Build the Docker image:
```bash
docker build -t mcp-image-recognition .
```
Run the container:
```bash
docker run -it --env-file .env mcp-image-recognition
```
License
Distributed under the MIT License. See the LICENSE file for details.
Changelog
- 0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive unit tests for OCR functionality.
- 0.1.1 (2025-02-19): Added optional Tesseract OCR support for extracting text from images.
- 0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support.
WIKIPEDIA: XMLHttpRequest (XHR) is an API in the form of a JavaScript object whose methods send HTTP requests from a web browser to a web server. These methods let a browser-based application send requests to the server after the page has finished loading and receive data in return. XMLHttpRequest is a core component of Ajax programming. Before Ajax, hyperlinks and form submissions were the main ways to interact with the server, and they often replaced the current page with a new one.
== Background ==
The concept behind XMLHttpRequest was originally developed by the Microsoft developers of Outlook Web Access for Microsoft Exchange Server 2000, and it was first implemented in Internet Explorer 5 (1999). That first implementation did not use the XMLHttpRequest identifier; instead, developers used ActiveXObject("Msxml2.XMLHTTP") and ActiveXObject("Microsoft.XMLHTTP"). Since Internet Explorer 7 (2006), all major browsers support the XMLHttpRequest identifier.
The XMLHttpRequest identifier is now the de facto standard in all major browsers, including Mozilla's Gecko layout engine (2002), Safari 1.2 (2004) and Opera 8.0 (2005).
=== Formal Specifications ===
The World Wide Web Consortium (W3C) published a Working Draft specification for the XMLHttpRequest object on April 5, 2006. On February 25, 2008, it published the Level 2 Working Draft, which added methods to monitor the progress of events, allow cross-site requests, and handle byte streams. At the end of 2011, the Level 2 features were merged into the main specification. At the end of 2012, development moved to the WHATWG, which maintains the specification as a living document using Web IDL.
== Operational Flow ==
Sending a request with XMLHttpRequest generally involves several programming steps.
- Create an XMLHttpRequest object by calling its constructor.
- Call the `open` method to specify the request type (HTTP method), the target resource, and whether the request runs synchronously or asynchronously.
- For an asynchronous request, attach a listener that is notified when the request's state changes.
- Start the request by calling the `send` method, optionally passing data.
- Handle state changes in the listener. When the server has sent its response, the state becomes 4, the "done" state, and the response data is typically available in the `responseText` property.

Beyond these basic steps, XMLHttpRequest offers many options for controlling how a request is sent and how its response is handled. Custom header fields can be added to the request to tell the server how to fulfill it, and data can be uploaded to the server by passing it to the `send` call. The response can be parsed from JSON into a ready-to-use JavaScript object, or processed incrementally as it arrives instead of only after the whole message has been received. A request can also be aborted early, or set to fail if it does not complete within a given time.
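The state-monitoring step can be illustrated outside the browser. The Python sketch below models the `readyState` values defined by the XMLHttpRequest specification (UNSENT = 0 through DONE = 4) and a listener that ignores intermediate states and handles the response only at DONE, parsing it as JSON. It is a simulation, not browser code: the `make_listener` helper and the simulated sequence of state changes are hypothetical.

```python
import json
from enum import IntEnum
from typing import Any, Callable, List, Tuple


class ReadyState(IntEnum):
    """readyState values defined by the XMLHttpRequest specification."""
    UNSENT = 0
    OPENED = 1
    HEADERS_RECEIVED = 2
    LOADING = 3
    DONE = 4


def make_listener(on_done: Callable[[int, Any], None]) -> Callable[[int, int, str], None]:
    """Return a state-change handler that, like a typical XHR listener,
    ignores intermediate states and acts only once the request is DONE."""
    def listener(ready_state: int, status: int, response_text: str) -> None:
        if ready_state == ReadyState.DONE:
            # Parse the JSON body into a native data structure.
            on_done(status, json.loads(response_text))
    return listener


# Simulated state changes for one successful asynchronous request.
# In a browser, the XHR object itself drives these transitions.
results: List[Tuple[int, Any]] = []
listener = make_listener(lambda status, body: results.append((status, body)))
for state, status, text in [
    (ReadyState.OPENED, 0, ""),                # status stays 0 until headers arrive
    (ReadyState.HEADERS_RECEIVED, 200, ""),
    (ReadyState.LOADING, 200, '{"ok": tr'),    # partial body while loading
    (ReadyState.DONE, 200, '{"ok": true}'),
]:
    listener(state, status, text)

print(results)  # [(200, {'ok': True})]
```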
== Inter-Domain Communication ==
Early in the development of the World Wide Web, it was found that JavaScript could be used to exchange information between one website and another, less trustworthy one, putting users' security at risk. Browsers therefore enforce the same-origin policy, which restricts such cross-domain requests.
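Under the same-origin policy, two URLs share an origin only when their scheme, host and port all match. The following Python sketch shows that comparison using only the standard library. It assumes http/https URLs with their usual default ports (80 and 443) and deliberately ignores browser special cases such as opaque origins; the function names are illustrative.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: only http and https are considered, with their standard default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # An explicit port wins; otherwise use the scheme's default port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    # .hostname is already lower-cased by urllib.parse.
    return scheme, parts.hostname, port


def same_origin(a: str, b: str) -> bool:
    """True if both URLs share scheme, host and port."""
    return origin(a) == origin(b)


print(same_origin("https://example.com/a", "https://example.com:443/b"))  # True
print(same_origin("https://example.com/", "http://example.com/"))         # False: scheme differs
print(same_origin("https://example.com/", "https://api.example.com/"))    # False: host differs
print(same_origin("http://example.com:8080/", "http://example.com/"))     # False: port differs
```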
