mcp-ai-vision-analyzer
Leverage sophisticated artificial intelligence models to perform in-depth visual data interpretation and feature extraction, significantly augmenting the analytical capabilities of connected AI agents. Features an intuitive user interface for formulating precise, tailored image examination requests.
Author

Nazruden
Quick Info
Actions
Tags
MCP AI Vision Analyzer
Overview
This component, adhering to the Model Context Protocol (MCP) specification, functions as a server to facilitate image processing via cutting-edge vision paradigms sourced through OpenRouter endpoints. It grants AI assistants the facility to conduct detailed visual scrutiny through a streamlined interface within the existing MCP framework.
Deployment
Automated Deployment with Smithery
To integrate mcp-ai-vision-analyzer seamlessly with Claude Desktop utilizing Smithery:
npx -y @smithery/cli install @Nazruden/mcp-openvision --client claude
Installation via pip
pip install mcp-openvision
Recommended Installer (uv)
uv pip install mcp-openvision
Configuration Prerequisites
The system mandates an active OpenRouter authentication token and can be configured using environmental parameters:
- OPENROUTER_API_KEY (mandatory): Your valid credential for OpenRouter access.
- OPENROUTER_DEFAULT_MODEL (optional): Specifies the primary vision engine to utilize.
Supported OpenRouter Visual Engines
mcp-ai-vision-analyzer is interoperable with any OpenRouter endpoint supporting visual input streams. The default selection is qwen/qwen2.5-vl-32b-instruct:free, but users may override this.
Exemplary vision models accessible via OpenRouter:
qwen/qwen2.5-vl-32b-instruct:free(The default)anthropic/claude-3-5-sonnetanthropic/claude-3-opusanthropic/claude-3-sonnetopenai/gpt-4o
You customize the employed engine either by setting the OPENROUTER_DEFAULT_MODEL environment variable or by passing the model argument within the analyze_visual_data function call.
Operational Use Cases
Quick Verification with MCP Inspector
Executing a diagnostic check using the MCP Inspector utility:
npx @modelcontextprotocol/inspector uvx mcp-openvision
Integration with Desktop Clients (Claude/Cursor)
Modify your MCP configuration file as follows:
- Windows:
%USERPROFILE%\.cursor\mcp.json - macOS:
~/.cursor/mcp.jsonor~/Library/Application Support/Claude/claude_desktop_config.json
Inject the subsequent configuration block:
{
"mcpServers": {
"openvision": {
"command": "uvx",
"args": ["mcp-openvision"],
"env": {
"OPENROUTER_API_KEY": "your_openrouter_api_key_here",
"OPENROUTER_DEFAULT_MODEL": "anthropic/claude-3-sonnet"
}
}
}
}
Local Execution for Development
# Establish the necessary API token
export OPENROUTER_API_KEY="your_api_key"
# Initiate the server module directly
python -m mcp_openvision
Core Capabilities
mcp-ai-vision-analyzer exposes the subsequent primary utility:
- analyze_visual_data: Scrutinize pictorial inputs utilizing designated visual engines, accepting diverse parameter inputs:
image: Input modalities accepted:- Base64 byte representations
- Network Uniform Resource Locators (http/https)
- Local filesystem references
query: The natural language directive guiding the visual analysis task.system_prompt: Contextual directives establishing the operational persona and constraints for the processing model (optional).model: Specification of the vision processing unit to employ.temperature: Stochasticity control variable (range 0.0 to 1.0).max_tokens: The ceiling for the resultant output length.
Optimizing Analysis Directives
The query argument is paramount for deriving meaningful conclusions from the visual assets. A well-structured directive must articulate:
- Objective: The fundamental goal of the visual interpretation.
- Areas of Interest: Explicit pointers to elements or regions demanding focused attention.
- Information Requirements: The precise nature of the data expected to be synthesized.
- Output Formatting: Preferred structure or arrangement for the final results.
Illustrative Examples of High-Fidelity Directives
| Simple Directive | Advanced Directive |
|---|---|
| "Summarize the visual content" | "Catalog every identifiable piece of merchandise within this point-of-sale snapshot and project estimated unit costs." |
| "What is depicted?" | "Examine this radiological image for anomalous formations, prioritizing potential pathology identification based on clinical markers." |
| "Data extraction from graph" | "Quantify the discrete data points presented in this time-series visualization detailing revenue fluctuations across fiscal quarters 2022-2023, and characterize dominant growth trajectories." |
| "Read text present" | "Perform comprehensive optical character recognition (OCR) on the embedded signage, preserving all textual entries, layout hierarchy, and associated annotations." |
By furnishing context regarding the analytical necessity and the specific informational yield sought, you significantly enhance the model's ability to concentrate on pertinent features and yield more actionable intelligence.
Operational Code Snippets
# Process an image referenced by a URL
analysis_output = await analyze_visual_data(
image="https://example.com/visual_asset.png",
query="Provide a comprehensive narrative description of the presented scene."
)
# Process a locally stored file with a highly focused analytical mandate
analysis_output = await analyze_visual_data(
image="disk/path/to/diagram.png",
query="Pinpoint every regulatory marking on this infrastructure diagram and elaborate on their compliance implications for civil engineers."
)
# Process an image encoded in Base64 with a specific design review purpose
analysis_output = await analyze_visual_data(
image="SGVsbG8gV29ybGQ=...", # Base64 payload
query="Critically assess the ergonomics and aesthetic appeal of the visible hardware interface, suggesting modifications to enhance user experience metrics."
)
# Invoke specialized analysis using an explicit guidance prompt
analysis_output = await analyze_visual_data(
image="disk/path/to/artwork.jpg",
query="Deconstruct the use of perspective and chiaroscuro in this canvas, and relate its execution style to Renaissance conventions.",
system_prompt="You operate as a seasoned curator specializing in pre-modern European painting. Your response must strictly adhere to formal art historical terminology regarding composition, technique, and attribution likelihood."
)
Image Input Modalities
The analyze_visual_data function accommodates three distinct formats for image conveyance:
- Base64 Binary Sequences
- Network References - Must initiate with
http://orhttps://protocol designators. - Filesystem Pointers:
- Absolute Pointers: Full hierarchical paths commencing with
/(POSIX) or a drive designation (Windows). - Relative Pointers: Paths interpreted relative to the server's current executing directory.
- Relative Pointers with Root Context: Utilize the optional
project_rootparameter to define an explicit base directory for path resolution.
Navigating Relative File Access
When referencing files using relative syntax (e.g., "assets/diagram.png"), resolution adheres to one of two conventions:
- The path is resolved against the directory where the server process is currently active.
- Alternatively, a
project_rootcontext parameter can be supplied:
# Illustration using a relative file path alongside a defined project base directory
analysis_output = await analyze_visual_data(
image="asset_files/layout.png",
project_root="/data/project_sources",
query="What components are present in this schematic view?"
)
This facility is invaluable in environments where the invocation directory lacks predictability or when referencing resources relative to a stable, designated project foundation.
Development Lifecycle
Establishing the Development Environment
# Clone the source repository
git clone https://github.com/modelcontextprotocol/mcp-openvision.git
cd mcp-openvision
# Install dependencies necessary for development
pip install -e ".[dev]"
Code Style Enforcement
The project enforces consistent coding standards via Black for automatic formatting. This standard is maintained through continuous integration pipelines:
- All code committed to the repository undergoes automatic Black application.
- For contributions originating from external forks, Black processes the code and commits the formatted result directly onto the Pull Request branch.
- For contributors with write access, Black formats code upon commit enforcement.
You can manually invoke the formatter locally prior to committing changes:
# Apply formatting across Python files in the src and tests hierarchies
black src tests
Executing Unit and Integration Tests
pytest
Release Orchestration
This project utilizes an automated procedure for version releases:
- Update the version identifier within
pyproject.tomlin adherence to Semantic Versioning guidelines. - A helper utility is provided:
python scripts/bump_version.py [major|minor|patch] - Document the changes pertaining to the new version within
CHANGELOG.md. - The script mentioned above also generates a placeholder section in
CHANGELOG.mdfor documentation. - Commit these modifications and push them to the primary branch (
main). - The GitHub Actions pipeline will subsequently:
- Recognize the version increment.
- Automatically generate a formal GitHub Release entry.
- Initiate the deployment pipeline responsible for publishing to the PyPI repository.
This automation ensures rigid adherence to versioning protocols and comprehensive release documentation.
Community Support
If this utility proves beneficial to your work, consider offering a small token of appreciation via the designated support link to sustain ongoing development and necessary upkeep.
Governance and Licensing
This software is distributed under the terms of the MIT License; consult the LICENSE file for exhaustive details.
WIKIPEDIA: Cloud computing is "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand," according to ISO. It is commonly referred to as "the cloud".
== Operational Attributes == In 2011, the United States National Institute of Standards and Technology (NIST) codified five "essential attributes" defining cloud infrastructure. The precise NIST definitions are enumerated below:
On-demand self-service: "A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider." Broad network access: "Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations)." Resource pooling: " The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand." Rapid elasticity: "Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear unlimited and can be appropriated in any quantity at any time." Measured service: "Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service. By 2023, the International Organization for Standardization (ISO) had expanded and refined the list.
== Historical Context ==
The conceptual lineage of cloud computation stretches back to the 1960s, marked by the maturation of time-sharing concepts popularized through Remote Job Entry (RJE). The prevailing operational paradigm during this epoch involved the "data center" construct, where users submitted workloads to dedicated operators for execution on mainframe systems. This era was defined by intense investigation into mechanisms to democratize access to immense computational power via time-slicing, striving for optimized utilization across infrastructure, platform layers, and application execution, thereby maximizing end-user efficiency. The symbolic representation of virtualized services as a "cloud" dates to 1994, employed by General Magic to depict the expansive cosmos of "destinations" accessible by mobile agents within their Telescript environment. This visualization is attributed to David Hoffman, a specialist in communications at General Magic, borrowing from established conventions in telecommunications and network schematic drawing. The phrase "cloud computing" gained significant traction in 1996 following the circulation of a strategic business projection by Compaq Computer Corporation concerning future computational models and the Internet. The organization's aspiration was to superch
