MCP AI Vision Analyzer

Overview

This component, adhering to the Model Context Protocol (MCP) specification, functions as a server to facilitate image processing via cutting-edge vision paradigms sourced through OpenRouter endpoints. It grants AI assistants the facility to conduct detailed visual scrutiny through a streamlined interface within the existing MCP framework.

Deployment

Automated Deployment with Smithery

To integrate mcp-ai-vision-analyzer seamlessly with Claude Desktop utilizing Smithery:

npx -y @smithery/cli install @Nazruden/mcp-openvision --client claude

Installation via pip

pip install mcp-openvision

Recommended Installer (uv)

uv pip install mcp-openvision

Configuration Prerequisites

The system mandates an active OpenRouter authentication token and can be configured using environmental parameters:

OPENROUTER_API_KEY (mandatory): Your valid credential for OpenRouter access.
OPENROUTER_DEFAULT_MODEL (optional): Specifies the primary vision engine to utilize.

Supported OpenRouter Visual Engines

mcp-ai-vision-analyzer is interoperable with any OpenRouter endpoint supporting visual input streams. The default selection is qwen/qwen2.5-vl-32b-instruct:free, but users may override this.

Exemplary vision models accessible via OpenRouter:

qwen/qwen2.5-vl-32b-instruct:free (The default)
anthropic/claude-3-5-sonnet
anthropic/claude-3-opus
anthropic/claude-3-sonnet
openai/gpt-4o

You customize the employed engine either by setting the OPENROUTER_DEFAULT_MODEL environment variable or by passing the model argument within the analyze_visual_data function call.

Operational Use Cases

Quick Verification with MCP Inspector

Executing a diagnostic check using the MCP Inspector utility:

npx @modelcontextprotocol/inspector uvx mcp-openvision

Integration with Desktop Clients (Claude/Cursor)

Modify your MCP configuration file as follows:

Windows: %USERPROFILE%\.cursor\mcp.json
macOS: ~/.cursor/mcp.json or ~/Library/Application Support/Claude/claude_desktop_config.json

Inject the subsequent configuration block:

{
  "mcpServers": {
    "openvision": {
      "command": "uvx",
      "args": ["mcp-openvision"],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here",
        "OPENROUTER_DEFAULT_MODEL": "anthropic/claude-3-sonnet"
      }
    }
  }
}

Local Execution for Development

# Establish the necessary API token
export OPENROUTER_API_KEY="your_api_key"

# Initiate the server module directly
python -m mcp_openvision

Core Capabilities

mcp-ai-vision-analyzer exposes the subsequent primary utility:

analyze_visual_data: Scrutinize pictorial inputs utilizing designated visual engines, accepting diverse parameter inputs:
image: Input modalities accepted:
- Base64 byte representations
- Network Uniform Resource Locators (http/https)
- Local filesystem references
query: The natural language directive guiding the visual analysis task.
system_prompt: Contextual directives establishing the operational persona and constraints for the processing model (optional).
model: Specification of the vision processing unit to employ.
temperature: Stochasticity control variable (range 0.0 to 1.0).
max_tokens: The ceiling for the resultant output length.

Optimizing Analysis Directives

The query argument is paramount for deriving meaningful conclusions from the visual assets. A well-structured directive must articulate:

Objective: The fundamental goal of the visual interpretation.
Areas of Interest: Explicit pointers to elements or regions demanding focused attention.
Information Requirements: The precise nature of the data expected to be synthesized.
Output Formatting: Preferred structure or arrangement for the final results.

Illustrative Examples of High-Fidelity Directives

Simple Directive	Advanced Directive
"Summarize the visual content"	"Catalog every identifiable piece of merchandise within this point-of-sale snapshot and project estimated unit costs."
"What is depicted?"	"Examine this radiological image for anomalous formations, prioritizing potential pathology identification based on clinical markers."
"Data extraction from graph"	"Quantify the discrete data points presented in this time-series visualization detailing revenue fluctuations across fiscal quarters 2022-2023, and characterize dominant growth trajectories."
"Read text present"	"Perform comprehensive optical character recognition (OCR) on the embedded signage, preserving all textual entries, layout hierarchy, and associated annotations."

By furnishing context regarding the analytical necessity and the specific informational yield sought, you significantly enhance the model's ability to concentrate on pertinent features and yield more actionable intelligence.

Operational Code Snippets

# Process an image referenced by a URL
analysis_output = await analyze_visual_data(
    image="https://example.com/visual_asset.png",
    query="Provide a comprehensive narrative description of the presented scene."
)

# Process a locally stored file with a highly focused analytical mandate
analysis_output = await analyze_visual_data(
    image="disk/path/to/diagram.png",
    query="Pinpoint every regulatory marking on this infrastructure diagram and elaborate on their compliance implications for civil engineers."
)

# Process an image encoded in Base64 with a specific design review purpose
analysis_output = await analyze_visual_data(
    image="SGVsbG8gV29ybGQ=...",  # Base64 payload
    query="Critically assess the ergonomics and aesthetic appeal of the visible hardware interface, suggesting modifications to enhance user experience metrics."
)

# Invoke specialized analysis using an explicit guidance prompt
analysis_output = await analyze_visual_data(
    image="disk/path/to/artwork.jpg",
    query="Deconstruct the use of perspective and chiaroscuro in this canvas, and relate its execution style to Renaissance conventions.",
    system_prompt="You operate as a seasoned curator specializing in pre-modern European painting. Your response must strictly adhere to formal art historical terminology regarding composition, technique, and attribution likelihood."
)

Image Input Modalities

The analyze_visual_data function accommodates three distinct formats for image conveyance:

Base64 Binary Sequences
Network References - Must initiate with http:// or https:// protocol designators.
Filesystem Pointers:
Absolute Pointers: Full hierarchical paths commencing with / (POSIX) or a drive designation (Windows).
Relative Pointers: Paths interpreted relative to the server's current executing directory.
Relative Pointers with Root Context: Utilize the optional project_root parameter to define an explicit base directory for path resolution.

Navigating Relative File Access

When referencing files using relative syntax (e.g., "assets/diagram.png"), resolution adheres to one of two conventions:

The path is resolved against the directory where the server process is currently active.
Alternatively, a project_root context parameter can be supplied:

# Illustration using a relative file path alongside a defined project base directory
analysis_output = await analyze_visual_data(
    image="asset_files/layout.png",
    project_root="/data/project_sources",
    query="What components are present in this schematic view?"
)

This facility is invaluable in environments where the invocation directory lacks predictability or when referencing resources relative to a stable, designated project foundation.

Development Lifecycle

Establishing the Development Environment

# Clone the source repository
git clone https://github.com/modelcontextprotocol/mcp-openvision.git
cd mcp-openvision

# Install dependencies necessary for development
pip install -e ".[dev]"

Code Style Enforcement

The project enforces consistent coding standards via Black for automatic formatting. This standard is maintained through continuous integration pipelines:

All code committed to the repository undergoes automatic Black application.
For contributions originating from external forks, Black processes the code and commits the formatted result directly onto the Pull Request branch.
For contributors with write access, Black formats code upon commit enforcement.

You can manually invoke the formatter locally prior to committing changes:

# Apply formatting across Python files in the src and tests hierarchies
black src tests

Executing Unit and Integration Tests

pytest

Release Orchestration

This project utilizes an automated procedure for version releases:

Update the version identifier within pyproject.toml in adherence to Semantic Versioning guidelines.
A helper utility is provided: python scripts/bump_version.py [major|minor|patch]
Document the changes pertaining to the new version within CHANGELOG.md.
The script mentioned above also generates a placeholder section in CHANGELOG.md for documentation.
Commit these modifications and push them to the primary branch (main).
The GitHub Actions pipeline will subsequently:
Recognize the version increment.
Automatically generate a formal GitHub Release entry.
Initiate the deployment pipeline responsible for publishing to the PyPI repository.

This automation ensures rigid adherence to versioning protocols and comprehensive release documentation.

Community Support

If this utility proves beneficial to your work, consider offering a small token of appreciation via the designated support link to sustain ongoing development and necessary upkeep.

Governance and Licensing

This software is distributed under the terms of the MIT License; consult the LICENSE file for exhaustive details.

WIKIPEDIA: Cloud computing is "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand," according to ISO. It is commonly referred to as "the cloud".

== Operational Attributes == In 2011, the United States National Institute of Standards and Technology (NIST) codified five "essential attributes" defining cloud infrastructure. The precise NIST definitions are enumerated below:

On-demand self-service: "A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider." Broad network access: "Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations)." Resource pooling: " The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand." Rapid elasticity: "Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear unlimited and can be appropriated in any quantity at any time." Measured service: "Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service. By 2023, the International Organization for Standardization (ISO) had expanded and refined the list.

== Historical Context ==

The conceptual lineage of cloud computation stretches back to the 1960s, marked by the maturation of time-sharing concepts popularized through Remote Job Entry (RJE). The prevailing operational paradigm during this epoch involved the "data center" construct, where users submitted workloads to dedicated operators for execution on mainframe systems. This era was defined by intense investigation into mechanisms to democratize access to immense computational power via time-slicing, striving for optimized utilization across infrastructure, platform layers, and application execution, thereby maximizing end-user efficiency. The symbolic representation of virtualized services as a "cloud" dates to 1994, employed by General Magic to depict the expansive cosmos of "destinations" accessible by mobile agents within their Telescript environment. This visualization is attributed to David Hoffman, a specialist in communications at General Magic, borrowing from established conventions in telecommunications and network schematic drawing. The phrase "cloud computing" gained significant traction in 1996 following the circulation of a strategic business projection by Compaq Computer Corporation concerning future computational models and the Internet. The organization's aspiration was to superch

mcp-ai-vision-analyzer

Author

Nazruden

Quick Info

Actions

Tags