desktop-visual-analyzer-mcp
Capture the desktop screen and analyze it with the Claude Vision API. Integrates with AI coding environments and other MCP-compliant agents.
Author

hemenge133
🖥️ desktop-visual-analyzer-mcp
A Model Context Protocol (MCP) tool that lets AI agents capture desktop screenshots and analyze them with Claude Vision. Take a screenshot, have its contents interpreted, and receive AI-driven diagnostics on your graphical user interface.
✨ Core Capabilities
- 📸 Immediate acquisition of the entire display buffer
- 🧠 Computer Vision analysis powered by Claude's multimodal models
- 🤖 Smooth integration pathway for other MCP-conforming digital assistants
- ⚙️ Simplified provisioning and deployment procedures
- 📡 Native support across both standard I/O (stdio) and Server-Sent Events (SSE) communication channels
🎯 Primary Applications
- Visual auditing and interpretation of the active desktop state
- Deconstruction and assessment of User Interface layouts and components
- Visual debugging using the captured screen state
- Extracting semantic meaning and context from screen imagery
- Automated documentation of graphical elements and spatial arrangements
- Visual feedback loops for automated desktop workflows
🚀 Installation Guide
Preferred Method: npm Distribution
The recommended way to install is through npm:
bash
# Install the utility globally
npm install -g desktop-visual-analyzer-mcp
# For strict reproducibility, pin the exact release version
npm install -g desktop-visual-analyzer-mcp@2.0.15 # Substitute with current version
After installing, configure your AI client as described in the "Configuration Protocol" section.
Configuration Protocol
Once installed via npm, the tool must be registered within your AI client's configuration manifest:
For stdio Transport (Default Mode)
Claude Desktop Client Paths:
- Windows: %APPDATA%/Claude/claude_desktop_config.json
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Cursor Client Paths:
- Windows: %APPDATA%/Cursor/mcp.json or ~/.cursor/mcp.json
- macOS: ~/Library/Application Support/Cursor/mcp.json
Cline Path:
- ~/.config/cline/mcp.json
Windsurf Path:
- ~/.config/windsurf/mcp.json
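The path list above can be expressed as a small lookup. The sketch below is only an illustration: the function name `candidateConfigPaths`, the client identifiers, and the `PathContext` shape are hypothetical and not part of this package. It simply returns the locations listed above, with `~` and `%APPDATA%` supplied by the caller.

```typescript
// Hypothetical helper (not part of the package): returns the config file
// locations listed in this README for a given client and platform.
type ClientName = "claude-desktop" | "cursor" | "cline" | "windsurf";
type Platform = "win32" | "darwin";

interface PathContext {
  home: string;     // expansion of "~"
  appData?: string; // expansion of %APPDATA% (Windows only)
}

function appDataOrThrow(ctx: PathContext): string {
  if (!ctx.appData) throw new Error("APPDATA is required on Windows");
  return ctx.appData;
}

function candidateConfigPaths(
  client: ClientName,
  platform: Platform,
  ctx: PathContext
): string[] {
  // The README lists a single path for Cline and Windsurf, with no platform split.
  if (client === "cline") return [`${ctx.home}/.config/cline/mcp.json`];
  if (client === "windsurf") return [`${ctx.home}/.config/windsurf/mcp.json`];

  if (client === "claude-desktop") {
    return platform === "win32"
      ? [`${appDataOrThrow(ctx)}/Claude/claude_desktop_config.json`]
      : [`${ctx.home}/Library/Application Support/Claude/claude_desktop_config.json`];
  }

  // Remaining case: Cursor. Windows has two candidate locations.
  return platform === "win32"
    ? [`${appDataOrThrow(ctx)}/Cursor/mcp.json`, `${ctx.home}/.cursor/mcp.json`]
    : [`${ctx.home}/Library/Application Support/Cursor/mcp.json`];
}
```

In Node.js the context would typically come from `os.homedir()` and `process.env.APPDATA`.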
Configuration JSON structure for stdio (pinning the version ensures execution stability):
{
  "mcpServers": {
    "desktop-visual-analyzer-mcp": {
      "command": "npx",
      "args": [
        "desktop-visual-analyzer-mcp@2.0.15"
      ],
      "transport": "stdio",
      "env": {
        "ANTHROPIC_API_KEY": "your-anthropic-api-key-here"
      }
    }
  }
}
For SSE Transport
Use this mode for clients that support SSE, or when the server will be reached over the network:
{
  "mcpServers": {
    "desktop-visual-analyzer-mcp": {
      "command": "npx",
      "args": [
        "desktop-visual-analyzer-mcp@2.0.15",
        "--sse",
        "--port",
        "8080",
        "--host",
        "localhost"
      ],
      "env": {
        "ANTHROPIC_API_KEY": "your-anthropic-api-key-here"
      }
    }
  }
}
To connect to a server instance hosted externally:
{
  "mcpServers": {
    "desktop-visual-analyzer-mcp": {
      "url": "http://your-server-ip:8080/sse",
      "transport": "sse",
      "env": {
        "ANTHROPIC_API_KEY": "your-anthropic-api-key-here"
      }
    }
  }
}
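The three configuration shapes above differ only in a few fields, so they can be generated from one description. The following is a minimal sketch under that assumption; `buildMcpConfig`, `Launch`, and the mode names are hypothetical and exist only for illustration. The emitted keys and flags are copied from the JSON examples in this section.

```typescript
// Hypothetical generator for the three config shapes shown in this section.
type Launch =
  | { mode: "stdio"; version: string; apiKey: string }
  | { mode: "sse-local"; version: string; apiKey: string; port: number; host: string }
  | { mode: "sse-remote"; url: string; apiKey: string };

interface McpConfig {
  mcpServers: Record<string, Record<string, unknown>>;
}

const PACKAGE_NAME = "desktop-visual-analyzer-mcp";

function buildEntry(launch: Launch): Record<string, unknown> {
  const env = { ANTHROPIC_API_KEY: launch.apiKey };

  if (launch.mode === "stdio") {
    // Pinned version, launched through npx, talking over stdio.
    return {
      command: "npx",
      args: [`${PACKAGE_NAME}@${launch.version}`],
      transport: "stdio",
      env,
    };
  }

  if (launch.mode === "sse-local") {
    // Same launcher, with the SSE flags from the example above.
    return {
      command: "npx",
      args: [
        `${PACKAGE_NAME}@${launch.version}`,
        "--sse",
        "--port",
        String(launch.port),
        "--host",
        launch.host,
      ],
      env,
    };
  }

  // Remaining case: connect to an externally hosted instance by URL.
  return { url: launch.url, transport: "sse", env };
}

function buildMcpConfig(launch: Launch): McpConfig {
  return { mcpServers: { [PACKAGE_NAME]: buildEntry(launch) } };
}
```

Passing the result to `JSON.stringify(config, null, 2)` yields text that can be pasted into the client's configuration file.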
🛠️ Exposed Functionality
analyzeScreenContent
Initiates a capture of the current visual state and feeds it into the Vision model for subsequent interpretation.
Parameters (Schema):
typescript
{
  prompt?: string; // Customized instructional query for the AI
  modelName?: string; // Specification of the Claude model instance to invoke
  saveScreenshot?: boolean; // Local persistence flag for the captured image file
}
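A caller may want to check arguments against this schema before sending them. The sketch below is one possible client-side check, not the server's own validation: `validateAnalyzeArgs` and `isAnalyzeArgs` are hypothetical names, and rejecting unknown keys is an assumption made here for strictness (the server may simply ignore them).

```typescript
// Mirrors the parameter schema documented above; all fields are optional.
interface AnalyzeScreenContentArgs {
  prompt?: string;
  modelName?: string;
  saveScreenshot?: boolean;
}

// Expected primitive type for each documented parameter.
const EXPECTED_TYPES = new Map<string, "string" | "boolean">([
  ["prompt", "string"],
  ["modelName", "string"],
  ["saveScreenshot", "boolean"],
]);

// Returns a list of problems; an empty list means the input fits the schema.
function validateAnalyzeArgs(input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["arguments must be an object"];
  }
  const errors: string[] = [];
  for (const [key, value] of Object.entries(input as Record<string, unknown>)) {
    const expected = EXPECTED_TYPES.get(key);
    if (expected === undefined) {
      // Assumption: unknown keys are treated as mistakes rather than ignored.
      errors.push(`unknown parameter: ${key}`);
    } else if (value !== undefined && typeof value !== expected) {
      errors.push(`${key} must be a ${expected}`);
    }
  }
  return errors;
}

// Type guard built on the validator.
function isAnalyzeArgs(input: unknown): input is AnalyzeScreenContentArgs {
  return validateAnalyzeArgs(input).length === 0;
}
```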
Demonstration invocation within an AI dialogue context:
Analyze the entirety of my active display and generate a summary focusing on the primary workflow components.
🛑 Diagnostic Procedures
Common Faults
- Permission Denied for Capture: Verify that the invoking application possesses requisite operating system permissions for screen recording.
- API Credential Failure: Confirm the validity and correct environment variable placement of your Anthropic access token.
- Tool Resolution Error: Check global installation status (npm list -g desktop-visual-analyzer-mcp).
- Version Skew: Explicitly define the package version in the configuration file to circumvent unexpected dependency caching behaviors.
- Communication Mismatch: Ensure the selected transport mechanism aligns with the host client's capabilities.
  - Claude Desktop mandates stdio.
  - SSE is available for compatible external consumers.
Transport Compatibility Matrix
| Client Application | Supported Protocols |
|---|---|
| Claude Desktop | stdio |
| Cursor | stdio, SSE |
| Cline | stdio, SSE |
| Windsurf | stdio, SSE |
Connection failures usually point to a configuration mismatch in the transport type.
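Because a transport mismatch is the usual cause, it can help to check a planned configuration against the matrix before starting the client. The sketch below encodes the table above as data; `checkTransport` and `SUPPORTED_TRANSPORTS` are hypothetical names used only for this illustration.

```typescript
type Transport = "stdio" | "sse";

// Data copied from the Transport Compatibility Matrix above.
const SUPPORTED_TRANSPORTS = new Map<string, readonly Transport[]>([
  ["Claude Desktop", ["stdio"]],
  ["Cursor", ["stdio", "sse"]],
  ["Cline", ["stdio", "sse"]],
  ["Windsurf", ["stdio", "sse"]],
]);

// Returns null when the combination is listed as supported,
// otherwise a short description of the mismatch.
function checkTransport(client: string, transport: Transport): string | null {
  const supported = SUPPORTED_TRANSPORTS.get(client);
  if (supported === undefined) return `unknown client: ${client}`;
  if (!supported.includes(transport)) {
    return `${client} supports only ${supported.join(", ")}`;
  }
  return null;
}
```

For example, `checkTransport("Claude Desktop", "sse")` reports a mismatch, while `checkTransport("Cursor", "sse")` returns `null`.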
🧑‍💻 Development Workflow
- Source Acquisition & Setup:
  bash
  git clone https://github.com/yourusername/screen-view-mcp.git
  cd screen-view-mcp
  npm install
- Compilation:
  bash
  npm run build
- Local Execution Testing:
  bash
  # Testing via standard I/O (stdio)
  node dist/screen-capture-mcp.js --api-key=your-anthropic-api-key
  # Testing via network streaming (SSE)
  node dist/screen-capture-mcp.js --sse --port 8080 --host localhost --api-key=your-anthropic-api-key
📜 Licensing
MIT
🚀 Smithery Deployment Protocol
This package is optimized for deployment onto the Smithery platform, enabling secure hosting of the MCP server over a WebSocket connection.
Deployment Prerequisites
- Repository must contain a functional Dockerfile
- Repository must contain a configuration file named smithery.yaml
- The Anthropic API Key must be supplied during the configuration phase
Deployment Sequence
- Integrate the server definition into the Smithery registry.
- Navigate to the Deployment Management interface.
- Input your required Anthropic API credentials.
- Initiate the deployment process.
Configuration Variables
- anthropicApiKey (Mandatory): Your unique Anthropic access credential.
- verbose (Optional): Activates extended logging output (Default: false).
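The two variables above amount to one required string and one optional boolean that defaults to false. A minimal sketch of how such a configuration object could be checked follows; `parseSmitheryConfig` and `SmitheryConfig` are hypothetical names, and this is not the validation Smithery or the package performs.

```typescript
// Resolved configuration: verbose always has a value after parsing.
interface SmitheryConfig {
  anthropicApiKey: string;
  verbose: boolean;
}

function parseSmitheryConfig(raw: Record<string, unknown>): SmitheryConfig {
  const key = raw["anthropicApiKey"];
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("anthropicApiKey is required");
  }
  // Apply the documented default when verbose is omitted.
  const verbose = raw["verbose"] ?? false;
  if (typeof verbose !== "boolean") {
    throw new Error("verbose must be a boolean");
  }
  return { anthropicApiKey: key, verbose };
}
```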
Available Functions
- helloWorld: A basic diagnostic function returning an echoed message.
- analyzeScreenContent: Captures a display image and invokes Claude Vision analysis.
💡 Operational Examples
Executing Visual Analysis
javascript
const analysisResult = await mcpClient.invoke("analyzeScreenContent", {
  prompt: "Examine my screen. What critical information is presented, and how is the application focused?",
  modelName: "claude-3-opus-20240229"
});
console.log(analysisResult);
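Whatever client wrapper is used, an MCP tool invocation is sent to the server as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below shows what that message would look like for the call above; `buildToolCall` is a hypothetical helper, and the request id is chosen arbitrarily by the caller.

```typescript
// Shape of an MCP "tools/call" request as a JSON-RPC 2.0 message.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in the request envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "analyzeScreenContent", {
  prompt: "Examine my screen. What critical information is presented, and how is the application focused?",
  modelName: "claude-3-opus-20240229",
});
console.log(JSON.stringify(request, null, 2));
```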
