vision-text-extractor-mcp
Uses the macOS Vision framework to perform optical character recognition (OCR) on image files. For each recognized text segment it returns the text, a confidence score, and bounding box coordinates, which makes it suitable for workflows that need reliable image-to-text conversion.
Author: whiteking64
macOS Vision Framework Text Extraction MCP Service
This repository provides a Model Context Protocol (MCP) server that exposes the text recognition built into macOS through the Vision framework. It registers a single tool, ocr_image, which takes an image file path and returns the recognized text together with confidence scores and bounding box locations.
Deployment Prerequisites
Required Components
Requires Python 3.13 or newer and the following libraries:
- ocrmac: a Python wrapper around the macOS OCR (Vision) APIs. See the ocrmac project for details.
- Pillow: image loading and preprocessing.
- mcp[cli]>=1.7.1: the MCP Python SDK with its CLI extras, used to run the Model Context Protocol server.
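The requirements above can be checked before starting the server. The following is a minimal sketch and not part of the repository: check_prerequisites is a hypothetical helper that verifies the Python version, that it is running on macOS, and that the three packages are importable. It only checks that the packages are installed; it does not enforce the mcp>=1.7.1 version constraint.

```python
import importlib.util
import sys

# Importable module name -> package name from the requirements list above.
REQUIRED_MODULES = {"ocrmac": "ocrmac", "PIL": "Pillow", "mcp": "mcp[cli]"}


def check_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means every check passed.

    Hypothetical helper: checks that packages are installed, not their versions.
    """
    problems = []
    if sys.version_info < (3, 13):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.13 or newer is required (found {found})")
    if sys.platform != "darwin":
        problems.append(f"macOS is required for the Vision framework (found {sys.platform!r})")
    for module, package in REQUIRED_MODULES.items():
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(module) is None:
            problems.append(f"package {package} is not installed (cannot import {module!r})")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    if not issues:
        print("all prerequisites satisfied")
    for issue in issues:
        print("-", issue)
```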
Installation Procedure
Using a virtual environment is recommended to keep dependencies isolated.
- Create and activate the virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate
```
- Install the dependencies:
```bash
uv sync
```
Running the MCP Server
Start the server by running the main script:
```bash
uv run main.py
```
This launches the server and registers the ocr_image tool so that MCP clients can call it.
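One way to implement what uv run main.py does is with the FastMCP helper from the MCP Python SDK. The sketch below is an illustration under stated assumptions, not the repository's actual main.py: it assumes ocrmac's OCR(path).recognize() returns (text, confidence, [x, y, width, height]) tuples, it reuses the error messages documented under ocr_image below, and it names the server "ocrmac" to match the Cursor configuration. The recognize parameter and the create_server function are hypothetical; recognize exists only so the tool logic can be exercised without macOS.

```python
import os
import sys
from typing import Any, Callable, Dict, Iterable, Optional, Sequence, Tuple

# Assumed shape of one ocrmac result: (text, confidence, [x, y, width, height]).
RawAnnotation = Tuple[str, float, Sequence[float]]
Recognizer = Callable[[str], Iterable[RawAnnotation]]


def _recognize_with_ocrmac(path: str) -> Iterable[RawAnnotation]:
    # Imported lazily because ocrmac depends on macOS-only frameworks.
    from ocrmac import ocrmac

    return ocrmac.OCR(path).recognize()


def ocr_image(file_path: str, recognize: Optional[Recognizer] = None) -> Dict[str, Any]:
    """Run OCR on file_path and return the success or error payload documented for ocr_image."""
    if recognize is None:
        if sys.platform != "darwin":
            return {"error": "OCR functionality is constrained exclusively to macOS environments."}
        recognize = _recognize_with_ocrmac
    if not os.path.isfile(file_path):
        return {"error": f"Resource not located: {file_path}"}
    annotations = [
        {"text": text, "confidence": float(confidence), "bounding_box": [float(v) for v in box]}
        for text, confidence, box in recognize(file_path)
    ]
    return {"filename": file_path, "annotations": annotations}


def create_server():
    """Build a FastMCP server exposing ocr_image (hypothetical; needs mcp[cli]>=1.7.1)."""
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("ocrmac")

    # Registered under the public tool name; the wrapper hides the recognize parameter.
    @server.tool(name="ocr_image")
    def ocr_image_tool(file_path: str) -> Dict[str, Any]:
        """Perform OCR on an image file using the macOS Vision framework."""
        return ocr_image(file_path)

    return server
```

A main.py built this way would call create_server().run() under an if __name__ == "__main__" guard. FastMCP's run() uses the stdio transport by default, which is what Cursor and the MCP Inspector expect when they launch the server as a subprocess.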
Available MCP Tools
ocr_image
- Description: Runs OCR on the specified image file using the native macOS Vision framework. Returns the recognized text blocks with their confidence scores and normalized bounding box coordinates.
- Input:
  - file_path (str): path to the image file, relative or absolute.
- Output (on success):
```json
{
  "filename": "path/to/your/image.png",
  "annotations": [
    {
      "text": "Extracted Text",
      "confidence": 0.98,
      "bounding_box": [0.05, 0.2, 0.4, 0.1]
    },
    // ... subsequent findings
  ]
}
```
- Output (on failure):
```json
{ "error": "OCR functionality is constrained exclusively to macOS environments." }
```
or
```json
{ "error": "Resource not located: path/to/nonexistent/file.jpg" }
```
Important: The server works only on macOS, because it depends on Apple's Vision framework.
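Vision reports bounding boxes in normalized coordinates with the origin at the lower-left corner of the image, whereas Pillow and most image tools use pixel coordinates with the origin at the top-left. Assuming bounding_box holds Vision's [x, y, width, height] values unchanged (as ocrmac returns them), a hypothetical helper such as to_pixel_box below converts an annotation into a Pillow-style (left, top, right, bottom) box, for example to crop or highlight the recognized text.

```python
from typing import Sequence, Tuple


def to_pixel_box(bounding_box: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized, bottom-left-origin [x, y, w, h] box to pixel (left, top, right, bottom).

    Assumes the Vision coordinate convention; the result can be passed to PIL.Image.Image.crop.
    """
    x, y, w, h = bounding_box
    left = round(x * width)
    right = round((x + w) * width)
    # Flip the vertical axis: Vision measures y upward from the bottom edge.
    top = round((1.0 - y - h) * height)
    bottom = round((1.0 - y) * height)
    return left, top, right, bottom


# The sample annotation from the success payload, on a 1000x500 image.
print(to_pixel_box([0.05, 0.2, 0.4, 0.1], 1000, 500))  # (50, 350, 450, 400)
```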
Testing with the MCP Inspector
You can check that the server starts and that ocr_image responds correctly by connecting the MCP Inspector to the running server.
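Besides the Inspector, the server can be exercised from a script with the client API of the MCP Python SDK, which mcp[cli] already installs. This is a sketch, not part of the repository: call_ocr_image and decode_tool_payload are hypothetical names, the launch command mirrors the Cursor configuration shown later, and it assumes the server returns its result dict serialized as JSON in the first text content block, which is how FastMCP converts dict return values.

```python
import json
from typing import Any, Dict, List


def decode_tool_payload(texts: List[str]) -> Dict[str, Any]:
    """Parse the first text block of a tool result as JSON (assumed serialization)."""
    if not texts:
        raise ValueError("tool result contained no text content")
    return json.loads(texts[0])


async def call_ocr_image(repo_dir: str, image_path: str) -> Dict[str, Any]:
    """Launch the server over stdio, call ocr_image once, and return the decoded payload."""
    # Imported lazily so this module can be loaded without the mcp package.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="uv", args=["--directory", repo_dir, "run", "main.py"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("ocr_image", arguments={"file_path": image_path})
    # Keep only content items that carry text (TextContent).
    texts = [item.text for item in result.content if hasattr(item, "text")]
    return decode_tool_payload(texts)
```

For example, asyncio.run(call_ocr_image("/path/to/macos-ocr-mcp", "/absolute/path/to/image.png")) should return either the success payload or one of the error payloads. Passing an absolute image path avoids depending on the server's working directory, which uv's --directory flag sets to the repository.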
Using the Server in Cursor
To use this OCR server in Cursor, add the following entry to your MCP configuration file (for example, ~/.cursor/mcp.json or a project-level configuration file):
```json
{
  "mcpServers": {
    "ocrmac": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/macos-ocr-mcp",
        "run",
        "main.py"
      ]
    }
  }
}
```
This tells Cursor how to start and manage the server process, so the ocr_image tool of the ocrmac server can be invoked directly from within the IDE. Replace /path/to/macos-ocr-mcp with the location of your local copy of the repository.
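If you prefer to script the Cursor setup instead of editing the file by hand, the merge can be done in Python. A minimal sketch, assuming the mcpServers layout shown above: with_ocrmac_server and update_config_file are hypothetical helpers that add or replace only the ocrmac entry and leave any other configured servers untouched.

```python
import json
from pathlib import Path
from typing import Any, Dict


def with_ocrmac_server(config: Dict[str, Any], repo_dir: str) -> Dict[str, Any]:
    """Return a copy of an mcp.json-style config with the ocrmac entry added or replaced."""
    updated = dict(config)
    # Copy the servers mapping so the caller's dict is not mutated.
    servers = dict(updated.get("mcpServers", {}))
    servers["ocrmac"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "main.py"],
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, repo_dir: str) -> None:
    """Read the config (or start empty), merge the ocrmac entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_ocrmac_server(config, repo_dir), indent=2) + "\n")
```

For example, update_config_file(Path.home() / ".cursor" / "mcp.json", "/path/to/macos-ocr-mcp") writes the same entry as the configuration above while preserving other servers already listed in the file.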
