
vision-text-extractor-mcp

Uses the macOS Vision framework to run Optical Character Recognition (OCR) on an image file. The tool returns the recognized text, a confidence score, and bounding box coordinates for each recognized segment, which suits workflows that need image-to-text conversion.

Author


whiteking64

No License

Quick Info

GitHub Stars: 1
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 2026-02-19

Tags

ocr, macos, whiteking64, macos ocr, ocr mcp, ocr images

macOS Vision Framework Text Extraction MCP Service

This repository provides a Model Context Protocol (MCP) server that exposes the text recognition built into macOS through the Vision framework. It offers a single tool, ocr_image, which accepts an image file path and returns the recognized text along with a confidence score and bounding box for each segment.

Deployment Prerequisites

Required Components

The server requires Python 3.13 or newer and the following libraries:

  • ocrmac: The underlying library that interfaces with the macOS OCR features. See the ocrmac project for details.
  • Pillow: Used for image handling and preprocessing.
  • mcp[cli]>=1.7.1: Provides the components needed to run the Model Context Protocol server.

Installation Procedure

Virtual environment usage is strongly advised for dependency management.

  1. Environment Setup and Activation:

         python -m venv .venv
         source .venv/bin/activate

  2. Dependency Resolution:

         uv sync

Initiating the MCP Endpoint

To bring the MCP server online, execute the primary script:

    uv run main.py

This starts the server and registers the ocr_image tool so that MCP clients can call it.

Accessible MCP Utilities

ocr_image

  • Function: Executes comprehensive OCR on the designated image file by leveraging platform-native macOS functions. Returns structured data comprising recognized text blocks, corresponding confidence figures, and normalized bounding box coordinates.
  • Input Signature: file_path: str - Path to the target raster file (can be relative or absolute).
  • Output Schema (Successful Operation):

    {
      "filename": "path/to/your/image.png",
      "annotations": [
        {
          "text": "Extracted Text",
          "confidence": 0.98,
          "bounding_box": [0.05, 0.2, 0.4, 0.1]
        },
        // ... subsequent findings
      ]
    }

  • Output Schema (Failure Indicators):

    { "error": "OCR functionality is constrained exclusively to macOS environments." }

    or

    { "error": "Resource not located: path/to/nonexistent/file.jpg" }
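A client usually has to tell the two response shapes apart and turn the normalized boxes into pixel coordinates. The following is a minimal sketch of that step, not part of this repository. The helper name parse_ocr_result is hypothetical, and it assumes bounding_box is [x, y, width, height] with a lower-left origin, as the Vision framework reports it; if the server already flips the y axis, drop the flip.

```python
import json


def parse_ocr_result(payload: str, image_size: tuple[int, int],
                     min_confidence: float = 0.5) -> list[tuple]:
    """Turn an ocr_image JSON response into (text, confidence, pixel_box) rows.

    pixel_box is (left, top, right, bottom) with a top-left origin.
    Assumption: bounding_box is [x, y, width, height], normalized to 0..1,
    with the origin at the lower-left corner of the image.
    """
    result = json.loads(payload)

    # Failure responses carry a single "error" key instead of annotations.
    if "error" in result:
        raise RuntimeError(result["error"])

    width, height = image_size
    rows = []
    for annotation in result.get("annotations", []):
        if annotation["confidence"] < min_confidence:
            continue  # skip low-confidence segments
        x, y, w, h = annotation["bounding_box"]
        left = round(x * width)
        right = round((x + w) * width)
        # Flip the y axis: lower-left origin to top-left origin.
        top = round((1.0 - y - h) * height)
        bottom = round((1.0 - y) * height)
        rows.append((annotation["text"], annotation["confidence"],
                     (left, top, right, bottom)))
    return rows
```

The image size has to come from the caller (for example from Pillow's Image.size), because the response itself does not include it.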

Crucial Caveat: This service requires an execution environment running Apple's macOS due to its absolute dependency on the Vision framework.
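The platform restriction and the two failure responses suggest a guard-then-recognize flow inside the tool. The sketch below illustrates one way such a handler could be structured; it is not the repository's actual main.py. The names run_ocr and recognize are hypothetical, and the OCR call is injected as a callable so the example runs on any platform. On macOS that callable would wrap the ocrmac library, and the function would be registered as a tool with the MCP SDK.

```python
import os
import platform
from typing import Callable, Iterable, Optional, Sequence

# Assumed shape of one OCR hit: (text, confidence, [x, y, width, height]).
Annotation = tuple[str, float, Sequence[float]]


def run_ocr(file_path: str,
            recognize: Callable[[str], Iterable[Annotation]],
            system: Optional[str] = None) -> dict:
    """Build an ocr_image-style response, checking preconditions first."""
    system = system or platform.system()

    # The Vision framework exists only on macOS, reported as "Darwin".
    if system != "Darwin":
        return {"error": "OCR functionality is constrained exclusively "
                         "to macOS environments."}

    if not os.path.isfile(file_path):
        return {"error": f"Resource not located: {file_path}"}

    annotations = [
        {"text": text, "confidence": confidence, "bounding_box": list(box)}
        for text, confidence, box in recognize(file_path)
    ]
    return {"filename": file_path, "annotations": annotations}
```

Returning an error dictionary rather than raising keeps both outcomes in the documented JSON shape, so clients only need to check for the "error" key.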

Validation Via MCP Inspector

You can test connectivity and functionality by pointing the MCP Inspector at the running MCP server.

Integration Configuration for Cursor IDE

To seamlessly incorporate this OCR service into Cursor, inject the following configuration structure into your MCP configuration manifest (e.g., ~/.cursor/mcp.json or a project-level configuration file):

    {
      "mcpServers": {
        "ocrmac": {
          "command": "uv",
          "args": [
            "--directory",
            "/path/to/macos-ocr-mcp",
            "run",
            "main.py"
          ]
        }
      }
    }

This setup instructs Cursor on the proper method to initiate and manage the backend server process, enabling direct invocation of the ocrmac.ocr_image utility from within the IDE interface.
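If the configuration file already lists other servers, the new entry has to be merged in rather than pasted over them. As a rough sketch, the hypothetical helper register_server below adds the ocrmac entry to an existing mcp.json while leaving other entries untouched. It assumes the file is plain JSON with a top-level mcpServers object, as in the example above.

```python
import json
from pathlib import Path


def register_server(config_path: Path, name: str, project_dir: str) -> dict:
    """Add or update one server entry in an MCP config file."""
    # Start from the existing config if there is one, else from scratch.
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "main.py"],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Replace the placeholder with the directory where the repository lives.
    register_server(Path.home() / ".cursor" / "mcp.json",
                    "ocrmac", "/path/to/macos-ocr-mcp")
```

Running the script a second time only rewrites the same entry, so it is safe to repeat after moving the project directory.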
