MCP Server - Remote macOS Use

The first open-source MCP server that gives AI full control of remote macOS environments.

A specialized alternative to OpenAI's Operator, built for autonomous AI agents. It provides full desktop access and requires no additional software on the target Mac.

Demonstrations of Capability

Research and posting on X (Twitter) (https://www.youtube.com/watch?v=--QHz2jcvcs)
Video clip production with CapCut (https://www.youtube.com/watch?v=RKAqiNoU8ec)
AI Candidate Sourcing: automatically collects applicant data, screens submissions, and schedules initial screening interviews through the built-in Mail app.
AI Marketing Support (LinkedIn): automatically follows relevant network contacts, likes their posts, and leaves context-aware comments.
AI Marketing Support (Twitter): automatically follows target accounts, likes their posts, and replies thoughtfully.

Development Roadmap (in priority order)

Execution Speed - Match the speed of established Linux remote desktop control solutions.
Script Generation - Reduce AppleScript execution latency without sacrificing flexibility.
VNC Pointer Visibility - Make the pointer easier to see, to help with debugging and live demos.

Contributions are welcome!

Key Features

No Extra API Costs: screen analysis is covered by your existing Claude Pro subscription.
Minimal Prerequisites: just enable Screen Sharing on the target Mac. No additional proprietary tools are required.
Broad Version Support: designed to work with all current and future macOS versions.

Rationale for Creation

Full Access to the Native macOS Ecosystem

macOS remains the benchmark for user experience and is likely to stay that way for the foreseeable future. It is where people do their most effective work, and our goal is to let AI work there just as fluidly.

Open by Design

Interoperable LLM Connectivity: works with any MCP client.
Model Agnostic: works with models from OpenAI, Anthropic, or any other LLM provider.
Future-Proof: designed to evolve with the growing MCP ecosystem.

Deployment Simplicity

No Agents on the Target Machine: nothing needs to be installed or kept running in the background on the Mac.
Screen Sharing Is Enough: any Mac with Screen Sharing enabled can be controlled.
No Complex Infrastructure: no auxiliary Python scripts or always-on backend services, which many competing solutions require.

Streamlined Setup

Claude Desktop's Polished Interface: no need for the command-line interfaces that Python-based tools typically rely on.
Intuitive Interaction: control the Mac through a familiar visual interface.
Ready Immediately: start working right away, without a complex setup procedure.

System Schematic

(Architecture diagram: remote_macos_use_system_architecture)

Setup Procedure

1. Enable Screen Sharing on the Mac (skip this step if you rent a Mac from macstadium.com).
2. Connect to your remote Mac.
3. Install Docker Desktop on your local machine.
4. Add this MCP server to Claude Desktop: configure Claude Desktop to run the Docker image by adding the following snippet to your Claude configuration file:

```json
{
  "mcpServers": {
    "remote-macos-use": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "-e",
        "MACOS_USERNAME=your_macos_username",
        "-e",
        "MACOS_PASSWORD=your_macos_password",
        "-e",
        "MACOS_HOST=your_macos_hostname_or_ip",
        "--rm",
        "buryhuang/mcp-remote-macos-use:latest"
      ]
    }
  }
}
```
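The snippet above can also be generated programmatically, which helps if you manage several Macs. Below is a minimal Python sketch. `build_claude_config` is a hypothetical helper, not part of this project. It reproduces the arguments shown above, passing each credential to `docker run` as its own `-e NAME=value` pair:

```python
import json


def build_claude_config(username: str, password: str, host: str,
                        image: str = "buryhuang/mcp-remote-macos-use:latest") -> dict:
    """Hypothetical helper: rebuild the Claude Desktop config shown above."""
    env = {
        "MACOS_USERNAME": username,
        "MACOS_PASSWORD": password,
        "MACOS_HOST": host,
    }
    # -i keeps STDIN open so the MCP client can talk to the server over stdio.
    args = ["run", "-i"]
    # Each variable becomes a separate "-e" flag followed by NAME=value.
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    # --rm deletes the container once it exits.
    args += ["--rm", image]
    return {"mcpServers": {"remote-macos-use": {"command": "docker", "args": args}}}


if __name__ == "__main__":
    print(json.dumps(build_claude_config("alice", "secret", "192.0.2.10"), indent=2))
```

The password still ends up in plain text in the configuration file, exactly as with the hand-written version.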

Real-Time Screen Streaming via LiveKit (WebRTC Support)

This server now includes WebRTC support powered by LiveKit, which provides:
- Low-latency, real-time screen streaming
- Better overall responsiveness and throughput
- More efficient network usage than traditional VNC
- Automatic quality adjustment based on network conditions

To use the WebRTC features:
1. Set up a LiveKit server (self-hosted) or use LiveKit Cloud.
2. Pass the required LiveKit environment variables to the container, the same way the MACOS_* variables are passed in the configuration above.

Developer Guidance

Repository Cloning

```bash
# Clone the repository
git clone https://github.com/yourusername/mcp-remote-macos-use.git
cd mcp-remote-macos-use
```

Constructing the Docker Image

```bash
# Build the Docker image
docker build -t mcp-remote-macos-use .
```

Multi-Architecture Image Distribution

To publish a Docker image that supports multiple CPU architectures, use docker buildx:

1. Create a new builder instance (if you don't already have one):

```bash
docker buildx create --use
```

2. Build the image for the target platforms and push it to the registry:

```bash
docker buildx build --platform linux/amd64,linux/arm64 -t buryhuang/mcp-remote-macos-use:latest --push .
```

3. Verify that the image is available for each target architecture:

```bash
docker buildx imagetools inspect buryhuang/mcp-remote-macos-use:latest
```

Utilization Overview

The server provides remote macOS control through MCP tools.
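An MCP client invokes each tool with a JSON-RPC 2.0 `tools/call` request. Below is a minimal sketch of that message. It assumes the stdio transport that Claude Desktop uses with the Docker command above, where each message is one line of JSON. It also assumes the client has already completed the MCP `initialize` handshake. The `x`/`y` argument names in the usage lines are illustrative assumptions, not this server's documented schema:

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # a request id must not be reused within a session


def make_tool_call(name: str, arguments: Optional[Dict[str, Any]] = None) -> str:
    """Serialize an MCP tools/call request as one line of JSON-RPC 2.0."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        # No host, username, or password here: the server reads those
        # from its environment variables.
        "params": {"name": name, "arguments": arguments or {}},
    }
    # json.dumps without indent emits no raw newlines, as stdio framing requires.
    return json.dumps(request)


if __name__ == "__main__":
    print(make_tool_call("remote_macos_get_screen"))
    # Hypothetical argument names, for illustration only.
    print(make_tool_call("remote_macos_mouse_click", {"x": 640, "y": 400}))
```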

Tool Set Definitions

The server provides the following tools for controlling a remote Mac:

remote_macos_get_screen

Connects to the remote Mac and returns a screenshot of its current display.
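Screen Sharing speaks the RFB (VNC) protocol. One plausible way to capture a screenshot is therefore a FramebufferUpdateRequest, as defined in RFC 6143, section 7.5.3. The sketch below assumes the server works this way, which is not confirmed here, and builds only the 10-byte request. The width and height would come from the ServerInit message received during the handshake. The server answers with a FramebufferUpdate, whose pixel data the client decodes into an image. The function name is hypothetical:

```python
import struct

FRAMEBUFFER_UPDATE_REQUEST = 3  # RFB client-to-server message type (RFC 6143, 7.5.3)


def framebuffer_update_request(width: int, height: int, incremental: bool = False,
                               x: int = 0, y: int = 0) -> bytes:
    """Request the pixels of one rectangle (by default the whole screen).

    incremental=False asks for a full copy of the rectangle, which is what a
    one-off screenshot needs; True would ask only for regions that changed.
    """
    # Layout: type (U8), incremental (U8), x, y, width, height (U16 each, big-endian).
    return struct.pack(">BBHHHH", FRAMEBUFFER_UPDATE_REQUEST, int(incremental),
                       x, y, width, height)
```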

remote_macos_send_keys

Sends keyboard input to the remote Mac.
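Screen Sharing is an RFB (VNC) server. At the protocol level, keyboard input therefore travels as KeyEvent messages carrying X11 keysyms (RFC 6143, section 7.5.4). The sketch below assumes this server encodes keys that way and shows only the message bytes, not the connection. `type_text` and the small keysym table are illustrative, not this project's code:

```python
import struct

KEY_EVENT = 4  # RFB client-to-server message type (RFC 6143, 7.5.4)

# X11 keysyms for a few non-printing keys.
SPECIAL_KEYSYMS = {
    "return": 0xFF0D,
    "tab": 0xFF09,
    "backspace": 0xFF08,
    "escape": 0xFF1B,
}


def key_event(keysym: int, down: bool) -> bytes:
    """Encode one KeyEvent: type, down-flag, 2 padding bytes, 32-bit keysym."""
    return struct.pack(">BBxxI", KEY_EVENT, int(down), keysym)


def keysym_for_char(ch: str) -> int:
    """Map a single character to its X11 keysym."""
    if ch == "\n":
        return SPECIAL_KEYSYMS["return"]
    if ch == "\t":
        return SPECIAL_KEYSYMS["tab"]
    code = ord(ch)
    # Printable Latin-1 characters use their code point as the keysym.
    if 0x20 <= code <= 0x7E or 0xA0 <= code <= 0xFF:
        return code
    # Other Unicode characters use the 0x01000000 + code point convention.
    if code > 0xFF:
        return 0x01000000 | code
    raise ValueError(f"no keysym for control character {ch!r}")


def type_text(text: str) -> bytes:
    """Encode text as a press/release pair for every character."""
    out = bytearray()
    for ch in text:
        sym = keysym_for_char(ch)
        out += key_event(sym, True)
        out += key_event(sym, False)
    return bytes(out)
```

Key combinations such as Command-C would also require holding a modifier keysym down around the key. Which keysym macOS Screen Sharing maps to Command is not covered by this sketch.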

remote_macos_mouse_move

Moves the pointer to the given coordinates on the remote Mac, automatically scaling them to the remote screen resolution.
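Coordinate scaling amounts to a proportional mapping on each axis. The minimal sketch below assumes the model's coordinates refer to an image of size `source_size` (for example, a downscaled screenshot) and that the pointer must land on a remote framebuffer of size `target_size`. `scale_point` is a hypothetical helper, and the exact rounding and clamping this server uses are not documented here:

```python
from typing import Tuple


def scale_point(x: float, y: float, source_size: Tuple[int, int],
                target_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map (x, y) from source_size coordinates to target_size pixels."""
    src_w, src_h = source_size
    dst_w, dst_h = target_size
    # Proportional mapping along each axis.
    tx = round(x * dst_w / src_w)
    ty = round(y * dst_h / src_h)
    # Clamp so the pointer never lands outside the remote framebuffer.
    return min(max(tx, 0), dst_w - 1), min(max(ty, 0), dst_h - 1)


print(scale_point(683, 384, (1366, 768), (2732, 1536)))  # (1366, 768)
```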

remote_macos_mouse_click

Clicks the mouse once at the given coordinates on the remote Mac, with automatic coordinate scaling.

remote_macos_mouse_double_click

Double-clicks at the given coordinates on the remote Mac, with automatic coordinate scaling.
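In RFB, clicks are PointerEvent messages (RFC 6143, section 7.5.5). A press is an event with a button bit set in the mask, and a release is the same position with the bit cleared. Below is a minimal sketch of the click and double-click sequences, assuming the server drives the pointer this way. The helper names are hypothetical:

```python
import struct
from typing import List

POINTER_EVENT = 5  # RFB client-to-server message type (RFC 6143, 7.5.5)
LEFT, MIDDLE, RIGHT = 0b001, 0b010, 0b100  # button-mask bits 0, 1 and 2


def pointer_event(x: int, y: int, buttons: int = 0) -> bytes:
    """Encode one PointerEvent: type, button mask, x, y (big-endian)."""
    return struct.pack(">BBHH", POINTER_EVENT, buttons, x, y)


def click(x: int, y: int, button: int = LEFT) -> List[bytes]:
    """Press and release a button at (x, y)."""
    return [pointer_event(x, y, button), pointer_event(x, y, 0)]


def double_click(x: int, y: int, button: int = LEFT) -> List[bytes]:
    """Two clicks at the same point. They must be sent close together,
    because macOS decides whether two clicks form a double-click from timing."""
    return click(x, y, button) + click(x, y, button)
```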

remote_macos_mouse_scroll

Scrolls the mouse wheel at the given coordinates on the remote Mac, with automatic coordinate scaling.
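RFC 6143 represents each wheel step as a press and release of button 4 (up) or button 5 (down), which are mask bits 3 and 4 of a PointerEvent. The sketch below assumes this server scrolls with those events. `scroll` is a hypothetical helper, and the visible scrolling direction may also depend on the Mac's scrolling preferences:

```python
import struct
from typing import List

POINTER_EVENT = 5  # RFB client-to-server message type (RFC 6143, 7.5.5)
WHEEL_UP, WHEEL_DOWN = 1 << 3, 1 << 4  # buttons 4 and 5 in the button mask


def pointer_event(x: int, y: int, buttons: int = 0) -> bytes:
    """Encode one PointerEvent: type, button mask, x, y (big-endian)."""
    return struct.pack(">BBHH", POINTER_EVENT, buttons, x, y)


def scroll(x: int, y: int, steps: int) -> List[bytes]:
    """Positive steps send wheel-up notches, negative steps wheel-down notches.
    Each notch is a press followed by a release of the wheel 'button'."""
    mask = WHEEL_UP if steps > 0 else WHEEL_DOWN
    messages: List[bytes] = []
    for _ in range(abs(steps)):
        messages.append(pointer_event(x, y, mask))
        messages.append(pointer_event(x, y, 0))
    return messages
```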

remote_macos_open_application

Launches an application, or brings it to the front if it is already running, and returns its process ID (PID) for later operations.

remote_macos_mouse_drag_n_drop

Drags from a start coordinate to an end coordinate on the remote display, with automatic coordinate scaling.
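A drag in RFB consists of the following PointerEvents: a press at the start point, a series of moves with the button still held, and a release at the destination. The intermediate moves matter because applications often only start a drag after seeing the pointer move while the button is down. Below is a minimal sketch, assuming the server uses PointerEvents (RFC 6143, section 7.5.5). `drag_and_drop` and its `steps` parameter are hypothetical:

```python
import struct
from typing import List, Tuple

POINTER_EVENT = 5  # RFB client-to-server message type (RFC 6143, 7.5.5)
LEFT = 0b001  # button-mask bit 0


def pointer_event(x: int, y: int, buttons: int = 0) -> bytes:
    """Encode one PointerEvent: type, button mask, x, y (big-endian)."""
    return struct.pack(">BBHH", POINTER_EVENT, buttons, x, y)


def drag_and_drop(start: Tuple[int, int], end: Tuple[int, int],
                  steps: int = 10) -> List[bytes]:
    """Press at start, move in `steps` increments with the button held, release at end."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    # Move to the start point, then press the left button there.
    messages = [pointer_event(x0, y0, 0), pointer_event(x0, y0, LEFT)]
    for i in range(1, steps + 1):
        # Points along the straight line; the button stays down throughout.
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        messages.append(pointer_event(x, y, LEFT))
    messages.append(pointer_event(x1, y1, 0))  # release at the destination
    return messages
```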

Every tool reads its connection settings from the environment variables set when the server starts, so you do not need to pass connection arguments to individual tools.

Constraints

Authentication: only Apple Authentication (protocol 30) is supported.

Security Advisory

https://support.apple.com/guide/remote-desktop/encrypt-network-data-apdfe8e386b/mac
https://cafbit.com/post/apple_remote_desktop_quirks/

We support only protocol 30, which uses Diffie-Hellman key agreement with a 512-bit prime. macOS 11 through macOS 12 use this protocol when the connecting client runs OS X 10.11 or earlier.
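The protocol-30 handshake described in the cafbit.com article linked above works roughly as follows:

1. The server sends a generator, a key length, a prime modulus, and its public key.
2. The client completes a Diffie-Hellman exchange and hashes the shared secret with MD5 to obtain a 128-bit AES key.
3. The client encrypts a 128-byte credential block with AES-128 in ECB mode.
4. The client returns the ciphertext followed by its own public key.

The Python sketch below covers only the key agreement and the credential layout, as we understand them. The helper names are hypothetical. UTF-8 encoding and random padding of the credential fields are assumptions. The AES step is omitted because Python's standard library has no AES implementation. A 512-bit Diffie-Hellman modulus is considered breakable today, so treat such links as low-security:

```python
import hashlib
import secrets
from typing import Optional, Tuple


def ard_client_keys(prime: int, generator: int, server_public: int, key_length: int,
                    private: Optional[int] = None) -> Tuple[bytes, bytes]:
    """Return (client public key bytes, 16-byte AES key) for the DH step."""
    if private is None:
        private = secrets.randbelow(prime - 3) + 2  # random exponent in [2, prime - 2]
    client_public = pow(generator, private, prime)
    shared_secret = pow(server_public, private, prime)
    # The shared secret, as key_length big-endian bytes, is hashed with MD5
    # to produce the 128-bit AES key.
    aes_key = hashlib.md5(shared_secret.to_bytes(key_length, "big")).digest()
    return client_public.to_bytes(key_length, "big"), aes_key


def ard_credentials(username: str, password: str) -> bytes:
    """64-byte username field + 64-byte password field, each NUL-terminated.
    Assumption: UTF-8 encoding, remaining bytes filled with random data."""
    def field(value: str) -> bytes:
        raw = value.encode("utf-8") + b"\x00"
        if len(raw) > 64:
            raise ValueError("credential too long for a 64-byte field")
        return raw + secrets.token_bytes(64 - len(raw))
    # This 128-byte block would then be AES-128-ECB encrypted with aes_key
    # (requires a third-party library, not shown here).
    return field(username) + field(password)
```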

The table below summarizes the encryption used by each protocol revision:

Target macOS Version	Client macOS Version	Authentication Method(s)	Control & Observation Encryption	Data Transfer/Installation Encryption	All Other Operations Encryption	Protocol Revision
macOS 13	macOS 13	2048-bit RSA Host Keys	2048-bit RSA Host Keys	2048-bit RSA host authentication, then 128-bit AES	2048-bit RSA Host Keys	36
macOS 13	macOS 10.12	Secure Remote Password (SRP) for local; Diffie-Hellman (DH) if integrated with LDAP or targeting OS X 10.11 or prior	SRP or DH, reinforced with 128-bit AES	SRP or DH auth, followed by 128-bit AES	2048-bit RSA Host Keys	35
macOS 11 to macOS 12	macOS 10.12 to macOS 13	SRP locally; DH if bound to LDAP	SRP or DH 1024-bit, 128-bit AES	2048-bit RSA Host Keys (macOS 13 down to 10.13)	2048-bit RSA Host Keys (macOS 10.13+)	33
macOS 11 to macOS 12	OS X 10.11 or older	DH 1024-bit	DH 1024-bit, 128-bit AES	Diffie-Hellman Key agreement protocol utilizing a 512-bit prime	Diffie-Hellman Key agreement protocol utilizing a 512-bit prime	30

Use this tool only to connect to Mac systems you trust and are authorized to access, and make sure every connection is secured and authenticated.

Licensing Information

See the LICENSE file for the full terms.

Wikipedia Insight: Headless Browsers

A headless browser is a web browser without a graphical user interface. It lets programs control web pages in an environment similar to a regular browser, driven from a command line or over network communication. Headless browsers are particularly useful for testing web pages, because they render and interpret pages the way a regular browser does. This includes styling (layout, colors, fonts) and JavaScript/Ajax execution, which other testing methods often cannot reproduce. Google Chrome 59 and Firefox 56 added native remote-control APIs, and these have since superseded earlier tools such as PhantomJS.

Primary Applications

The main use cases for headless browsers are:

Test automation for modern web applications (web testing)
Taking screenshots of rendered web pages
Running automated tests for JavaScript libraries
Automating interaction with web pages

Secondary Uses

Headless browsers are also useful for web scraping. In 2009, Google stated that using a headless browser could help its search engine index content from websites that load it with Ajax. Headless browsers have also been misused, for example to run distributed denial-of-service (DDoS) attacks, to inflate advertisement impressions, or to automate websites in unintended ways such as credential stuffing. However, a 2018 study of browser traffic found that malicious actors showed no marked preference for headless browsers over regular ones for DDoS, SQL injection, or cross-site scripting attacks.

Execution Methods

Because major browsers now support headless mode natively through APIs, several software frameworks provide a unified interface for browser automation, for example:

Selenium WebDriver – a W3C-compliant implementation of WebDriver
Playwright – a Node.js library for automating Chromium, Firefox, and WebKit
Puppeteer – a Node.js library for automating Chrome and Firefox

Testing Automation Integration

Some test automation frameworks include headless browsers as part of their testing apparatus, for example:

Capybara, which uses WebKit or Headless Chrome to simulate user actions
Jasmine, which uses Selenium by default but can run tests in WebKit or Headless Chrome
Cypress, a frontend testing framework
QF-Test, a GUI testing tool that can also drive headless browsers

Alternative Approaches

Another approach is to use software that provides browser APIs directly. Deno, for example, includes browser APIs as part of its design, and for Node.js, jsdom is the most complete implementation. These tools usually support basic browser features (HTML parsing, cookies, XHR, some JavaScript), but they do not render the DOM and have limited support for DOM events. In exchange, they usually run faster than full browsers.

mcp-remote-macos-control-agent

Author

baryhuang
