mcp-remote-macos-control-agent
Gives AI agents full control of remote macOS machines through native system integration, with no additional software to install on the target. Optimized for autonomous AI agents performing desktop tasks.
Author: baryhuang
MCP Backend Server - Native macOS Operation Bridge
The first open-source MCP server that gives AI full control of remote macOS machines.
Designed as a macOS-focused alternative to OpenAI's Operator, it is built for autonomous AI agents, provides full desktop access, and requires no additional software on the target Mac.
Demonstrations of Capability
- Research and post on X (https://www.youtube.com/watch?v=--QHz2jcvcs)
- Short video editing with CapCut (https://www.youtube.com/watch?v=RKAqiNoU8ec)
- AI recruiting: collect candidate information, screen applications, and schedule first-round interviews through the native Mail app.
- AI marketing assistant (LinkedIn): automatically follow, like, and comment on posts from relevant contacts.
- AI marketing assistant (Twitter): automatically follow, like, and reply thoughtfully to target accounts.
Development Roadmap (Prioritized Actions)
- Execution Speed Enhancement - Achieve parity with established Linux desktop remote control solutions.
- Script Generation Refinement - Minimize latency associated with AppleScript execution while preserving functional versatility.
- VNC Pointer Visibility Upgrade - Improve visual feedback for debugging processes and live demonstrations.
Your contributions to this project are highly encouraged!
Key Attributes
- No Extra API Costs: Screen analysis runs through your existing Claude Pro subscription, so there are no per-call charges.
- Minimal Setup: Just enable Screen Sharing on the target Mac; no additional software is required.
- Broad macOS Support: Works across macOS versions because it relies only on the built-in Screen Sharing service (see the authentication constraint under Constraints).
Rationale for Creation
Uncompromised Access to the Native macOS Ecosystem
macOS remains the gold standard of desktop user experience and is likely to stay that way for the foreseeable future. It is where people do their most productive work, and the goal is to let AI work there just as fluidly.
Intrinsic Open Framework Design
- Client Agnostic: Works with any MCP client.
- Model Agnostic: Works with models from OpenAI, Anthropic, or any other LLM provider.
- Future-Proof: Built to evolve with the growing MCP ecosystem.
Deployment Simplicity
- No Target Machine Agents Required: Absolutely no resident background processes or applications need installation on the macOS hardware.
- Screen Sharing Suffices: Control any Mac so long as its Screen Sharing service is active.
- Elimination of Complex Infrastructure: Bypasses the need for auxiliary Python scripts or persistent backend services prevalent in competing solutions.
Streamlined Initialization Sequence
- Benefit from Claude Desktop's Refined Interface: Avoid reliance on command-line driven interfaces typically associated with Python deployments.
- Intuitive Interaction Paradigm: Engage with the AI-controlled Macintosh through an established, easy-to-grasp visual interface.
- Immediate Operational Capability: Begin productive work instantly without time lost to complex initial setup procedures.
Setup Procedure
- Enable Screen Sharing on macOS (skip this step if you rent your Mac from macstadium.com)
- Connect to your remote Mac
- Install Docker Desktop on your local machine
- Add this MCP server to Claude Desktop: configure Claude Desktop to run the Docker image by adding the following to your Claude configuration file:
```json
{
  "mcpServers": {
    "remote-macos-use": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "-e", "MACOS_USERNAME=your_macos_username",
        "-e", "MACOS_PASSWORD=your_macos_password",
        "-e", "MACOS_HOST=your_macos_hostname_or_ip",
        "--rm",
        "buryhuang/mcp-remote-macos-use:latest"
      ]
    }
  }
}
```
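Hand-editing the flat args list is error-prone, so the following Python sketch generates the same structure as the configuration snippet above. The function name build_mcp_config and its extra_env parameter are hypothetical helpers for illustration, not part of this project. On macOS, Claude Desktop typically reads this configuration from ~/Library/Application Support/Claude/claude_desktop_config.json.

```python
import json
from typing import Dict, List, Optional


def build_mcp_config(username: str, password: str, host: str,
                     image: str = "buryhuang/mcp-remote-macos-use:latest",
                     extra_env: Optional[Dict[str, str]] = None) -> dict:
    """Build a Claude Desktop config entry that starts the server with `docker run`."""
    env = {
        "MACOS_USERNAME": username,
        "MACOS_PASSWORD": password,
        "MACOS_HOST": host,
    }
    env.update(extra_env or {})  # additional variables are appended after the required ones
    args: List[str] = ["run", "-i"]
    for key, value in env.items():
        # Each variable becomes its own `-e KEY=VALUE` pair on the docker command line.
        args += ["-e", f"{key}={value}"]
    args += ["--rm", image]  # --rm removes the container once it exits
    return {"mcpServers": {"remote-macos-use": {"command": "docker", "args": args}}}


if __name__ == "__main__":
    print(json.dumps(build_mcp_config("alice", "s3cret", "192.168.1.20"), indent=2))
```

The extra_env parameter is one way to pass further -e variables without rearranging the rest of the list.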
Real-Time Screen Streaming via LiveKit (WebRTC Support)
This server also supports WebRTC streaming powered by LiveKit, which provides:
- Low-latency, real-time screen viewing
- Better overall responsiveness and throughput
- More efficient network usage than plain VNC
- Automatic quality adjustment based on network conditions
To enable WebRTC:
1. Set up a LiveKit server (self-hosted or LiveKit Cloud).
2. Pass the required LiveKit environment variables to the container, for example as additional -e flags in the docker run arguments of the configuration above.
Developer Guidance
Repository Cloning
```bash
# Obtain the repository source code
git clone https://github.com/yourusername/mcp-remote-macos-use.git
cd mcp-remote-macos-use
```
Constructing the Docker Image
```bash
# Build the Docker image
docker build -t mcp-remote-macos-use .
```
Multi-Architecture Image Distribution
To publish a Docker image that supports multiple CPU architectures, use docker buildx and follow these steps:
- Create a new builder instance (if you don't already have one):
  ```bash
  docker buildx create --use
  ```
- Build the image for the target platforms and push it to the registry:
  ```bash
  docker buildx build --platform linux/amd64,linux/arm64 -t buryhuang/mcp-remote-macos-use:latest --push .
  ```
- Verify that the image is available for each architecture:
  ```bash
  docker buildx imagetools inspect buryhuang/mcp-remote-macos-use:latest
  ```
Utilization Overview
The server exposes its macOS remote-control features exclusively as MCP tools.
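MCP clients such as Claude Desktop invoke these tools with JSON-RPC 2.0 tools/call requests, which over the stdio transport are sent as one JSON object per line. The Python sketch below only shows the shape of such a request, using tool names from the list that follows. It omits the initialize handshake that an MCP session requires before any tool call, and the argument names passed to remote_macos_mouse_click (x, y, source_width, source_height) are assumptions, not this server's documented schema; a real client reads each tool's input schema from the server's tools/list response.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # a request id must not be reused within the same session


def tools_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a single newline-terminated JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request) + "\n"


if __name__ == "__main__":
    # Capture the screen, then click a point located on that screenshot.
    print(tools_call("remote_macos_get_screen", {}), end="")
    # Hypothetical argument names; check the server's tools/list schema.
    print(tools_call("remote_macos_mouse_click",
                     {"x": 640, "y": 400, "source_width": 1280, "source_height": 800}), end="")
```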
Tool Set Definitions
The server provides the following tools for controlling a remote Mac:
remote_macos_get_screen
Connects to the remote Mac and returns a screenshot of its current display.
remote_macos_send_keys
Sends keyboard input to the remote Mac.
remote_macos_mouse_move
Moves the pointer to the given screen coordinates, with automatic scaling to the remote screen resolution.
remote_macos_mouse_click
Performs a single mouse click at the given coordinates, with automatic coordinate scaling.
remote_macos_mouse_double_click
Performs a double-click at the given coordinates, with automatic coordinate scaling.
remote_macos_mouse_scroll
Sends a mouse-wheel scroll event at the given coordinates, with automatic coordinate scaling.
remote_macos_open_application
Launches an application or brings it to the foreground, and returns its process ID (PID) for later control.
remote_macos_mouse_drag_n_drop
Drags from a start coordinate to an end coordinate on the remote display, with automatic coordinate scaling.
All tools read their connection settings from the environment variables set when the server starts, so no connection arguments need to be passed to individual tools.
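The tool descriptions mention automatic coordinate scaling without defining it. A plausible reading, assumed here rather than taken from the project's code, is a linear mapping from the coordinate space of the screenshot the model reasoned over (for example 1280×800) to the remote framebuffer resolution (for example 2560×1600). The function name scale_point and the edge clamping are hypothetical.

```python
from typing import Tuple


def scale_point(x: float, y: float,
                source_size: Tuple[int, int],
                target_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map (x, y) measured on a source image onto the target screen, clamped to its bounds."""
    src_w, src_h = source_size
    dst_w, dst_h = target_size
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    # Linear rescale of each axis independently.
    sx = round(x * dst_w / src_w)
    sy = round(y * dst_h / src_h)
    # Clamp so a point on the image edge never lands outside the framebuffer.
    sx = min(max(sx, 0), dst_w - 1)
    sy = min(max(sy, 0), dst_h - 1)
    return sx, sy


if __name__ == "__main__":
    print(scale_point(640, 400, (1280, 800), (2560, 1600)))  # (1280, 800)
```

Scaling each axis separately keeps the mapping correct even when the screenshot and the remote screen have different aspect ratios.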
Constraints
- Authentication method:
  - Only Apple Authentication (protocol 30) is supported.
Security Advisory
- https://support.apple.com/guide/remote-desktop/encrypt-network-data-apdfe8e386b/mac
- https://cafbit.com/post/apple_remote_desktop_quirks/
Only protocol 30 is supported. According to the table below, macOS 11 and macOS 12 use protocol 30 when the connecting client runs OS X 10.11 or earlier: authentication uses 1024-bit Diffie-Hellman key agreement, while copying items, installing packages, and all other tasks use Diffie-Hellman key agreement with a 512-bit prime.
The following table summarizes the authentication and encryption used by each protocol revision:
| Target macOS Version | Client macOS Version | Authentication Method(s) | Control & Observation Encryption | Data Transfer/Installation Encryption | All Other Operations Encryption | Protocol Revision |
|---|---|---|---|---|---|---|
| macOS 13 | macOS 13 | 2048-bit RSA Host Keys | 2048-bit RSA Host Keys | 2048-bit RSA host authentication, then 128-bit AES | 2048-bit RSA Host Keys | 36 |
| macOS 13 | macOS 10.12 | Secure Remote Password (SRP) for local; Diffie-Hellman (DH) if integrated with LDAP or targeting OS X 10.11 or prior | SRP or DH, reinforced with 128-bit AES | SRP or DH auth, followed by 128-bit AES | 2048-bit RSA Host Keys | 35 |
| macOS 11 to macOS 12 | macOS 10.12 to macOS 13 | SRP locally; DH if bound to LDAP | SRP or DH 1024-bit, 128-bit AES | 2048-bit RSA Host Keys (macOS 13 down to 10.13) | 2048-bit RSA Host Keys (macOS 10.13+) | 33 |
| macOS 11 to macOS 12 | OS X 10.11 or older | DH 1024-bit | DH 1024-bit, 128-bit AES | Diffie-Hellman Key agreement protocol utilizing a 512-bit prime | Diffie-Hellman Key agreement protocol utilizing a 512-bit prime | 30 |
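Only connect this tool to remote Macs that you have verified and are authorized to access, and always use secure, authenticated connections. Because protocol 30 relies on legacy Diffie-Hellman parameters (including a 512-bit prime) that are weak by current standards, avoid exposing Screen Sharing directly to untrusted networks.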
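The cafbit.com article listed above describes protocol 30 (Apple Remote Desktop authentication) roughly as follows: the server sends a Diffie-Hellman generator, key length, prime, and its public key; the client derives the shared secret, hashes it with MD5 to obtain an AES-128 key, encrypts a 128-byte block holding the username and password (64 bytes each, NUL-terminated and padded with random bytes) using AES-128 in ECB mode, and returns that ciphertext followed by its own public key. The Python sketch below implements only the key agreement, key derivation, and credential packing steps with the standard library, following that description. The function names are hypothetical, the toy prime 2**127 - 1 is far too small for real use, and the AES encryption step is omitted because it needs a third-party cryptography library.

```python
import hashlib
import os
import secrets
from typing import Tuple


def dh_keypair(prime: int, generator: int) -> Tuple[int, int]:
    """Return (private, public) values for one side of a Diffie-Hellman exchange."""
    private = secrets.randbelow(prime - 2) + 1  # uniform in [1, prime - 2]
    return private, pow(generator, private, prime)


def ard_aes_key(peer_public: int, private: int, prime: int, key_len: int) -> bytes:
    """MD5 of the big-endian shared secret, giving a 16-byte AES-128 key."""
    shared = pow(peer_public, private, prime)
    return hashlib.md5(shared.to_bytes(key_len, "big")).digest()


def ard_credentials_block(username: str, password: str) -> bytes:
    """Pack two 64-byte fields, each NUL-terminated and padded with random bytes."""
    def field(text: str) -> bytes:
        raw = text.encode("utf-8") + b"\x00"
        if len(raw) > 64:
            raise ValueError("value does not fit in a 64-byte field")
        return raw + os.urandom(64 - len(raw))
    return field(username) + field(password)


if __name__ == "__main__":
    # Toy group for illustration only: 2**127 - 1 is prime but far too small to be secure.
    prime, generator, key_len = 2**127 - 1, 3, 16
    server_private, server_public = dh_keypair(prime, generator)
    client_private, client_public = dh_keypair(prime, generator)
    client_key = ard_aes_key(server_public, client_private, prime, key_len)
    server_key = ard_aes_key(client_public, server_private, prime, key_len)
    assert client_key == server_key  # both sides derive the same AES key
    block = ard_credentials_block("alice", "s3cret")
    # Not shown: encrypt `block` with AES-128-ECB under client_key, then send the
    # 128-byte ciphertext followed by client_public as key_len big-endian bytes.
    print(len(block), client_key.hex())
```

Everything protecting the credentials rests on the strength of the server-chosen Diffie-Hellman group, which is why the advisory above matters.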
Licensing Information
Refer to the LICENSE file for comprehensive terms.
Headless Browsers (from Wikipedia)
A headless browser is a web browser without a graphical user interface. It lets a program control a web page in an environment similar to a popular web browser, through a command-line interface or network communication. Headless browsers are especially useful for testing web pages because they render and understand the page the way a browser would, including styling such as layout, colors, and fonts, as well as JavaScript and Ajax execution, which other testing methods often cannot do. Since version 59 of Google Chrome and version 56 of Firefox, both browsers support native remote control, which made earlier tools such as PhantomJS obsolete.
Primary Applications
The main use cases for headless browsers are:
- Test automation for modern web applications (web testing)
- Taking screenshots of web pages
- Running automated tests for JavaScript libraries
- Automating interaction with web pages
Secondary Uses
Headless browsers are also useful for web scraping. In 2009, Google stated that using a headless browser could help its search engine index content from websites that load content with Ajax. Headless browsers have also been misused, for example to carry out DDoS attacks, inflate advertisement impressions, or automate websites in unintended ways such as credential stuffing. However, a 2018 traffic study found that malicious actors do not favor headless browsers over ordinary browsers for attacks such as DDoS, SQL injection, or cross-site scripting.
Execution Methods
Because major browsers now support headless operation natively through APIs, several software frameworks provide a unified interface for browser automation, including:
- Selenium WebDriver: a W3C-compliant implementation of WebDriver
- Playwright: a Node.js library for automating Chromium, Firefox, and WebKit
- Puppeteer: a Node.js library for automating Chrome and Firefox
Testing Automation Integration
Some test automation frameworks build headless browsers into their testing tools, for example:
- Capybara, which uses WebKit or Headless Chrome to simulate user behavior
- Jasmine, which uses Selenium by default but can run browser tests with WebKit or Headless Chrome
- Cypress, a frontend testing framework
- QF-Test, a GUI testing tool that can also use headless browsers
Alternative Approaches
Another approach is to use software that provides browser APIs directly. Deno, for example, builds browser APIs into its design. For Node.js, jsdom is the most complete implementation. These tools usually support basic browser features such as HTML parsing, cookies, XHR, and some JavaScript, but they do not render the DOM and have limited support for DOM events. In exchange, they usually run faster than full browsers.
