DesktopAutomationBridge-Win
A specialized Windows control agent leveraging nut.js and the Model Context Protocol (MCP), engineered to provide high-fidelity programmatic access to system interactions such as cursor manipulation, keyboard input execution, active window administration, and comprehensive screen capture utilities.
Author

claude-did-this
DesktopAutomationBridge-Win (DAB-Win)
This package functions as a robust Windows runtime environment server implementing the Model Context Protocol. It exposes functional endpoints for abstracting and executing essential operating system tasks, including low-level input simulation (mouse/keyboard), management of running application windows, and acquisition of visual screen data.
Critical Constraint: This solution is engineered exclusively for the Microsoft Windows operating system environment.
🚀 Rationale for DAB-Win
DAB-Win establishes a crucial linkage layer between advanced AI reasoning engines and the user's physical desktop, facilitating safe, scriptable control over:
- 🖱️ Pointer Device Emulation: Precise movement, button activation, and compound dragging actions.
- ⌨️ Typing & Input Sequences: Insertion of textual data and execution of complex hotkey combinations.
- 🪟 OS Window Handling: Querying, prioritizing, resizing, and positioning application surfaces.
- 📸 Visual Acquisition: Capturing screen regions or the entire viewport for subsequent AI analysis.
- 📋 System Clipboard Access: Reading and writing data to the system clipboard buffer.
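To make the link between an AI client and these capabilities more concrete, here is a minimal TypeScript sketch of the JSON-RPC 2.0 `tools/call` request that MCP clients use to invoke a server tool. The tool name `get_screenshot` is taken from the constraints section of this document. The empty `arguments` object and the `buildToolCall` helper are assumptions for illustration, not the package's documented schema.

```typescript
// Shape of the JSON-RPC 2.0 request an MCP client sends to invoke a tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds a tools/call request for the named tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Assumption: get_screenshot accepts an empty argument object.
const request = buildToolCall(1, "get_screenshot");
console.log(JSON.stringify(request));
```

In practice an MCP client library assembles this message for you; the sketch only shows what travels over the transport.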
⚙️ Deployment Guide
Prerequisites
Ensure the following development and system utilities are provisioned:
- Visual Studio Build Tools (must include the C++ development workload):
# Run with elevated privileges; this step installs the necessary compilers
winget install Microsoft.VisualStudio.2022.BuildTools --override "--wait --passive --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended"
- Python interpreter (required for native module compilation via node-gyp):
# Install a modern Python version
winget install Python.Python.3.12
- Node.js runtime:
# Install the latest Long-Term Support (LTS) release
winget install OpenJS.NodeJS.LTS
Installation Procedure
- Install the DAB-Win npm package globally:
npm install -g mcp-control
Initial Configuration Notes
DAB-Win achieves optimal input fidelity when operating within a virtual machine set to a 1280x720 pixel resolution. For connection, configure your target AI client (e.g., Claude) to utilize the Server-Sent Events (SSE) transport mechanism pointed at this server.
Method A: Remote/VM Peer Connection (Direct SSE)
Used when the server runs on a distinct machine accessible over the network:
{
  "mcpServers": {
    "DesktopBridge": {
      "transport": "sse",
      "url": "http://192.168.1.100:3232/mcp"
    }
  }
}
Substitute the example IP/Port with your actual server endpoint.
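If you manage several VMs, the entry above can be generated rather than edited by hand. The following TypeScript sketch builds the same structure from a host and port. The `buildRemoteConfig` helper and its parameter names are hypothetical; only the `transport`, `url`, `/mcp` path, and default port 3232 come from this document.

```typescript
// Client-side entry for a server reached over SSE, as shown in Method A.
interface SseServerEntry {
  transport: "sse";
  url: string;
}

interface ClientConfig {
  mcpServers: Record<string, SseServerEntry>;
}

// Hypothetical helper: assembles the Method A configuration for one server.
function buildRemoteConfig(
  host: string,
  port: number = 3232,
  useHttps: boolean = false,
  name: string = "DesktopBridge",
): ClientConfig {
  const scheme = useHttps ? "https" : "http";
  return {
    mcpServers: {
      [name]: { transport: "sse", url: `${scheme}://${host}:${port}/mcp` },
    },
  };
}

// Reproduces the example entry for the VM at 192.168.1.100.
console.log(JSON.stringify(buildRemoteConfig("192.168.1.100"), null, 2));
```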
Method B: Local Invocation with SSE
Used when the AI client launches the server directly:
{
  "mcpServers": {
    "DesktopBridge": {
      "command": "mcp-control",
      "args": ["--sse"]
    }
  }
}
Activating the Control Server
Execute the following command in your terminal on the machine hosting the server:
mcp-control --sse
The console output will enumerate the utilized network interfaces, confirm the bound port (default: 3232), and signal connection readiness.
VM Deployment Workflow Summary
- Initialize your Windows Virtual Machine instance, fixing the display resolution to 1280x720.
- Install DAB-Win on the guest OS via npm.
- Start the server daemon using SSE mode: mcp-control --sse.
- Retrieve the VM's internal IP address (e.g., 192.168.1.100).
- Update your client configuration file to point to this address and port.
- Restart the client application. DAB-Win functionality will now appear in the tool selection interface.
⚙️ Command Line Interface Flags
DAB-Win exposes various arguments for customization:
# Standard activation via SSE on port 3232
mcp-control --sse
# Custom port usage
mcp-control --sse --port 3000
# Secure TLS/HTTPS activation (Mandatory for production/remote scenarios)
mcp-control --sse --https --cert /path/to/cert.pem --key /path/to/key.pem
# Secure connection on a non-standard port
mcp-control --sse --https --port 8443 --cert /path/to/cert.pem --key /path/to/key.pem
Argument Reference
- --sse: Enables Server-Sent Events transport for network communication.
- --port [value]: Specifies an alternative listening port (default: 3232).
- --https: Activates TLS/SSL encryption (required by MCP specification for remote access).
- --cert [path]: Location of the necessary TLS certificate file.
- --key [path]: Location of the TLS private key file.
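How the flags combine can be sketched with a small parser. This is an illustrative TypeScript reading of the argument reference, not the package's actual implementation. The `parseFlags` function, the `ServerOptions` type, and the rule that `--https` fails without both `--cert` and `--key` are assumptions.

```typescript
interface ServerOptions {
  sse: boolean;
  https: boolean;
  port: number;
  cert?: string;
  key?: string;
}

// Illustrative parser for the documented flags (hypothetical, not the real CLI code).
function parseFlags(argv: string[]): ServerOptions {
  const opts: ServerOptions = { sse: false, https: false, port: 3232 };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    switch (flag) {
      case "--sse":
        opts.sse = true;
        break;
      case "--https":
        opts.https = true;
        break;
      case "--port": {
        // The value following --port must be a valid TCP port number.
        const port = Number(argv[++i]);
        if (!Number.isInteger(port) || port < 1 || port > 65535) {
          throw new Error("--port expects an integer between 1 and 65535");
        }
        opts.port = port;
        break;
      }
      case "--cert":
        opts.cert = argv[++i];
        break;
      case "--key":
        opts.key = argv[++i];
        break;
      default:
        throw new Error(`Unknown flag: ${flag}`);
    }
  }
  // Assumption: TLS cannot start without both a certificate and a private key.
  if (opts.https && (!opts.cert || !opts.key)) {
    throw new Error("--https requires both --cert and --key");
  }
  return opts;
}

console.log(parseFlags(["--sse", "--port", "3000"]));
```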
Security Mandate
The MCP specification mandates the use of HTTPS for any remote access utilizing HTTP-based transports. Always utilize the --https flag, accompanied by valid certificate/key pairs, when deploying DAB-Win outside of a localhost context.
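The localhost exception described above can be expressed as a small pre-flight check on the client side. This TypeScript sketch is one possible reading of the rule. The helper names and the list of hostnames treated as local are assumptions.

```typescript
// Assumption: these hostnames count as "localhost context".
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "::1", "[::1]"]);

// Returns true when the endpoint points at the local machine.
function isLocalEndpoint(endpoint: string): boolean {
  return LOCAL_HOSTS.has(new URL(endpoint).hostname);
}

// Throws when a non-local endpoint is not served over HTTPS.
function assertSecureEndpoint(endpoint: string): void {
  const url = new URL(endpoint);
  if (!isLocalEndpoint(endpoint) && url.protocol !== "https:") {
    throw new Error(`Remote endpoint ${endpoint} must use HTTPS`);
  }
}

assertSecureEndpoint("http://localhost:3232/mcp"); // accepted: local
assertSecureEndpoint("https://192.168.1.100:8443/mcp"); // accepted: remote over TLS
```

Running the check before writing a client configuration catches a plain `http://` URL aimed at a remote machine.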
✨ Principal Use Cases
Guided Operational Flows
- User Interface Regression Testing: Offload monotonous interaction testing to the AI engine for automated bug identification.
- Task Delegation: Empower the AI to manage routine, high-volume operational tasks autonomously.
- Data Entry Automation: Utilize AI supervision for rapid and accurate population of digital forms.
AI Research & Experimentation
- Interactive Simulation: Observe and guide AI agents through dynamic graphical environments.
- Visual Problem Solving: Gauge the agent's capacity to interpret visual layouts and overcome screen-based obstacles.
- Synergistic Interaction Models: Pioneer new methods where the AI perceives the desktop context to assist in complex workflows.
Development & Quality Assurance
- Inter-Application Bridging: Facilitate communication between disparate software components lacking native integration.
- Visual Regression Suites: Construct rigorous test suites incorporating visual confirmation checkpoints.
- Demonstration Scripting: Automate the sequence generation required for software showcases.
🚨 CRITICAL WARNING AND LIABILITY WAIVER
THIS UTILITY IS CLASSIFIED AS EXPERIMENTAL AND CARRIES INHERENT OPERATIONAL RISK
By proceeding with the deployment or utilization of this software, you unequivocally acknowledge and consent to the following:
- Granting an autonomous AI entity direct execution control over host system peripherals (mouse/keyboard) constitutes a significant security exposure.
- The capability to manipulate system controls may result in unforeseen, detrimental system states or data corruption.
- Usage is performed strictly at the user's personal liability and discretion.
- The originators and maintainers of this codebase explicitly disclaim all liability for direct or consequential damages, data loss, or negative outcomes resulting from its use.
- Deployment must be confined to isolated, controlled testing sandboxes until full confidence in its operation is established.
PROCEED WITH EXTREME CAUTION
🌟 Core Capabilities Matrix
- 🪟 Window Management Toolkit
- 🖱️ Cursor & Input Emulation
- ⌨️ Key Input Simulation
- 📸 Visual System Capture
🎛️ Automation Backend Providers
DAB-Win supports pluggable backend mechanisms for executing native commands:
- keysender (Default): Utilizes a highly reliable, native Windows library for UI interaction.
- powershell: Leverages Windows PowerShell cmdlets for simpler task automation.
- autohotkey: Integration with AutoHotkey v2 scripting for highly complex, custom sequences.
Provider Configuration Methods
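Global provider selection via environment variables (shown in POSIX shell syntax, as used by Git Bash; in PowerShell, set a variable with $env:NAME = "value" instead):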
# Designate AutoHotkey as the primary engine for all tasks
export AUTOMATION_PROVIDER=autohotkey
# Specify the exact executable path if AHK is not in system PATH
export AUTOHOTKEY_PATH="C:\Program Files\AutoHotkey\v2\AutoHotkey.exe"
Alternatively, module-specific assignment allows for mixed environments:
# Use AHK for keyboard, default for mouse, PowerShell for clipboard
export AUTOMATION_KEYBOARD_PROVIDER=autohotkey
export AUTOMATION_MOUSE_PROVIDER=keysender
export AUTOMATION_SCREEN_PROVIDER=keysender
export AUTOMATION_CLIPBOARD_PROVIDER=powershell
Refer to dedicated documentation for specific provider details: AutoHotkey Provider Docs
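The interaction between the global and module-specific variables can be sketched as a lookup with fallbacks. The TypeScript below assumes a precedence order (module-specific variable, then `AUTOMATION_PROVIDER`, then the `keysender` default) and assumes that unrecognized values are ignored; neither behavior is confirmed by this document. The `resolveProvider` helper is hypothetical.

```typescript
type Provider = "keysender" | "powershell" | "autohotkey";
type ModuleName = "keyboard" | "mouse" | "screen" | "clipboard";

const PROVIDERS: readonly string[] = ["keysender", "powershell", "autohotkey"];

// Type guard: true only for one of the three documented provider names.
function isProvider(value: string | undefined): value is Provider {
  return value !== undefined && PROVIDERS.indexOf(value) !== -1;
}

// Assumed precedence: module-specific variable, global variable, keysender default.
function resolveProvider(
  moduleName: ModuleName,
  env: Record<string, string | undefined>,
): Provider {
  const specific = env[`AUTOMATION_${moduleName.toUpperCase()}_PROVIDER`];
  if (isProvider(specific)) return specific;
  const fallback = env["AUTOMATION_PROVIDER"];
  if (isProvider(fallback)) return fallback;
  return "keysender";
}

// Mirrors the mixed-environment example above.
const exampleEnv = {
  AUTOMATION_KEYBOARD_PROVIDER: "autohotkey",
  AUTOMATION_MOUSE_PROVIDER: "keysender",
  AUTOMATION_SCREEN_PROVIDER: "keysender",
  AUTOMATION_CLIPBOARD_PROVIDER: "powershell",
};
console.log(resolveProvider("keyboard", exampleEnv)); // autohotkey
console.log(resolveProvider("clipboard", exampleEnv)); // powershell
```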
🏗️ Development Environment Initialization
For developers aiming to contribute source code or compile the package from scratch, consult [CONTRIBUTING.md] for detailed setup procedures.
Essential Development Dependencies
- Operating System: Windows (mandatory due to native module linkage).
- Runtime: Node.js version 18 or newer (installation via official Windows MSI is recommended, as it often bundles necessary build tools).
- Package Manager: npm.
- Native Compilation Utilities:
  - node-gyp: Install via npm install -g node-gyp
  - cmake-js: Install via npm install -g cmake-js
The native keysender components require these build tools in order to compile.
📂 Source Code Organization
/src
- /handlers: Modules managing incoming protocol requests and overall tool orchestration.
- /tools: Concrete implementations of mouse, keyboard, and window functions.
- /types: Centralized TypeScript interface definitions.
- index.ts: Primary bootstrap file for application startup.
🌿 Repository Branches Overview
- main: Active development stream incorporating the newest functional increments.
- release: Branch pinned to stable, version-tagged releases (currently corresponding to v0.2.0).
Targeted Version Retrieval
Specific releases can be installed via npm:
# Fetches the latest committed stable version (mirrors release branch)
npm install mcp-control
# Install a specific historical release tag
npm install mcp-control@0.1.22
📚 Core Package Dependencies
- @modelcontextprotocol/sdk - Protocol definition and communication framework.
- keysender - Primary Windows UI interaction library.
- clipboardy - Cross-platform clipboard interfacing.
- sharp - High-performance image manipulation.
- uuid - Standardized unique identifier generation.
🛑 Recognized Operational Constraints
- Functions for minimizing or restoring windows are presently omitted from the API.
- Multi-monitor setups may exhibit unpredictable behavior with screen capture utilities.
- The get_screenshot function has known incompatibility issues with the VS Code Extension Cline (Ref: GitHub issue #1865).
- Certain high-security applications might block input unless the host process runs with elevated administrative rights.
- Exclusively supports Windows OS.
- Accuracy Note: Optimal performance, especially pointer accuracy, is calibrated for a 1280x720, single-monitor configuration. Ongoing work is underway to resolve scaling offsets; contributors with testing assistance are welcomed.
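Until the scaling offsets are resolved, a caller working on a different resolution could rescale coordinates itself. The TypeScript sketch below maps a point expressed in the 1280x720 reference frame onto another screen size by simple proportional scaling. It is a hypothetical client-side workaround, not how the package handles scaling, and it ignores DPI scaling factors and multi-monitor offsets.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// The resolution the document names as the calibration target.
const REFERENCE: Size = { width: 1280, height: 720 };

// Maps a point in reference coordinates onto the actual screen, rounding to whole pixels.
function scaleToScreen(p: Point, actual: Size, reference: Size = REFERENCE): Point {
  return {
    x: Math.round((p.x * actual.width) / reference.width),
    y: Math.round((p.y * actual.height) / reference.height),
  };
}

// The center of the reference frame maps to the center of a 1920x1080 screen.
console.log(scaleToScreen({ x: 640, y: 360 }, { width: 1920, height: 1080 })); // { x: 960, y: 540 }
```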
🤝 Collaborative Engagement
For contribution guidelines, please consult the dedicated document: [CONTRIBUTING.md]
⚖️ Licensing Information
This software is distributed under the permissive MIT License. Refer to the LICENSE file for full terms.
📖 Further Documentation

