
DesktopAutomationBridge-Win

A Windows control agent built on nut.js and the Model Context Protocol (MCP). It provides programmatic access to system interactions: cursor movement, keyboard input, window management, and screen capture.

Author

claude-did-this

MIT License

Quick Info

GitHub Stars: 239
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 2026-02-19

Tags

mcpcontrol, automation, browser, browser automation, automation web, mcpcontrol windows

DesktopAutomationBridge-Win (DAB-Win)

DAB-Win is a Windows server that implements the Model Context Protocol. It exposes tools for common operating system tasks: mouse and keyboard input simulation, management of application windows, and screen capture.

Critical Constraint: This solution is engineered exclusively for the Microsoft Windows operating system environment.

🚀 Rationale for DAB-Win

DAB-Win establishes a crucial linkage layer between advanced AI reasoning engines and the user's physical desktop, facilitating safe, scriptable control over:

  • 🖱️ Pointer Device Emulation: Precise movement, button activation, and compound dragging actions.
  • ⌨️ Typing & Input Sequences: Insertion of textual data and execution of complex hotkey combinations.
  • 🪟 OS Window Handling: Querying, prioritizing, resizing, and positioning application surfaces.
  • 📸 Visual Acquisition: Capturing screen regions or the entire viewport for subsequent AI analysis.
  • 📋 System Clipboard Access: Reading and writing data to the system clipboard buffer.

⚙️ Deployment Guide

Prerequisites

Ensure the following development and system utilities are provisioned:

  1. Visual Studio Build Tools (must include the C++ development workload):

     # PowerShell, run with elevated privileges; installs the required compilers
     winget install Microsoft.VisualStudio.2022.BuildTools --override "--wait --passive --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended"

  2. Python interpreter (required for native module compilation via node-gyp):

     # Install a modern Python version
     winget install Python.Python.3.12

  3. Node.js runtime:

     # Install the latest Long-Term Support (LTS) release
     winget install OpenJS.NodeJS.LTS

Installation Procedure

  1. Install the DAB-Win npm package globally (PowerShell):

     npm install -g mcp-control

Initial Configuration Notes

DAB-Win works best in a virtual machine set to a 1280x720 pixel resolution. To connect, configure your AI client (e.g., Claude) to use the Server-Sent Events (SSE) transport and point it at this server.

Method A: Remote/VM Peer Connection (Direct SSE)

Used when the server runs on a distinct machine accessible over the network:

{
  "mcpServers": {
    "DesktopBridge": {
      "transport": "sse",
      "url": "http://192.168.1.100:3232/mcp"
    }
  }
}

Substitute the example IP/Port with your actual server endpoint.
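The substitution described above can be sketched as a small helper. This is a minimal illustration that assumes the config shape shown in Method A. `buildRemoteConfig` and its parameters are hypothetical names and are not part of DAB-Win or any client.

```typescript
// Shape of one SSE server entry, mirroring the Method A JSON.
interface SseServerEntry {
  transport: "sse";
  url: string;
}

interface RemoteClientConfig {
  mcpServers: Record<string, SseServerEntry>;
}

// Hypothetical helper: assembles the client config for a remote server.
// The default port 3232 and the /mcp path are taken from the example above.
function buildRemoteConfig(
  host: string,
  port: number = 3232,
  useHttps: boolean = false
): RemoteClientConfig {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  const scheme = useHttps ? "https" : "http";
  return {
    mcpServers: {
      DesktopBridge: { transport: "sse", url: `${scheme}://${host}:${port}/mcp` },
    },
  };
}

// Reproduces the Method A example for the sample address.
const remoteConfig = buildRemoteConfig("192.168.1.100");
console.log(JSON.stringify(remoteConfig, null, 2));
```

Passing `true` as the third argument switches the URL to `https`, which matches the TLS flags described later in this document.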

Method B: Local Invocation with SSE

Used when the AI client launches the server directly:

{
  "mcpServers": {
    "DesktopBridge": {
      "command": "mcp-control",
      "args": ["--sse"]
    }
  }
}

Activating the Control Server

Execute the following command in your terminal on the machine hosting the server:

mcp-control --sse

The console output will enumerate the utilized network interfaces, confirm the bound port (default: 3232), and signal connection readiness.

VM Deployment Workflow Summary

  1. Initialize your Windows Virtual Machine instance, fixing the display resolution to 1280x720.
  2. Install DAB-Win on the guest OS via npm.
  3. Start the server daemon using SSE mode: mcp-control --sse.
  4. Retrieve the VM's internal IP address (e.g., 192.168.1.100).
  5. Update your client configuration file to point to this address and port.
  6. Restart the client application. DAB-Win functionality will now appear in the tool selection interface.
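Step 5 usually means editing a config file that already lists other servers. The sketch below shows one way to add or replace a single entry without touching the rest. The type names, `upsertServer`, and the `OtherTool` entry are hypothetical and only serve the illustration.

```typescript
// Either an SSE entry (Method A) or a locally launched command (Method B).
type ServerEntry =
  | { transport: "sse"; url: string }
  | { command: string; args: string[] };

interface McpClientConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Returns a new config with `serverName` added or replaced.
// The input object is not mutated.
function upsertServer(
  existing: McpClientConfig,
  serverName: string,
  entry: ServerEntry
): McpClientConfig {
  return {
    ...existing,
    mcpServers: { ...existing.mcpServers, [serverName]: entry },
  };
}

// "OtherTool" stands in for whatever servers are already configured.
const configBefore: McpClientConfig = {
  mcpServers: { OtherTool: { command: "other-tool", args: [] } },
};

const configAfter = upsertServer(configBefore, "DesktopBridge", {
  transport: "sse",
  url: "http://192.168.1.100:3232/mcp",
});

console.log(Object.keys(configAfter.mcpServers));
```

Returning a copy rather than mutating in place makes it easy to compare the old and new configuration before writing the file back.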

⚙️ Command Line Interface Flags

DAB-Win exposes various arguments for customization:

# Standard activation via SSE on port 3232
mcp-control --sse

# Custom port usage
mcp-control --sse --port 3000

# Secure TLS/HTTPS activation (Mandatory for production/remote scenarios)
mcp-control --sse --https --cert /path/to/cert.pem --key /path/to/key.pem

# Secure connection on a non-standard port
mcp-control --sse --https --port 8443 --cert /path/to/cert.pem --key /path/to/key.pem

Argument Reference

  • --sse: Enables Server-Sent Events transport for network communication.
  • --port [value]: Specifies an alternative listening port (default: 3232).
  • --https: Activates TLS/SSL encryption (required by MCP specification for remote access).
  • --cert [path]: Location of the necessary TLS certificate file.
  • --key [path]: Location of the TLS private key file.
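To make the flag semantics concrete, here is a minimal parser sketch for the five arguments listed above. It is not DAB-Win's actual argument handling. In particular, rejecting `--https` without both `--cert` and `--key` is an assumption drawn from the examples, and `parseServerArgs` is a hypothetical name.

```typescript
interface ServerOptions {
  sse: boolean;
  port: number;
  https: boolean;
  cert?: string;
  key?: string;
}

// Parses the documented flags; the default port is 3232.
function parseServerArgs(argv: string[]): ServerOptions {
  const opts: ServerOptions = { sse: false, port: 3232, https: false };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    switch (flag) {
      case "--sse":
        opts.sse = true;
        break;
      case "--https":
        opts.https = true;
        break;
      case "--port":
      case "--cert":
      case "--key": {
        // These three flags consume the next argument as their value.
        const value = argv[++i];
        if (value === undefined || value.startsWith("--")) {
          throw new Error(`${flag} requires a value`);
        }
        if (flag === "--port") {
          const port = Number(value);
          if (!Number.isInteger(port) || port < 1 || port > 65535) {
            throw new Error(`invalid port: ${value}`);
          }
          opts.port = port;
        } else if (flag === "--cert") {
          opts.cert = value;
        } else {
          opts.key = value;
        }
        break;
      }
      default:
        throw new Error(`unknown flag: ${flag}`);
    }
  }
  // Assumption: TLS cannot start without a certificate and a private key.
  if (opts.https && (opts.cert === undefined || opts.key === undefined)) {
    throw new Error("--https requires both --cert and --key");
  }
  return opts;
}

const parsedExample = parseServerArgs([
  "--sse", "--https", "--port", "8443",
  "--cert", "/path/to/cert.pem", "--key", "/path/to/key.pem",
]);
console.log(parsedExample.port); // 8443
```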

Security Mandate

The MCP specification mandates the use of HTTPS for any remote access utilizing HTTP-based transports. Always utilize the --https flag, accompanied by valid certificate/key pairs, when deploying DAB-Win outside of a localhost context.
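A client-side guard for this rule might look like the following sketch. It treats only a few literal host names as local, which is a simplification (the whole 127.0.0.0/8 range is loopback, for instance). `isLoopbackHost` and `checkEndpoint` are hypothetical helpers, not part of DAB-Win.

```typescript
// Host names treated as "localhost context" in this sketch.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "::1"]);

function isLoopbackHost(host: string): boolean {
  return LOOPBACK_HOSTS.has(host.toLowerCase());
}

// Returns null when the endpoint is acceptable, otherwise a message
// explaining why plain HTTP should not be used.
function checkEndpoint(scheme: "http" | "https", host: string): string | null {
  if (scheme === "https" || isLoopbackHost(host)) {
    return null;
  }
  return `Plain HTTP to non-local host "${host}"; start the server with --https`;
}

console.log(checkEndpoint("http", "localhost"));      // null
console.log(checkEndpoint("http", "192.168.1.100"));  // warning message
console.log(checkEndpoint("https", "192.168.1.100")); // null
```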

✨ Principal Use Cases

Guided Operational Flows

  • User Interface Regression Testing: Offload monotonous interaction testing to the AI engine for automated bug identification.
  • Task Delegation: Empower the AI to manage routine, high-volume operational tasks autonomously.
  • Data Entry Automation: Utilize AI supervision for rapid and accurate population of digital forms.

AI Research & Experimentation

  • Interactive Simulation: Observe and guide AI agents through dynamic graphical environments.
  • Visual Problem Solving: Gauge the agent's capacity to interpret visual layouts and overcome screen-based obstacles.
  • Synergistic Interaction Models: Pioneer new methods where the AI perceives the desktop context to assist in complex workflows.

Development & Quality Assurance

  • Inter-Application Bridging: Facilitate communication between disparate software components lacking native integration.
  • Visual Regression Suites: Construct rigorous test suites incorporating visual confirmation checkpoints.
  • Demonstration Scripting: Automate the sequence generation required for software showcases.

🚨 CRITICAL WARNING AND LIABILITY WAIVER

THIS UTILITY IS CLASSIFIED AS EXPERIMENTAL AND CARRIES INHERENT OPERATIONAL RISK

By proceeding with the deployment or utilization of this software, you unequivocally acknowledge and consent to the following:

  • Granting an autonomous AI entity direct execution control over host system peripherals (mouse/keyboard) constitutes a significant security exposure.
  • The capability to manipulate system controls may result in unforeseen, detrimental system states or data corruption.
  • Usage is performed strictly at the user's personal liability and discretion.
  • The originators and maintainers of this codebase explicitly disclaim all liability for direct or consequential damages, data loss, or negative outcomes resulting from its use.
  • Deployment must be confined to isolated, controlled testing sandboxes until full confidence in its operation is established.

PROCEED WITH EXTREME CAUTION

🌟 Core Capabilities Matrix

🪟 Window Management Toolkit

  • Inventory of all active top-level windows
  • Retrieval of the currently focused window's metadata
  • Commands for activation, dimension adjustment, and positioning

🖱️ Cursor & Input Emulation

  • High-precision spatial cursor translation
  • Simulation of single and multi-click events
  • Simulation of scroll wheel input

⌨️ Key Input Simulation

  • Injection of textual strings
  • Execution of discrete key press/release cycles
  • Sustained key depression functionality

📸 Visual System Capture

  • Generation of high-fidelity bitmap images
  • Detection of total screen real estate dimensions
  • Targeted capture of specific application windows
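Clients reach these capabilities through MCP tool calls, which are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a request by hand for `get_screenshot`, the one tool name mentioned in this document. Its argument schema is not documented here, so the empty argument object is a placeholder assumption, and `makeToolCall` is a hypothetical helper.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Builds a tools/call request for the given tool name.
function makeToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown> = {}
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Placeholder call: the real tool may accept or require arguments.
const screenshotRequest = makeToolCall(1, "get_screenshot");
console.log(JSON.stringify(screenshotRequest));
```

In practice an MCP client library assembles this envelope for you. The sketch only shows what travels over the transport.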

🎛️ Automation Backend Providers

DAB-Win supports pluggable backend mechanisms for executing native commands:

  • keysender (Default): Utilizes a highly reliable, native Windows library for UI interaction.
  • powershell: Leverages Windows PowerShell cmdlets for simpler task automation.
  • autohotkey: Integration with AutoHotkey v2 scripting for highly complex, custom sequences.

Provider Configuration Methods

Global provider selection via environment variables (shown in bash syntax, e.g. Git Bash; in PowerShell use the form $env:AUTOMATION_PROVIDER = "autohotkey"):

# Designate AutoHotkey as the primary engine for all tasks
export AUTOMATION_PROVIDER=autohotkey

# Specify the exact executable path if AHK is not in system PATH
export AUTOHOTKEY_PATH="C:\Program Files\AutoHotkey\v2\AutoHotkey.exe"

Alternatively, module-specific assignment allows for mixed environments:

# Use AHK for keyboard, default for mouse, PowerShell for clipboard
export AUTOMATION_KEYBOARD_PROVIDER=autohotkey
export AUTOMATION_MOUSE_PROVIDER=keysender
export AUTOMATION_SCREEN_PROVIDER=keysender  
export AUTOMATION_CLIPBOARD_PROVIDER=powershell
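The interaction between the global and module-specific variables can be sketched as a lookup with fallbacks. The precedence used here (module-specific variable, then AUTOMATION_PROVIDER, then keysender) is an assumption based on the description above, not confirmed behavior, and unrecognized values are simply ignored. All type and function names are hypothetical.

```typescript
type ProviderName = "keysender" | "powershell" | "autohotkey";
type AutomationModule = "keyboard" | "mouse" | "screen" | "clipboard";

const PROVIDER_NAMES: readonly ProviderName[] = ["keysender", "powershell", "autohotkey"];
const AUTOMATION_MODULES: readonly AutomationModule[] = ["keyboard", "mouse", "screen", "clipboard"];

// Returns the value as a ProviderName if it is one of the known providers.
function toProviderName(value: string | undefined): ProviderName | undefined {
  return PROVIDER_NAMES.find((p) => p === value);
}

// Assumed precedence: AUTOMATION_<MODULE>_PROVIDER, then
// AUTOMATION_PROVIDER, then the documented default (keysender).
function resolveProviders(
  env: Record<string, string | undefined>
): Record<AutomationModule, ProviderName> {
  const fallback = toProviderName(env["AUTOMATION_PROVIDER"]) ?? "keysender";
  const resolved = {} as Record<AutomationModule, ProviderName>;
  for (const m of AUTOMATION_MODULES) {
    const specific = env[`AUTOMATION_${m.toUpperCase()}_PROVIDER`];
    resolved[m] = toProviderName(specific) ?? fallback;
  }
  return resolved;
}

// Mirrors the mixed-environment example above.
const mixedProviders = resolveProviders({
  AUTOMATION_KEYBOARD_PROVIDER: "autohotkey",
  AUTOMATION_MOUSE_PROVIDER: "keysender",
  AUTOMATION_SCREEN_PROVIDER: "keysender",
  AUTOMATION_CLIPBOARD_PROVIDER: "powershell",
});
console.log(mixedProviders);
```

In a Node.js process the same function could be called with `process.env`, since it only reads string keys.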

Refer to dedicated documentation for specific provider details: AutoHotkey Provider Docs

🏗️ Development Environment Initialization

For developers aiming to contribute source code or compile the package from scratch, consult [CONTRIBUTING.md] for detailed setup procedures.

Essential Development Dependencies

  1. Operating System: Windows (mandatory due to native module linkage).
  2. Runtime: Node.js version 18 or newer (installation via official Windows MSI is recommended, as it often bundles necessary build tools).
  3. Package Manager: npm.
  4. Native Compilation Utilities:
     • node-gyp: Install via npm install -g node-gyp
     • cmake-js: Install via npm install -g cmake-js

Building the native keysender components requires these build tools.

📂 Source Code Organization

  • /src
      • /handlers: Modules managing incoming protocol requests and overall tool orchestration.
      • /tools: Concrete implementations of mouse, keyboard, and window functions.
      • /types: Centralized TypeScript interface definitions.
      • index.ts: Primary bootstrap file for application startup.

🌿 Repository Branches Overview

  • main: Active development stream incorporating the newest functional increments.
  • release: Branch pinned to stable, version-tagged releases (currently corresponding to v0.2.0).

Targeted Version Retrieval

Specific releases can be installed via npm:

# Install the latest stable version (mirrors the release branch)
npm install mcp-control

# Install a specific historical release tag
npm install mcp-control@0.1.22

📚 Core Package Dependencies

  • @modelcontextprotocol/sdk - Protocol definition and communication framework.
  • keysender - Primary Windows UI interaction library.
  • clipboardy - Cross-platform clipboard interfacing.
  • sharp - High-performance image manipulation.
  • uuid - Standardized unique identifier generation.

🛑 Recognized Operational Constraints

  • Functions for minimizing or restoring windows are presently omitted from the API.
  • Multi-monitor setups may exhibit unpredictable behavior with screen capture utilities.
  • The get_screenshot function has known incompatibility issues with the VS Code Extension Cline (Ref: GitHub issue #1865).
  • Certain high-security applications might block input unless the host process runs with elevated administrative rights.
  • Exclusively supports Windows OS.
  • Accuracy Note: Pointer accuracy in particular is calibrated for a 1280x720, single-monitor configuration. Work to resolve scaling offsets is ongoing, and help with testing is welcome.
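Until the scaling offsets are resolved, a caller working at another resolution could rescale coordinates itself. The sketch below is plain proportional scaling from the 1280x720 reference. It is an illustration only, it ignores DPI scaling and multi-monitor offsets, and it does not describe what DAB-Win does internally. The names are hypothetical.

```typescript
interface ScreenSize {
  width: number;
  height: number;
}

interface ScreenPoint {
  x: number;
  y: number;
}

// Resolution for which pointer accuracy is calibrated, per the note above.
const REFERENCE_SIZE: ScreenSize = { width: 1280, height: 720 };

// Maps a point expressed in reference coordinates onto a target screen.
function scalePoint(
  p: ScreenPoint,
  target: ScreenSize,
  reference: ScreenSize = REFERENCE_SIZE
): ScreenPoint {
  return {
    x: Math.round((p.x * target.width) / reference.width),
    y: Math.round((p.y * target.height) / reference.height),
  };
}

// The centre of the reference screen maps to the centre of a 1920x1080 screen.
const scaledCentre = scalePoint({ x: 640, y: 360 }, { width: 1920, height: 1080 });
console.log(scaledCentre); // { x: 960, y: 540 }
```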

🤝 Collaborative Engagement

For contribution guidelines, please consult the dedicated document: [CONTRIBUTING.md]

⚖️ Licensing Information

This software is distributed under the permissive MIT License. Refer to the LICENSE file for full terms.


