
DesktopAgentControlModule

Enables AI agents to programmatically interface with the desktop operating system's graphical elements via simulated physical input, facilitating automated execution of tasks within native applications.

Author: kitfactory

License: MIT

Quick Info

GitHub Stars: 10
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 2026-02-19

Tags: automation, pymcpautogui, automate, browser automation, automation web, pymcpautogui automate

DesktopAgentControlModule 💻🤖 - OS Interaction Hub for Agents

License: MIT

Augment your AI framework's operational scope! 🚀 DesktopAgentControlModule (DACM) furnishes a robust communication layer between your sophisticated AI entities (operating within an MCP context) and the underlying Graphical User Interface (GUI) of the host machine. It grants the agent capabilities analogous to a human operator: visual perception 🧐, fine-grained pointer manipulation 🖱️, textual input delegation ⌨️, and window lifecycle management 🪟.

Eliminate reliance on manual intervention for routine digital chores. DACM is ideal for complex workflow orchestration, rigorous GUI regression validation, or building fully autonomous digital coworkers 🧑‍💻.

💡 Rationale for Adopting DACM

  • 🤖 Agent Empowerment: Bestow direct control over conventional desktop software environments to your intelligent models.
  • ✅ Seamless MCP Adherence: Designed for MCP-compliant clients such as the Cursor IDE; registering the server takes a single entry in mcp.json.
  • ⚙️ Minimal Overhead: The server starts with a single command from the terminal.
  • 🖱️⌨️ Comprehensive Input Repertoire: Leverages the mature, extensively vetted functionalities provided by foundational libraries like PyAutoGUI and PyGetWindow.
  • 🖼️ Visual Context Acquisition: Incorporates mechanisms for capturing screen states and performing template matching against visual assets, allowing agents to 'see' what the user sees.
  • 🪟 Window State Manipulation: Control over window dimensions, placement, focus, and state (minimize, maximize, restore). Maintain desktop organization effortlessly.
  • 💬 User Feedback Loops: Utility functions for displaying modal dialogues (alerts, confirmations, user data prompts) to facilitate human-in-the-loop interaction.
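To make the "template matching" idea in the list above concrete, here is a minimal pure-Python sketch of exact-match template search on small grayscale grids. It is illustrative only: the names `find_template` and `center_of` are hypothetical, and it is not how DACM or PyAutoGUI implement image location internally (real implementations work on screenshots and usually tolerate small pixel differences).

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # grayscale pixel values, row-major


def find_template(screen: Grid, template: Grid) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the top-left corner of the first exact match, or None."""
    t_h, t_w = len(template), len(template[0])
    s_h, s_w = len(screen), len(screen[0])
    for y in range(s_h - t_h + 1):
        for x in range(s_w - t_w + 1):
            # Compare every template row against the matching slice of the screen.
            if all(screen[y + dy][x:x + t_w] == template[dy] for dy in range(t_h)):
                return (x, y)
    return None


def center_of(top_left: Tuple[int, int], template: Grid) -> Tuple[int, int]:
    """Convert a top-left match into the centre point an agent would click."""
    x, y = top_left
    return (x + len(template[0]) // 2, y + len(template) // 2)


# Hypothetical 5x4 "screen" containing a 2x2 "button" pattern.
screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 8, 0],
    [0, 0, 9, 6, 0],
    [0, 0, 0, 0, 0],
]
button = [[7, 8], [9, 6]]
match = find_template(screen, button)
print(match, center_of(match, button) if match else None)
```

An agent would typically locate an element this way first and then move the pointer to the returned centre point.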

🖥️ Operational Prerequisites

  • Operating Systems: Windows, macOS, and Linux distributions, provided the system libraries that pyautogui requires are installed.
  • Python Version: Requires Python 3.11 or newer 🐍
  • Integration Clients: Cursor Editor, or any platform conforming to the Model Context Protocol (MCP)
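The Python requirement above can be checked before installing anything. The following is a small sketch using only the standard library; the helper names `python_is_supported` and `describe_host` are hypothetical and not part of DACM.

```python
import platform
import sys
from typing import Sequence

MIN_PYTHON = (3, 11)  # minimum version stated in the prerequisites


def python_is_supported(version: Sequence[int] = sys.version_info) -> bool:
    """True when the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version[:2]) >= MIN_PYTHON


def describe_host() -> str:
    """One-line summary of the OS and interpreter, handy for bug reports."""
    return f"{platform.system()} / Python {platform.python_version()}"


print(describe_host(), "| supported:", python_is_supported())
```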

🛠️ Initial Deployment Sequence - Effortless Activation!

1. Environment Preparation

It is strongly advised to isolate dependencies within a dedicated virtual environment.

# Environment Creation (e.g., using venv)
python -m venv .venv
# Windows Shell Activation
.venv\Scripts\Activate.ps1
# Unix/Linux/Mac Activation
source .venv/bin/activate

# Package Installation (Ensure VENV is active!)
pip install desktopagentcontrolmodule # Or pip install . for local build

(Note: On Linux, screen capture may require additional packages such as scrot. Consult the upstream pyautogui documentation for OS-specific needs.)

2. Launching the MCP Gateway Service

Execute the server component from your active shell:

# Confirm your virtual environment is sourced!
python -m desktopagentcontrolmodule.server

The service will initialize and begin listening for network connections (default endpoint: 127.0.0.1:6789). The console output will confirm readiness:

INFO:     Started server process [XXXXX]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:6789 (Press CTRL+C to quit)

Keep this process alive to maintain agent connectivity for GUI manipulation.
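If an agent cannot connect, a quick first check is whether anything is accepting TCP connections on the endpoint. The sketch below uses only the standard library and assumes the default address quoted above (127.0.0.1:6789); the function name `endpoint_is_listening` is hypothetical. It confirms only that a port is open, not that the process behind it is DACM.

```python
import socket


def endpoint_is_listening(host: str = "127.0.0.1", port: int = 6789,
                          timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


print("Endpoint reachable:", endpoint_is_listening())
```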

✨ Integration Configuration within Cursor IDE

Connect DACM to Cursor via the @ invocation mechanism:

  1. Access MCP Configuration: Use the Command Palette (Ctrl+Shift+P or Cmd+Shift+P) and select "MCP: Open mcp.json configuration file".
  2. Register the Service: Insert or merge the following definition into your mcp.json. Adjust the paths if the server is not launched from the project root.

    ```json
    {
        "mcpServers": {
            // ... other configurations ...
            "DesktopAgentControlModule": {
                "cwd": "${workspaceFolder}",
                // Specify the Python executable path if necessary to ensure VENV activation
                "command": "python", // Or full path: ".venv/Scripts/python.exe"

                // Module invocation arguments
                "args": ["-m", "desktopagentcontrolmodule.server"]
            }
            // ... other configurations ...
        }
    }
    ```

    *(Hint: If `mcp.json` exists, simply inject the `"DesktopAgentControlModule": { ... }` object into the existing `mcpServers` dictionary.)*

  3. Save mcp.json. Cursor will automatically detect the new service endpoint.

  4. Initiate Tasks: Invoke commands in Cursor chats using the registered handle @DesktopAgentControlModule:

    Command Examples:

        @DesktopAgentControlModule move_pointer_to(x_coord=100, y_coord=200)
        @DesktopAgentControlModule input_text(payload='AI Automation Active! 🌟', delay_between_chars=0.1)
        @DesktopAgentControlModule capture_screen(output_file='current_state.png')
        @DesktopAgentControlModule focus_application(window_identifier='Calculator')
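The merge described in step 2 can also be scripted. This is a minimal sketch that operates on an already-parsed configuration dictionary; the entry values are copied from the configuration in step 2, while the function name `merge_server` and the pre-existing `other` server are hypothetical. Python's `json` module rejects `//` comments, so a commented mcp.json would need them stripped before parsing.

```python
import copy
import json
from typing import Any, Dict

SERVER_NAME = "DesktopAgentControlModule"
SERVER_ENTRY: Dict[str, Any] = {
    "cwd": "${workspaceFolder}",
    "command": "python",
    "args": ["-m", "desktopagentcontrolmodule.server"],
}


def merge_server(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with the DACM entry added under mcpServers."""
    merged = copy.deepcopy(config)
    # setdefault creates "mcpServers" when the file had none yet.
    merged.setdefault("mcpServers", {})[SERVER_NAME] = copy.deepcopy(SERVER_ENTRY)
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(merge_server(existing), indent=4))
```

Returning a copy instead of mutating the input keeps the original configuration available if the result needs to be discarded.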

🗂️ Exposed Functionality Index

DACM exposes a rich set of functions mirroring the capabilities of its underlying libraries. Key categories include:

  • Pointer Control 🖱️: move_to, click_event, move_relative, drag_operation, scroll_wheel, press_button_down, release_button_up, query_current_position
  • Key Input ⌨️: type_string, trigger_key, key_state_down, key_state_up, execute_key_sequence (hotkey)
  • Visual Capture 🖼️: capture_screen, find_image_location, find_image_center
  • Window Governance 🪟: enumerate_all_titles, filter_windows_by_name, get_active_window_handle, bring_to_foreground, minimize_window_state, maximize_window_state, restore_window_state, reposition_window, reframe_window, terminate_process
  • Interface Dialogs 💬: display_info_box, request_confirmation, gather_user_input, gather_secure_input
  • System Configuration ⚙️: configure_global_delay, set_error_failsafe_mode

For a complete function schema, inspect desktopagentcontrolmodule/server.py or utilize the introspection command @DesktopAgentControlModule list_tools within your client interface.
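Since the tools are described as mirroring PyAutoGUI, the sketch below illustrates the kind of PyAutoGUI calls such wrappers might delegate to. The mapping is an assumption, not DACM's actual implementation, and the helper names are hypothetical. The `moveTo`, `click`, `write`, and `hotkey` calls are real PyAutoGUI functions; a recording stand-in replaces the library here so the example runs without a display.

```python
from typing import Any, List, Tuple


class RecordingBackend:
    """Stand-in for the pyautogui module that records calls instead of acting."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple, dict]] = []

    def __getattr__(self, name: str):
        # Any unknown attribute becomes a function that logs its arguments.
        def record(*args: Any, **kwargs: Any) -> None:
            self.calls.append((name, args, kwargs))
        return record


def move_and_click(gui: Any, x: int, y: int, duration: float = 0.2) -> None:
    """Glide the pointer to (x, y), then click there."""
    gui.moveTo(x, y, duration=duration)
    gui.click(x=x, y=y)


def type_text(gui: Any, text: str, interval: float = 0.05) -> None:
    """Type text one character at a time with a pause between keys."""
    gui.write(text, interval=interval)


def copy_selection(gui: Any) -> None:
    """Press Ctrl+C as a single key combination."""
    gui.hotkey("ctrl", "c")


backend = RecordingBackend()  # on a real desktop: import pyautogui; backend = pyautogui
move_and_click(backend, 100, 200)
type_text(backend, "hello")
copy_selection(backend)
for call in backend.calls:
    print(call)
```

Passing the backend as a parameter makes it possible to dry-run an automation sequence and inspect the recorded calls before letting it drive a real desktop.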

📄 Licensing Information

This software is distributed under the terms of the MIT License. Refer to the LICENSE file for specifics. Enjoy building sophisticated, automated workflows! 🎉
