DesktopAgentControlModule
Enables AI agents to programmatically interface with the desktop operating system's graphical elements via simulated physical input, facilitating automated execution of tasks within native applications.
Author

kitfactory
DesktopAgentControlModule 💻🤖 - OS Interaction Hub for Agents
Augment your AI framework's operational scope! 🚀 DesktopAgentControlModule (DACM) furnishes a robust communication layer between your sophisticated AI entities (operating within an MCP context) and the underlying Graphical User Interface (GUI) of the host machine. It grants the agent capabilities analogous to a human operator: visual perception 🧐, fine-grained pointer manipulation 🖱️, textual input delegation ⌨️, and window lifecycle management 🪟.
Eliminate reliance on manual intervention for routine digital chores. DACM is ideal for complex workflow orchestration, rigorous GUI regression validation, or building fully autonomous digital coworkers 🧑‍💻.
💡 Rationale for Adopting DACM
- 🤖 Agent Empowerment: Bestow direct control over conventional desktop software environments to your intelligent models.
- ✅ Seamless MCP Adherence: Optimized for rapid deployment within MCP-compliant execution environments, such as the Cursor IDE. Instantaneous setup is the standard.
- ⚙️ Minimal Overhead: Initialization requires only a straightforward invocation from the terminal; setup friction is negligible.
- 🖱️⌨️ Comprehensive Input Repertoire: Leverages the mature, extensively vetted functionalities provided by foundational libraries like PyAutoGUI and PyGetWindow.
- 🖼️ Visual Context Acquisition: Incorporates mechanisms for capturing screen states and performing template matching against visual assets—allowing agents to 'see' what the user sees.
- 🪟 Window State Manipulation: Control over window dimensions, placement, focus, and state (minimize, maximize, restore). Maintain desktop organization effortlessly.
- 💬 User Feedback Loops: Utility functions for displaying modal dialogues (alerts, confirmations, user data prompts) to facilitate human-in-the-loop interaction.
🖥️ Operational Prerequisites
- Operating Systems: Compatible across Windows, macOS, and diverse Linux distributions (ensure requisite system libraries for pyautogui are present).
- Python Version: Requires Python 3.11 or newer 🐍
- Integration Clients: Cursor Editor, or any platform conforming to the Model Context Protocol (MCP)
🛠️ Initial Deployment Sequence - Effortless Activation!
1. Environment Preparation
It is strongly advised to isolate dependencies within a dedicated virtual environment.
# Environment Creation (e.g., using venv)
python -m venv .venv
# Windows Shell Activation
.venv\Scripts\Activate.ps1
# Unix/Linux/Mac Activation
source .venv/bin/activate
# Package Installation (Ensure VENV is active!)
pip install desktopagentcontrolmodule # Or pip install . for local build
(Note: Dependencies like scrot may be necessary for Linux screen capture tools. Consult the upstream pyautogui documentation for OS-specific needs.)
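Before installing, it can help to confirm that the interpreter meets the stated Python 3.11 minimum and that a virtual environment is actually active. The following is a minimal sketch using only the standard library; the helper names `meets_min_python` and `in_virtual_env` are illustrative and not part of DACM.

```python
import sys

# Minimum version stated in the prerequisites section.
MIN_PYTHON = (3, 11)


def meets_min_python(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def in_virtual_env(prefix=None, base_prefix=None):
    """Return True if the interpreter appears to run inside a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix points at the base installation.
    """
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


print("Python version OK:", meets_min_python())
print("Virtual environment active:", in_virtual_env())
```

Running this with the same interpreter that will launch the server shows whether both conditions hold for that interpreter.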
2. Launching the MCP Gateway Service
Execute the server component from your active shell:
# Confirm your virtual environment is sourced!
python -m desktopagentcontrolmodule.server
The service will initialize and begin listening for network connections (default endpoint: 127.0.0.1:6789). The console output will confirm readiness:
INFO: Started server process [XXXXX]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:6789 (Press CTRL+C to quit)
Keep this process alive to maintain agent connectivity for GUI manipulation.
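To check from a script that something is listening on the default endpoint, a plain TCP connection attempt is usually enough. The sketch below assumes the default host and port shown above; `is_port_open` and `wait_for_server` are hypothetical helper names. An open port only shows that a process is listening there, not that it is DACM.

```python
import socket
import time


def is_port_open(host="127.0.0.1", port=6789, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def wait_for_server(host="127.0.0.1", port=6789, attempts=10, delay=0.5):
    """Poll the endpoint until it accepts connections or attempts run out."""
    for attempt in range(attempts):
        if is_port_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_server()` right after starting the server gives a script a simple way to delay further steps until the listener is up.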
✨ Integration Configuration within Cursor IDE
Connect DACM to Cursor via the @ invocation mechanism:
- Access MCP Configuration: Use the Command Palette (Ctrl+Shift+P or Cmd+Shift+P) and select "MCP: Open mcp.json configuration file".
- Register the Service: Insert or merge the following definition into your mcp.json. Adjust the pathing if the execution context differs from the project root.

  ```json
  {
    "mcpServers": {
      // ... other configurations ...
      "DesktopAgentControlModule": {
        "cwd": "${workspaceFolder}",
        // Specify the Python executable path if necessary to ensure VENV activation
        "command": "python", // Or full path: ".venv/Scripts/python.exe"
        // Module invocation arguments
        "args": ["-m", "desktopagentcontrolmodule.server"]
      }
      // ... other configurations ...
    }
  }
  ```

  (Hint: If mcp.json exists, simply inject the "DesktopAgentControlModule": { ... } object into the existing mcpServers dictionary.)
- Save mcp.json. Cursor will automatically detect the new service endpoint.
- Initiate Tasks: Invoke commands in Cursor chats using the registered handle @DesktopAgentControlModule.

  Command Examples:
  - @DesktopAgentControlModule move_pointer_to(x_coord=100, y_coord=200)
  - @DesktopAgentControlModule input_text(payload='AI Automation Active! 🌟', delay_between_chars=0.1)
  - @DesktopAgentControlModule capture_screen(output_file='current_state.png')
  - @DesktopAgentControlModule focus_application(window_identifier='Calculator')
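The merge described in the hint above (adding the DesktopAgentControlModule entry to an existing mcpServers dictionary without disturbing other servers) can also be scripted. This is a sketch under two assumptions: the existing mcp.json is strict JSON with no `//` comments, since Python's `json` module rejects comments, and the caller supplies the file's path. The function names are illustrative only.

```python
import json
from pathlib import Path

SERVER_NAME = "DesktopAgentControlModule"


def dacm_entry(python_cmd="python"):
    """Build the server definition shown in the configuration step."""
    return {
        "cwd": "${workspaceFolder}",
        "command": python_cmd,
        "args": ["-m", "desktopagentcontrolmodule.server"],
    }


def merge_server(config, entry=None):
    """Return a copy of `config` with the DACM entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[SERVER_NAME] = entry if entry is not None else dacm_entry()
    merged["mcpServers"] = servers
    return merged


def update_mcp_json(path):
    """Read mcp.json (if present), merge the entry, and write it back."""
    path = Path(path)
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    merged = merge_server(config)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

Because `merge_server` copies the dictionaries it changes, the caller's original configuration object is left untouched, and entries for other servers are carried over as they were.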
🗂️ Exposed Functionality Index
DACM exposes a rich set of functions mirroring the capabilities of its underlying libraries. Key categories include:
- Pointer Control 🖱️: move_to, click_event, move_relative, drag_operation, scroll_wheel, press_button_down, release_button_up, query_current_position
- Key Input ⌨️: type_string, trigger_key, key_state_down, key_state_up, execute_key_sequence (hotkey)
- Visual Capture 🖼️: capture_screen, find_image_location, find_image_center
- Window Governance 🪟: enumerate_all_titles, filter_windows_by_name, get_active_window_handle, bring_to_foreground, minimize_window_state, maximize_window_state, restore_window_state, reposition_window, reframe_window, terminate_process
- Interface Dialogs 💬: display_info_box, request_confirmation, gather_user_input, gather_secure_input
- System Configuration ⚙️: configure_global_delay, set_error_failsafe_mode
For a complete function schema, inspect desktopagentcontrolmodule/server.py or utilize the introspection command @DesktopAgentControlModule list_tools within your client interface.
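To give a feel for what image-location tools such as find_image_location and find_image_center conceptually compute, here is a small, self-contained sketch of exact template matching over grids of pixel values. It is not DACM's or PyAutoGUI's implementation, and the exact return types of the DACM tools are not documented here. The (left, top, width, height) box shape is borrowed from PyAutoGUI's convention as an assumption, and the function names are illustrative.

```python
def find_template(screen, template):
    """Return (left, top, width, height) of the first exact match, or None.

    `screen` and `template` are lists of rows, each row a list of pixel
    values. Rows are compared as list slices, so both must use lists.
    """
    t_h = len(template)
    t_w = len(template[0]) if t_h else 0
    s_h = len(screen)
    s_w = len(screen[0]) if s_h else 0
    if t_h == 0 or t_w == 0 or t_h > s_h or t_w > s_w:
        return None
    # Slide the template over every position where it fits entirely.
    for top in range(s_h - t_h + 1):
        for left in range(s_w - t_w + 1):
            if all(
                screen[top + dy][left:left + t_w] == template[dy]
                for dy in range(t_h)
            ):
                return (left, top, t_w, t_h)
    return None


def box_center(box):
    """Return the integer (x, y) center of a (left, top, width, height) box."""
    left, top, width, height = box
    return (left + width // 2, top + height // 2)


screen = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
match = find_template(screen, template)
print(match)              # (1, 1, 2, 2)
print(box_center(match))  # (2, 2)
```

An agent would typically pass the center point to a pointer tool to click the matched element. Real screen matching works on full-color screenshots and often tolerates small pixel differences, which this exact-match sketch does not attempt.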
📄 Licensing Information
This software is distributed under the terms of the MIT License. Refer to the LICENSE file for specifics. Enjoy building sophisticated, automated workflows! 🎉
