MCP Web Interaction Engine

Developed during the AGI House MCP Hackathon

Core Concept

This component is a browser automation agent built on the Model Context Protocol (MCP). It connects Claude to direct browser control, with an emphasis on reliable state tracking and handling of dynamic page elements.

We extend sincere gratitude to the Browser-Use project for providing the foundational browser control primitives powering our MCP server instance.

Prerequisites

Operating System: macOS (darwin 24.2.0 environment)
Runtime: Python version 3.12 or newer
Dependency Manager: uv utility
Target Browser: Google Chrome (quit Chrome completely before running any task).
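The first three prerequisites can be checked from Python before installing anything. The sketch below is illustrative only: the function name check_prerequisites and its injectable parameters are assumptions introduced here, not part of the project, and it does not check whether Chrome is running.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, platform=sys.platform, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The README targets macOS, which Python reports as "darwin".
    if platform != "darwin":
        problems.append(f"expected macOS (darwin), found {platform}")
    # Python 3.12 or newer is required.
    if tuple(version_info[:2]) < (3, 12):
        problems.append("Python 3.12 or newer is required")
    # uv must be on PATH; shutil.which returns None when it is not.
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"prerequisite check: {problem}", file=sys.stderr)
```

The parameters default to the real interpreter and PATH lookup; passing fake values makes the check easy to exercise on any machine.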

Deployment

Installation via Smithery (Recommended)

For automated setup within Claude Desktop using Smithery:

npx -y @smithery/cli install @ashley-ha/mcp-manus --client claude

Manual Setup Procedure

Obtain the source code via Git:

    git clone
    cd mcp

Initialize and populate the isolated Python environment using uv:

    uv venv
    source .venv/bin/activate
    uv sync

Configuration Directives

Claude Desktop Integration Settings

Modify or establish your Claude Desktop configuration file to recognize this server:

{
  "mcpServers": {
    "browser-use": {
      "command": "uv",
      "args": [
        "--directory",
        "/ABSOLUTE/PATH/TO/mcp",
        "run",
        "browser-use.py"
      ]
    }
  }
}

Action Required: Substitute /ABSOLUTE/PATH/TO/mcp with the actual filesystem location of your project directory.
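If the configuration file already lists other servers, the entry has to be merged in rather than pasted over the whole file. A minimal sketch of that merge follows; the helper names server_entry and merge_server and the "other-server" example are hypothetical, and the sketch only builds the JSON in memory without locating or writing the Claude Desktop file.

```python
import json
from pathlib import PurePosixPath


def server_entry(project_dir):
    """Build the "browser-use" server entry for a given project directory."""
    path = PurePosixPath(project_dir)
    # The configuration expects an absolute path to the project.
    if not path.is_absolute():
        raise ValueError("project_dir must be an absolute path")
    return {
        "command": "uv",
        "args": ["--directory", str(path), "run", "browser-use.py"],
    }


def merge_server(config, project_dir):
    """Return a copy of config with the browser-use entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["browser-use"] = server_entry(project_dir)
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(merge_server(existing, "/ABSOLUTE/PATH/TO/mcp"), indent=2))
```

The merge copies the dictionaries instead of mutating them, so the original configuration stays available if the result needs to be discarded.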

Browser Environment Specification

The agent defaults to operating Google Chrome with the following parameters:

- Execution Mode: Standard, non-headless presentation (suitable for development)
- Viewport Dimensions: 1280 pixels wide by 1100 pixels high
- Security Constraints: Certain protective mechanisms disabled for testing purposes
- Session Recording Output: ./tmp/recordings directory
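The defaults above can be captured in one place so that overrides stay explicit. The dataclass below is a hypothetical illustration: BrowserDefaults and its field names are assumptions made for this sketch and do not mirror the project's actual configuration objects.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BrowserDefaults:
    """Hypothetical container for the documented default browser settings."""

    headless: bool = False                   # standard, visible browser window
    viewport_width: int = 1280               # pixels
    viewport_height: int = 1100              # pixels
    disable_security: bool = True            # some protections are off for testing
    recording_dir: str = "./tmp/recordings"  # session recording output

    def viewport(self):
        """Return the viewport as a width/height mapping."""
        return {"width": self.viewport_width, "height": self.viewport_height}


defaults = BrowserDefaults()
# Derive a variant without mutating the shared defaults.
hardened = replace(defaults, disable_security=False)
print(defaults.viewport(), hardened.disable_security)
```

Freezing the dataclass means a stricter profile, such as one with the security protections re-enabled, has to be created as a separate object.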

Capabilities

Execution of web automation routines via the MCP interface.
Integrated system for maintaining operational state and strategic planning.
Advanced identification and modification capabilities for interactive UI components.
Support for defining isolated and customizable browser contexts.
Comprehensive logging and diagnostic tracing utilities.

Operational Commands

The agent exposes two principal functional interfaces:

get_planner_state: Fetches the current snapshot of the browser's internal status and the active planning context.
execute_actions: Applies a sequence of defined operations within the live browser session.
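The two interfaces suggest an observe-then-act loop: read the state, decide, apply actions, read the state again. The sketch below imitates that loop with an in-memory stand-in. Everything in it is an assumption for illustration: the FakeBrowserSession class, the shape of the state dictionary, and the "go_to_url" action type are invented here and are not the server's real payloads.

```python
class FakeBrowserSession:
    """In-memory stand-in for the server; no browser is involved."""

    def __init__(self):
        self.url = "about:blank"
        self.history = []

    def get_planner_state(self):
        # Snapshot of browser status plus planning context (hypothetical shape).
        return {"url": self.url, "completed_actions": list(self.history)}

    def execute_actions(self, actions):
        # Apply each action in order; only the made-up "go_to_url" type changes state.
        for action in actions:
            if action.get("type") == "go_to_url":
                self.url = action["url"]
            self.history.append(action)
        return len(actions)


session = FakeBrowserSession()
before = session.get_planner_state()
applied = session.execute_actions([{"type": "go_to_url", "url": "https://example.com"}])
after = session.get_planner_state()
print(before["url"], "->", after["url"], f"({applied} action applied)")
```

Returning a copy of the history from get_planner_state keeps earlier snapshots unchanged when later actions run.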

Development Guidelines

Logging Framework

This project adheres to Python's standard logging module, configured as follows:

- Output Stream: Standard error (stderr)
- Format String: %(levelname)-8s [%(name)s] %(message)s
- Root Logger Threshold: INFO
- Third-Party Library Threshold: WARNING
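A configuration matching those four settings could look like the sketch below. The function name configure_logging and the third-party logger names ("httpx", "urllib3") are assumptions chosen for illustration; the project's own setup code may differ.

```python
import logging
import sys

LOG_FORMAT = "%(levelname)-8s [%(name)s] %(message)s"


def configure_logging(third_party=("httpx", "urllib3")):
    """Send INFO and above to stderr and quiet the named third-party loggers."""
    # force=True replaces any handlers already attached to the root logger.
    logging.basicConfig(stream=sys.stderr, level=logging.INFO, format=LOG_FORMAT, force=True)
    for name in third_party:
        # Third-party loggers only emit WARNING and above.
        logging.getLogger(name).setLevel(logging.WARNING)


configure_logging()
logging.getLogger("browser-use").info("server starting")
```

Writing logs to stderr matters for a server launched by Claude Desktop over standard input and output, because stdout is then reserved for protocol messages.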

Repository Layout

browser-use.py: The primary script initiating the server interface.
tmp/recordings: Storage location for session playback archives.
Dependencies: Managed exclusively through the uv toolchain.

Collaboration

This artifact originated during the intensive development period of the AGI House MCP Hackathon. Community contributions are highly encouraged.

Licensing

This software is distributed under the terms of the MIT License. Refer to the LICENSE file for comprehensive details.

Perpetual authorization is granted, free of charge, to any individual acquiring a copy of this software and associated documentation files (the "Software"), to engage with the Software without limitation, encompassing, but not restricted to, the rights to utilize, duplicate, adjust, combine, disseminate, permit sublicensing, and/or offer for sale, subject to the following stipulations:

The preceding copyright attribution and this specific grant notification must be incorporated into all instances or substantial segments of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, SUITABILITY FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BEAR LIABILITY FOR ANY CLAIM, DAMAGE, OR OTHER OBLIGATION, WHETHER IN A LEGAL ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR ITS UTILIZATION OR OTHER DEALINGS WITHIN THE SOFTWARE.

WIKIPEDIA CONTEXT: A headless browser is defined as a web browser instance operating without a graphical user interface layer. These environments facilitate the programmatic control of web pages, mirroring the functional capabilities of standard browsers, but are managed via command-line operations or network communication streams. They are exceptionally valuable for quality assurance and web application validation, as they accurately interpret HTML, apply CSS styling (layout, typography, coloration), and execute client-side JavaScript/Ajax, features often inaccessible via simpler testing methodologies. Since the rollout of Google Chrome version 59 and Firefox version 56, native remote control mechanisms have rendered older auxiliary tools, such as PhantomJS, largely obsolete.

== Primary Applications ==

The principal domains where headless browsers excel include:

Automated verification processes for contemporary web platforms (web testing).
Programmatic capture of webpage visual states (screenshots).
Executing unit and integration tests for JavaScript frameworks.
Orchestrating complex, scripted interactions across web interfaces.
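As an illustration of the screenshot use case, Chrome can be driven headlessly from the command line. The sketch below builds such a command in Python. The helper names screenshot_command and take_screenshot are hypothetical, and the Chrome path is the usual macOS install location, assumed here and not taken from the text above.

```python
import subprocess

# Assumed default install location of Chrome on macOS; adjust for your system.
CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def screenshot_command(url, output_png, width=1280, height=1100, chrome=CHROME):
    """Build the argument list for a headless Chrome screenshot."""
    return [
        chrome,
        "--headless",                       # run without a visible window
        f"--screenshot={output_png}",       # write a PNG of the rendered page
        f"--window-size={width},{height}",  # size of the rendered window
        url,
    ]


def take_screenshot(url, output_png):
    """Run Chrome and raise if it exits with an error or takes too long."""
    subprocess.run(screenshot_command(url, output_png), check=True, timeout=60)


print(screenshot_command("https://example.com", "out.png")[1:])
```

Building the argument list separately from running it keeps the command inspectable without launching a browser.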

=== Secondary Applications ===

Headless agents are also instrumental in sophisticated web data acquisition (web scraping). For instance, Google previously endorsed their utility in indexing content rendered dynamically via Ajax. Conversely, misuse has been documented, including:

Launch of Distributed Denial of Service (DDoS) attacks against web assets.
Artificially inflating advertising impression counts.
Unauthorized, automated interaction with site logic (e.g., credential testing).

However, a 2018 traffic analysis indicated no discernible preference among malicious actors for headless over conventional browser instances in illicit activities such as DDoS, SQL injection, or cross-site scripting (XSS) attacks.

== Orchestration Tools ==

Due to native headless support across several major browser engines, specialized software has emerged to provide a unified control API:

Selenium WebDriver – Adheres to W3C standards for WebDriver implementation.
Playwright – A robust library for automating Chromium, Firefox, and WebKit environments.
Puppeteer – Focused on programmatic control over Chrome and Firefox instances.

=== Automated Testing Integration ===

Many testing frameworks incorporate headless browsing capabilities:

Capybara utilizes headless browsing (via WebKit or Headless Chrome) to simulate genuine user actions in its tests.
Jasmine typically defaults to Selenium but permits configuration for WebKit or Headless Chrome.
Cypress, a specialized frontend testing tool.
QF-Test, a GUI automation tool that supports headless browser operation.

=== Alternative Abstractions ===

An alternative strategy involves employing libraries that emulate browser APIs directly. Deno incorporates such APIs intrinsically. For the Node.js ecosystem, jsdom offers the most comprehensive simulation. These libraries generally support core features (HTML parsing, cookie handling, XHR, basic JavaScript), but they do not render the DOM and support only a limited set of DOM events. In exchange, they typically run faster than full browser environments.

mcp-browser-orchestrator

Author

ashley-ha
