MCP Web Interaction Conductor

Capabilities of This Model Context Protocol

This MCP server lets AI agents carry out complex web tasks, including automated browsing, data scraping, and interaction with web-based systems through the browser. It is built on the Model Context Protocol (MCP) and Selenium.

The distinguishing characteristic of this MCP is its capacity to manage concurrent operations across numerous browser sessions (windows). It eliminates the prerequisite of launching auxiliary Docker containers, virtual machines, or separate physical machines to support multiple concurrent scraping/automation agents. Furthermore, all executing agents can share a unified browser persistence profile, maintaining state (like logins) across different processes.

This design keeps scaling simple: start as many agents as you need, and they will not interfere with one another. For example, instances driven by Claude Code, Codex CLI, Gemini CLI, and a specialized fast-agent can run at the same time on a single host, all sharing the same browser environment and executing nearly in parallel.

Our goal is to enable AI agents to accomplish any defined web task with minimal human oversight, driven purely by conversational instructions.

Key Features Summary

Document Size Management (HTML Truncation): The MCP lets you cap the size of retrieved HTML content. Other scraping tools often flood the AI context with very large page snapshots or raw HTML dumps. This MCP avoids that by honoring the MCP_MAX_SNAPSHOT_CHARS environment variable, which sets the maximum number of characters returned per page snapshot.
Multi-Agent Window Isolation with Shared Profile: Connect numerous independent agents without inter-agent synchronization requirements. Agents can utilize the same persistent browser profile (useful for maintaining authenticated sessions). Each agent is automatically assigned its own distinct browser window, preventing operational conflicts. This is managed internally using Chrome DevTools Protocol Target IDs.
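
The per-agent window assignment described above can be pictured as a small registry that maps each agent to one Chrome DevTools Protocol target ID. The following is only an illustrative sketch: the class name `WindowRegistry`, the agent identifiers, and the `open_window` callback are hypothetical and are not taken from this project's code.

```python
from typing import Callable, Dict


class WindowRegistry:
    """Hypothetical registry: one CDP target ID (browser window) per agent."""

    def __init__(self, open_window: Callable[[], str]) -> None:
        # open_window is an assumed callback that opens a new window and
        # returns its CDP target ID; the real server's mechanism may differ.
        self._open_window = open_window
        self._targets: Dict[str, str] = {}

    def target_for(self, agent_id: str) -> str:
        """Return the agent's window, opening one on first use."""
        if agent_id not in self._targets:
            self._targets[agent_id] = self._open_window()
        return self._targets[agent_id]

    def release(self, agent_id: str) -> None:
        """Forget the agent's window when the agent is done."""
        self._targets.pop(agent_id, None)


# Stand-in for a real browser: hands out fake target IDs.
_counter = iter(range(1, 1000))
registry = WindowRegistry(lambda: f"TARGET-{next(_counter)}")

print(registry.target_for("claude"))  # first agent gets its own window
print(registry.target_for("codex"))   # second agent gets a different one
```

This in-memory dictionary only works inside one process. Since the agents described here may run as separate processes, a real implementation would presumably need some shared storage for the mapping.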

Current Constraints

Iframe Contextual Reliance: Multi-step workflows inside embedded iframes require passing the iframe_selector argument explicitly on every action. The browser context is reset after each tool invocation for reliability, so an iframe selected in one call does not carry over to the next. For iframe-heavy workflows, specify the iframe selector again in every call to click_element, fill_text, or debug_element.
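
A minimal sketch of what this means for a caller: every step that targets the iframe carries the selector again. The tool names and the `iframe_selector` argument come from the text above; the `selector` and `text` argument names, the payload shape, and the helper `with_iframe` are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Tools named in this README as needing the iframe selector on each call.
IFRAME_TOOLS = {"click_element", "fill_text", "debug_element"}


def with_iframe(steps: List[Tuple[str, Dict[str, Any]]],
                iframe_selector: str) -> List[Dict[str, Any]]:
    """Attach iframe_selector to every step that targets the iframe."""
    calls = []
    for tool, args in steps:
        merged = dict(args)  # copy so the caller's dict is not mutated
        if tool in IFRAME_TOOLS:
            merged["iframe_selector"] = iframe_selector
        calls.append({"tool": tool, "arguments": merged})
    return calls


# Hypothetical two-step workflow inside a checkout iframe.
calls = with_iframe(
    [
        ("fill_text", {"selector": "#card-number", "text": "4242"}),
        ("click_element", {"selector": "#pay"}),
    ],
    iframe_selector="iframe#checkout",
)
for call in calls:
    print(call)
```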

Configuration and Deployment

We strongly recommend using Chrome Beta or Chrome Canary so that automation does not conflict with your regular, manually operated Chrome installation. Although this MCP can serve any number of agents from a single Chrome instance, it requires the Chrome executable to be launched in developer mode, which a normal user-initiated launch does not enable. To keep your everyday browser working normally while meeting this requirement, install Chrome Beta (preferred) or Chrome Canary (less recommended because it can be unstable).
Once Chrome Beta is installed, specify its executable path in the .env configuration file as detailed below.
Start the MCP server (see the "Operational Guide" section if you are unfamiliar with server startup).
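
For orientation, here is a sketch of how a launch command for such a Chrome instance might be assembled. It assumes that "developer mode" refers to Chrome's remote debugging interface, that the port is 9223 (the port used in the diagnostics URL in this README), and that the environment variable names match the `.env` section. The switches `--remote-debugging-port`, `--user-data-dir`, and `--profile-directory` are standard Chrome command-line switches, but whether this project launches Chrome this way is an assumption.

```python
from typing import List, Mapping


def chrome_command(env: Mapping[str, str], port: int = 9223) -> List[str]:
    """Build an argument list for starting Chrome with remote debugging."""
    return [
        env["CHROME_EXECUTABLE_PATH"],
        f"--remote-debugging-port={port}",
        f"--user-data-dir={env['CHROME_PROFILE_USER_DATA_DIR']}",
        f"--profile-directory={env['CHROME_PROFILE_NAME']}",
    ]


# Example values copied from the .env section of this README.
example_env = {
    "CHROME_EXECUTABLE_PATH": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "CHROME_PROFILE_USER_DATA_DIR": "/Users/janspoerer/Library/Application Support/Google/Chrome",
    "CHROME_PROFILE_NAME": "Profile 15",
}
command = chrome_command(example_env)
print(command)
```

Keeping the command as a list rather than a single string means paths containing spaces need no shell quoting when the list is passed to `subprocess.Popen`.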

Operational Guide

Please consult the official MCP documentation on modelcontextprotocol.io for comprehensive setup.

Make sure that all Python dependencies listed in requirements.txt are installed in the Python environment referenced by your MCP configuration file. Pointing to a dedicated virtual environment (e.g., ./.venv/bin/python) is usually preferable to referencing the global system Python (python or python3).

For instance, if the repository is cloned into your local code directory, the necessary MCP configuration file entry should resemble:

{
  "mcpServers": {
    "mcp_browser_use": {
      "command": "/Users/janspoerer/code/mcp_browser_use/.venv/bin/python",
      "args": [
        "/Users/janspoerer/code/mcp_browser_use/mcp_browser_use"
      ]
    }
  }
}

On macOS systems, this configuration file is typically situated at: /Users/janspoerer/Library/Application Support/Claude/claude_desktop_config.json.

After saving the configuration, restart the Claude application. Claude will surface any JSON parsing errors if the setup is incorrect.
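
To catch mistakes before restarting Claude, the configuration can be checked with a few lines of Python. This is a sketch under assumptions: the helper name `check_mcp_config` is hypothetical, and the checks cover only the `mcpServers`, `command`, and `args` keys shown in the example above.

```python
import json
import shutil
import sys
from typing import List


def check_mcp_config(text: str) -> List[str]:
    """Return a list of problems found in an MCP client config string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not servers:
        return ["missing or empty 'mcpServers' object"]
    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        command = entry.get("command")
        if not isinstance(command, str) or not command:
            problems.append(f"{name}: 'command' must be a non-empty string")
        elif shutil.which(command) is None:
            # shutil.which accepts both bare names and explicit paths.
            problems.append(f"{name}: executable not found: {command}")
        if not isinstance(entry.get("args", []), list):
            problems.append(f"{name}: 'args' must be a list")
    return problems


# Uses the current interpreter so the example runs on any machine.
sample = json.dumps(
    {"mcpServers": {"mcp_browser_use": {"command": sys.executable, "args": []}}}
)
print(check_mcp_config(sample))
```

In practice the text would be read from the configuration file itself, for example with `pathlib.Path(...).read_text()`.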

Successful initialization is signaled by the appearance of a small wrench icon in the lower-right corner of the Claude 'New Chat' interface. The adjacent number indicates the count of functions exposed by this MCP.

Click the wrench icon to inspect the available functionalities.

Environment Variables (`.env`)

CHROME_PROFILE_NAME=Selenium
CHROME_EXECUTABLE_PATH=/Applications/Google Chrome.app/Contents/MacOS/Google Chrome
CHROME_PROFILE_USER_DATA_DIR=/Users/janspoerer/Library/Application Support/Google/Chrome
CHROME_PROFILE_NAME=Profile 15
MCP_MAX_SNAPSHOT_CHARS=10000
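
As an illustration of how the MCP_MAX_SNAPSHOT_CHARS setting could be applied, the sketch below reads the limit from the environment and cuts a page snapshot down to it. The function names and the fallback value are assumptions; the project's actual truncation logic and default may differ.

```python
import os
from typing import Mapping

# Fallback mirrors the example value in this README; the server's real
# default is not documented here, so treat this as an assumption.
DEFAULT_MAX_CHARS = 10_000


def max_snapshot_chars(env: Mapping[str, str] = os.environ) -> int:
    """Read the snapshot limit, falling back on missing or invalid values."""
    raw = env.get("MCP_MAX_SNAPSHOT_CHARS", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MAX_CHARS
    return value if value > 0 else DEFAULT_MAX_CHARS


def truncate_snapshot(html: str, limit: int) -> str:
    """Return html unchanged if it fits, otherwise its first `limit` chars."""
    return html if len(html) <= limit else html[:limit]


page = "<html>" + "x" * 50_000 + "</html>"
snapshot = truncate_snapshot(page, max_snapshot_chars({"MCP_MAX_SNAPSHOT_CHARS": "10000"}))
print(len(snapshot))
```

Cutting at a fixed character count can split a tag in half. That is acceptable for a sketch, but a real implementation might prefer to cut at an element boundary.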

Exposed Functionalities

Diagnostic Procedures

Verify that the automated browser instance is operational by accessing the following endpoint in a standard, non-automated web browser:

http://127.0.0.1:9223/json/version

If successful, the output will display version information akin to this:

{
  "Browser": "Chrome/140.0.7339.24",
  "Protocol-Version": "1.3",
  "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36",
  "V8-Version": "14.0.365.3",
  "WebKit-Version": "537.36 (@f8765868e23d9ee5209061fc999f6495c525cd13)",
  "webSocketDebuggerUrl": "ws://127.0.0.1:9223/devtools/browser/d8f511eb-947c-4eb1-833d-917212a92394"
}
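
The same check can be scripted. The sketch below fetches the endpoint with the standard library and extracts a few fields from the response. The function names are hypothetical, and the URL and port are the ones shown above.

```python
import json
import urllib.request
from typing import Dict


def parse_version(payload: str) -> Dict[str, str]:
    """Extract the fields of interest from a /json/version response."""
    info = json.loads(payload)
    return {
        "browser": info["Browser"],
        "protocol": info["Protocol-Version"],
        "websocket": info["webSocketDebuggerUrl"],
    }


def fetch_version(url: str = "http://127.0.0.1:9223/json/version",
                  timeout: float = 2.0) -> Dict[str, str]:
    """Query the debugging endpoint; raises URLError if Chrome is not up."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_version(response.read().decode("utf-8"))


# Parsing a shortened copy of the sample response, without network access.
sample_payload = json.dumps({
    "Browser": "Chrome/140.0.7339.24",
    "Protocol-Version": "1.3",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9223/devtools/browser/d8f511eb-947c-4eb1-833d-917212a92394",
})
print(parse_version(sample_payload)["browser"])
```

Calling `fetch_version()` while the automated browser is running should return the same fields. A connection error suggests the browser is not listening on that port.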

Demonstration Video (YouTube)

Executing Verification Suites

We deliberately avoid the pytest-asyncio plugin in the test suite.

pip install -e ".[test]"
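
One way to test coroutines without pytest-asyncio is to drive them with `asyncio.run` inside an ordinary synchronous test function. The sketch below shows the pattern. `fetch_title` is a hypothetical stand-in for an async function under test, not part of this project.

```python
import asyncio


async def fetch_title(delay: float = 0.0) -> str:
    """Hypothetical stand-in for an async tool function under test."""
    await asyncio.sleep(delay)
    return "Example Domain"


def test_fetch_title() -> None:
    # A plain synchronous test: asyncio.run creates an event loop, runs the
    # coroutine to completion, and closes the loop again.
    title = asyncio.run(fetch_title())
    assert title == "Example Domain"
```

Because the test function itself is synchronous, pytest collects and runs it without any async plugin.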

WIKIPEDIA NOTE: A headless browser operates without a visible graphical interface. These environments permit programmatic manipulation of web pages under conditions mirroring standard browser execution, accessed via command line or network interfaces. They are invaluable for quality assurance, as they accurately interpret and render HTML, including CSS styling (layout, color, typography) and JavaScript execution, capabilities often absent in simpler testing methods. Modern Chrome (v59+) and Firefox (v56+) natively support remote control, rendering previous solutions like PhantomJS largely obsolete.

== Primary Applications ==

The core use cases for browser automation without a GUI include:

Automated validation of contemporary web applications (Web Testing).
Generating high-fidelity static page captures (Screenshots).
Running automated unit tests for JavaScript frameworks.
Programmatic interaction with web page elements.

=== Secondary Utility ===

Headless browsers are also useful for large-scale web scraping. Google stated in 2009 that using such tools helps with indexing content that relies on Ajax. Headless browsers have also been misused on occasion, for example to carry out distributed denial-of-service (DDoS) attacks, inflate advertisement impressions, or automate websites in unintended ways (e.g., credential stuffing). However, a 2018 traffic analysis found no discernible preference among malicious actors for headless browsers over standard ones for activities such as DDoS or injection attacks.

== Implementation Ecosystem ==

Given that major browser vendors now natively support headless operation via distinct APIs, several software projects offer a unified abstraction layer for browser control. Key examples include:

Selenium WebDriver – Adheres to W3C WebDriver standards.
Playwright – A Node.js library supporting Chromium, Firefox, and WebKit automation.
Puppeteer – A Node.js library specialized for controlling Chrome or Firefox.

=== Test Framework Integration ===

Numerous test harnesses integrate headless browsers into their apparatus. For instance:

Capybara employs headless browsing (via WebKit or Headless Chrome) to simulate user actions.
Jasmine defaults to Selenium but permits configuration for WebKit or Headless Chrome.
Cypress, a modern frontend testing framework.
QF-Test, a tool for GUI-based automated testing that supports headless operation.

=== Alternative Methodologies ===

An alternative approach is to use libraries that provide browser-like APIs directly within the runtime environment. For instance, Deno builds these APIs in. For Node.js, jsdom provides the most comprehensive emulation. These alternatives often handle HTML parsing, cookies, and XHR, but they typically do not fully render the Document Object Model (DOM) and support only a limited range of DOM events. In exchange, they generally run faster than a full browser.
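
As a rough Python analogue of this browserless approach (not an equivalent of jsdom or Deno), the standard library's `html.parser` can extract information from markup without rendering anything or running JavaScript. The class and function names below are illustrative.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags; no rendering, no JavaScript."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Parse the markup and return all link targets in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


markup = '<p><a href="/a">A</a> <a href="https://example.com/b">B</a> <a>none</a></p>'
print(extract_links(markup))
```

Links that a page adds through JavaScript after loading would not appear in the result. That limitation is the trade-off described above.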

mcp_web_agent_orchestrator

Author

janspoerer

