🚀 Web Interaction MCP Service Gateway

This implementation builds on the browser-use MCP Server and uses the Server-Sent Events (SSE) transport. The server pushes messages to the client over an SSE stream, and the client sends its requests back over HTTP POST.
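SSE is one-directional (server to client), which is why the transport pairs it with HTTP POST. In MCP's HTTP+SSE transport, the server first sends an `endpoint` event that tells the client where to POST, and after that it sends JSON-RPC messages as ordinary events. The minimal sketch below parses a raw `text/event-stream` body using a simplified subset of the SSE rules: a blank line ends an event, `data:` lines are joined with newlines, and lines starting with `:` are comments. The sample stream, including the `/messages/?session_id=abc123` path, is invented for illustration and was not captured from this server. `parse_sse` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSEEvent:
    event: str  # event type; "message" when the stream gives no event: field
    data: str   # the event's data: lines joined with "\n"


def parse_sse(stream: str) -> List[SSEEvent]:
    """Split a text/event-stream body into events (simplified: id/retry ignored)."""
    events: List[SSEEvent] = []
    event_type, data_lines = "", []
    for line in stream.splitlines():
        if line == "":
            # A blank line dispatches the buffered event, if it carried any data.
            if data_lines:
                events.append(SSEEvent(event_type or "message", "\n".join(data_lines)))
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    # An event without a terminating blank line is discarded, as in the spec.
    return events


# Hypothetical stream: an endpoint announcement followed by one JSON-RPC message.
sample = (
    ": keep-alive\n"
    "event: endpoint\n"
    "data: /messages/?session_id=abc123\n"
    "\n"
    'data: {"jsonrpc": "2.0", "id": 1, "result": {}}\n'
    "\n"
)

for ev in parse_sse(sample):
    print(ev.event, ev.data)
# endpoint /messages/?session_id=abc123
# message {"jsonrpc": "2.0", "id": 1, "result": {}}
```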

Prerequisites

Requires the uv package manager.

Installation command:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Initial Setup Guide

Execute these commands in sequence to prepare the environment:

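```bash
# Install the project's dependencies
uv sync
# Install Playwright into the project environment
uv pip install playwright
# Download Chromium and its system dependencies (--no-shell skips the separate headless shell build)
uv run playwright install --with-deps --no-shell chromium
# Start the MCP server on port 8000
uv run server --port 8000
```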
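If `uv run server --port 8000` is started in the background from a script, you may want to wait until the port accepts connections before configuring a client. Below is a minimal standard-library sketch. `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that something is listening on the port. It does not show that the MCP endpoint is healthy.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical readiness check; it does not speak MCP or SSE.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # something accepted the connection
        except OSError:
            time.sleep(0.25)  # refused or unreachable: retry shortly
    return False
```

For example, `wait_for_port()` returns `True` as soon as the server started above is listening on port 8000.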

Configuration notes for the .env file:

```
OPENAI_API_KEY=[Your Credentials Here]
CHROME_PATH=[Modify only if using a non-standard Chrome installation]
```
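The sketch below shows one way the two variables above could be read and checked at startup. It assumes a plain `KEY=value` format with `#` comments and no quoting. `parse_env_text`, `load_env_file`, and `check_config` are hypothetical helpers, and a real project would more likely use a library such as python-dotenv. Treating an empty `CHROME_PATH` as "use the default browser" is also an assumption, based on the note above.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical helpers; the server's actual configuration loading may differ.


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments (no quoting or escapes)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Read and parse a .env file from disk."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def check_config(env: dict[str, str]) -> dict[str, str | None]:
    """Require OPENAI_API_KEY; treat CHROME_PATH as an optional override."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY must be set in .env")
    # Assumption: None means "fall back to the default Chromium install".
    return {"OPENAI_API_KEY": api_key, "CHROME_PATH": env.get("CHROME_PATH") or None}
```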

Future development plans include integrating support for diverse LLM backends (e.g., Claude, Grok, Bedrock).

When building the Docker image, you can optionally set the VNC access password with a build argument:

```bash
docker build --build-arg VNC_PASSWORD=supersecretwordhere .
```

Available Functionality (Tools)

[x] SSE transport
[x] browser_use: Starts an automated browser task from a URL and a description of the action to perform.
[x] browser_get_result: Retrieves the result of a previously started, asynchronous browser task.
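Because `browser_use` runs asynchronously, a client typically starts a task and then polls `browser_get_result` until the task finishes. Each call corresponds to an MCP `tools/call` request. The sketch below shows that start-then-poll loop against an abstract `call_tool(name, arguments)` callable, with a fake server standing in for a real MCP session. Several details are assumptions made for illustration and are not this server's documented schema: the argument names (`url`, `action`, `task_id`), the response fields (`task_id`, `status`), and the status values (`"completed"`, `"failed"`).

```python
import time
from typing import Any, Callable, Dict

# call_tool(name, arguments) -> response dict; in practice an MCP client session.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_browser_task(call_tool: ToolCaller, url: str, action: str,
                     poll_interval: float = 1.0, timeout: float = 120.0) -> Dict[str, Any]:
    """Start a browser_use task, then poll browser_get_result until it finishes."""
    # Assumed argument names and response shape, for illustration only.
    started = call_tool("browser_use", {"url": url, "action": action})
    task_id = started["task_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = call_tool("browser_get_result", {"task_id": task_id})
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout} s")


class FakeServer:
    """Stand-in for a real MCP session; the task finishes on the third poll."""

    def __init__(self) -> None:
        self.polls = 0

    def __call__(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if name == "browser_use":
            return {"task_id": "t1"}
        self.polls += 1
        status = "completed" if self.polls >= 3 else "running"
        return {"task_id": arguments["task_id"], "status": status}


fake = FakeServer()
print(run_browser_task(fake, "https://news.ycombinator.com",
                       "summarize the top story", poll_interval=0))
# {'task_id': 't1', 'status': 'completed'}
```

Bounding the loop with a deadline keeps a stuck browser session from hanging the client indefinitely.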

Compatible Client Ecosystems

Cursor.ai Integration
Claude Desktop Client
Claude Code Environment
~~Windsurf~~ (pending: Windsurf does not yet support SSE)

Integration Procedure

Once the server is running, point your MCP client at http://localhost:8000/sse, typically by adding the following entry to the client's configuration file:

```json
{
  "mcpServers": {
    "web-interaction-mcp-gateway": {
      "url": "http://localhost:8000/sse"
    }
  }
}
```
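For the client-to-server direction, the client sends JSON-RPC 2.0 messages as HTTP POST requests to the endpoint URL that the server announces on its SSE stream. The server's responses come back over SSE. The sketch below builds, but does not send, a `tools/call` request using only the standard library. The `/messages/?session_id=abc123` path and the `browser_use` argument names (`url`, `action`) are illustrative assumptions. A real client would also complete MCP's `initialize` handshake first, which is omitted here.

```python
import json
import urllib.request
from itertools import count
from urllib.parse import urljoin

SSE_URL = "http://localhost:8000/sse"  # the URL from the client config above
_ids = count(1)  # JSON-RPC request ids must be unique per session


def build_tool_call(endpoint_path: str, tool: str, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC tools/call POST for the SSE transport."""
    body = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        urljoin(SSE_URL, endpoint_path),  # resolve the announced path against the server
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint path and tool arguments.
req = build_tool_call("/messages/?session_id=abc123", "browser_use",
                      {"url": "https://news.ycombinator.com", "action": "summarize the top story"})
print(req.get_method(), req.full_url)
# POST http://localhost:8000/messages/?session_id=abc123
```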

Configuration file locations for each client:

Cursor

./.cursor/mcp.json

Windsurf

~/.codeium/windsurf/mcp_config.json

Claude

~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
%APPDATA%\Claude\claude_desktop_config.json (Windows)
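These files often list other MCP servers as well, so it is safer to merge the entry than to overwrite the whole file. The minimal sketch below assumes the `mcpServers` layout shown above. `merge_server_entry` and `add_mcp_server` are hypothetical helpers, and the server name is simply the key used in this README.

```python
import json
from pathlib import Path


def merge_server_entry(config: dict, name: str, url: str) -> dict:
    """Return a copy of config with mcpServers[name] set to {"url": url}."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so other entries stay untouched
    servers[name] = {"url": url}
    merged["mcpServers"] = servers
    return merged


def add_mcp_server(config_path: Path, name: str = "web-interaction-mcp-gateway",
                   url: str = "http://localhost:8000/sse") -> dict:
    """Read the client config (if any), merge the entry, and write it back as JSON."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    merged = merge_server_entry(json.loads(text) if text.strip() else {}, name, url)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

For example, `add_mcp_server(Path(".cursor/mcp.json"))` would update the project-level Cursor file. On macOS, `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"` points at the Claude Desktop file.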

After setup, you can issue complex instructions to your connected LLM, for example:

Navigate to https://news.ycombinator.com and summarize the highest-rated story.

Support Channel

For questions or to report issues, reach out at https://cobrowser.xyz


Background Context: Headless Browser Technology

WIKIPEDIA: A headless browser is a web browser without a graphical user interface. It lets web pages be loaded and controlled automatically from a command line or over network protocols. Headless browsers are especially useful for testing web pages, because they render pages the way a standard browser does, including stylesheets, layout, and typography, and execute dynamic content such as JavaScript and Ajax, which simpler testing methods often cannot do. Since Google Chrome 59 and Firefox 56 added native support for remote control, older solutions such as PhantomJS have become largely obsolete.

== Primary Applications ==

The main uses of headless browsers include:

Test automation for modern web applications (web testing).
Taking screenshots of rendered web pages.
Running automated tests for JavaScript libraries.
Automating interactions with web pages.

=== Ancillary Uses ===

Headless browsers are also useful for web scraping. Google noted in 2009 that using a headless browser could help its search engine index content from sites that rely on Ajax. Conversely, headless browsers have been used for malicious purposes, such as:

Orchestrating Distributed Denial of Service (DDoS) attacks against web targets.
Artificially inflating advertisement view counts.
Abusing website logic, such as automated credential testing (credential stuffing).

However, a 2018 traffic analysis suggested that malicious actors do not show a distinct preference for headless agents over conventional browsers when executing activities like DDoS, SQL injection, or Cross-Site Scripting (XSS).

== Implementation Methods ==

Because major browsers now support headless operation natively through their own APIs, several software frameworks provide a common interface for controlling them. Notable examples include:

Selenium WebDriver: Adheres to the W3C WebDriver protocol.
Playwright: A library for automating Chromium, Firefox, and WebKit, available for Node.js as well as Python, Java, and .NET.
Puppeteer: A Node.js utility specifically for controlling Chrome or Firefox instances.
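To show what a "common interface" looks like at the protocol level: under the W3C WebDriver protocol, a client opens a browser session by POSTing a capabilities object to the driver's `/session` endpoint. The sketch below builds, but does not send, such a request for a headless Chrome session. Two details are assumptions about a typical local setup: the driver address `http://localhost:9515` (chromedriver's usual default port) and the `--headless=new` Chrome flag.

```python
import json
import urllib.request

DRIVER_URL = "http://localhost:9515"  # assumed local chromedriver address


def new_session_request(headless: bool = True) -> urllib.request.Request:
    """Build (but do not send) a W3C WebDriver "New Session" request for Chrome."""
    chrome_args = ["--headless=new"] if headless else []
    body = {
        "capabilities": {
            "alwaysMatch": {
                "browserName": "chrome",
                "goog:chromeOptions": {"args": chrome_args},  # Chrome-specific extension capability
            }
        }
    }
    return urllib.request.Request(
        f"{DRIVER_URL}/session",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = new_session_request()
print(req.get_method(), req.full_url)
# POST http://localhost:9515/session
```

Selenium's client libraries wrap this kind of exchange. Playwright and Puppeteer instead talk to browsers over other protocols, such as the Chrome DevTools Protocol.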

=== Role in Quality Assurance ===

Some testing frameworks build headless browsing into their core test infrastructure.

Capybara: Uses headless browsing (via WebKit or Headless Chrome) to simulate real user actions during test runs.
Jasmine: Uses Selenium by default but can be configured to run browser tests in WebKit or Headless Chrome.
Cypress: A dedicated framework for frontend testing.
QF-Test: A tool for automated graphical user interface testing that can optionally run browsers headlessly.

=== Competing Approaches ===

An alternative approach is to use software that emulates browser APIs directly. For instance, Deno builds browser APIs into its core design, and in the Node.js ecosystem jsdom offers the most complete emulation. These tools usually support basic browser features (HTML parsing, cookie management, XHR, and some JavaScript), but they do not render the DOM and support DOM events only partially. In exchange, they generally run faster than full headless browsers.
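jsdom itself is a JavaScript library. As a loosely analogous Python illustration of the fidelity trade-off, the sketch below uses the standard library's `html.parser`, which is even lighter than jsdom: it parses markup quickly but provides no DOM API and runs no JavaScript. A link that a script would inject at runtime is therefore never seen. `LinkAndScriptCollector` and the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class LinkAndScriptCollector(HTMLParser):
    """Collect link targets and count <script> tags without executing anything."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []   # href values of <a> tags present in the raw markup
        self.scripts = 0  # number of <script> elements seen (never run)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "script":
            self.scripts += 1


page = """
<html><body>
  <a href="/static">Static link</a>
  <div id="app"></div>
  <script>
    document.getElementById("app").innerHTML = '<a href="/dynamic">Dynamic link</a>';
  </script>
</body></html>
"""

collector = LinkAndScriptCollector()
collector.feed(page)
collector.close()
# Script contents are treated as raw text, so "/dynamic" is never collected.
print(collector.links, collector.scripts)
# ['/static'] 1
```

A full headless browser would execute the script and also see `/dynamic`, at the cost of launching a rendering engine.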


Author

williamvd4
