web-interaction-mcp-gateway
Facilitates the execution of web browser operations and acquisition of live internet data via a streamlined application programming interface. Enhances Large Language Model reasoning by enabling interaction with rendered web content.
Author: williamvd4
🚀 Web Interaction MCP Service Gateway
This implementation builds on the browser-use MCP Server and uses the Server-Sent Events (SSE) transport: the server pushes messages to the client over an SSE stream, and the client sends its requests back with HTTP POST.
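For readers new to SSE, the wire format is simple. Each event is a group of `field: value` lines (`event:`, `data:`, and a few others) ended by a blank line, and lines starting with `:` are comments that servers often send as keep-alives. Under the MCP SSE transport, the server's first event is an `endpoint` event whose data tells the client where to POST its JSON-RPC requests. The parser below is a minimal standard-library sketch of that framing, not code from this project; it ignores `id:` and `retry:` fields, and the session ID in the demo is made up rather than captured from this server.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class SSEEvent:
    event: str  # event type; "message" when the server sends no "event:" field
    data: str   # all "data:" lines of the event, joined with "\n"


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield complete SSE events from an iterable of text lines."""
    event_type: Optional[str] = None
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines))
            event_type, data_lines = None, []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one space after the colon is not part of the value
        if name == "event":
            event_type = value
        elif name == "data":
            data_lines.append(value)
        # "id" and "retry" fields are ignored in this sketch


if __name__ == "__main__":
    # Illustrative stream only; the session ID is invented.
    demo = [
        "event: endpoint",
        "data: /messages/?session_id=abc123",
        "",
        ": keep-alive",
        'data: {"jsonrpc": "2.0", "id": 1, "result": {}}',
        "",
    ]
    for ev in parse_sse(demo):
        print(ev.event, ev.data)
```

A real client would feed this parser the lines of a streaming GET to http://localhost:8000/sse and then POST requests to the path announced in the `endpoint` event.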
Prerequisites
Requires the uv package manager.
Installation command:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Initial Setup Guide
Execute these commands in sequence to prepare the environment:
```bash
uv sync
uv pip install playwright
uv run playwright install --with-deps --no-shell chromium
uv run server --port 8000
```
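Once `uv run server --port 8000` is running, it can help to confirm the port accepts connections before pointing a client at it. This is a generic standard-library sketch, not part of the project; a successful TCP connect only shows that something is listening on the port, not that it is this MCP server.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    print("server up" if server_is_listening() else "nothing listening on localhost:8000")
```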
Configuration notes for the .env file:
```
OPENAI_API_KEY=[Your Credentials Here]
CHROME_PATH=[Modify only if using a non-standard Chrome installation]
```
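A minimal sketch of how these two settings might be read and checked at startup, using only the standard library. The parser handles just `KEY=VALUE` lines, `#` comments, and surrounding quotes. The rule that `OPENAI_API_KEY` is required while `CHROME_PATH` is optional is inferred from the notes above, and the function names are hypothetical; the project itself may load `.env` differently (for example with `python-dotenv`).

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_settings(text: str) -> Dict[str, str]:
    """Hypothetical startup check: OPENAI_API_KEY is required, CHROME_PATH is optional."""
    env = parse_env(text)
    if not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY must be set in .env")
    settings = {"OPENAI_API_KEY": env["OPENAI_API_KEY"]}
    if env.get("CHROME_PATH"):
        # Only needed when Chrome is installed in a non-standard location.
        settings["CHROME_PATH"] = env["CHROME_PATH"]
    return settings
```

In use, pass `load_settings` the contents of the `.env` file; a missing key fails fast with a clear message instead of surfacing later as an authentication error.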
Future development plans include integrating support for diverse LLM backends (e.g., Claude, Grok, Bedrock).
When constructing the Docker image, an optional build argument allows setting the VNC access password:
```bash
docker build --build-arg VNC_PASSWORD=supersecretwordhere .
```
Available Functionality (Tools)
- [x] SSE Data Stream Transport
- [x] browser_use: Triggers automated browser actions specified by a URL and desired operation.
- [x] browser_get_result: Fetches the output from previously initiated, asynchronous browser sessions.
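Over MCP, a client invokes these tools with JSON-RPC `tools/call` requests. The sketch below builds such a request and shows the start-then-poll pattern implied by `browser_get_result` returning output from asynchronous sessions. The argument names (`url`, `action`) and the `status` field checked while polling are assumptions for illustration; the real schemas are whatever the server reports from `tools/list`.

```python
import itertools
import json
import time
from typing import Any, Callable, Dict

# Monotonically increasing JSON-RPC request IDs.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def wait_for_result(
    fetch: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval: float = 2.0,
    max_polls: int = 30,
) -> Dict[str, Any]:
    """Poll `fetch` (e.g. a wrapper around browser_get_result) until the task stops running.

    ASSUMPTION: the decoded result has a "status" field that reads
    "running" while the browser session is still in progress.
    """
    for _ in range(max_polls):
        result = fetch(task_id)
        if result.get("status") != "running":
            return result
        time.sleep(interval)
    raise TimeoutError(f"task {task_id!r} still running after {max_polls} polls")


# Hypothetical argument names; check the server's tools/list output for the real ones.
start_request = tool_call(
    "browser_use",
    {"url": "https://news.ycombinator.com", "action": "summarize the highest-rated story"},
)
```

In a real client, `fetch` would send a `browser_get_result` call for the task and decode the response; polling with a bounded number of attempts keeps a stuck browser session from hanging the caller forever.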
Compatible Client Ecosystems
- Cursor.ai Integration
- Claude Desktop Client
- Claude Code Environment
- Windsurf (compatibility pending resolution of the SSE requirement in Windsurf)
Integration Procedure
Once the backend service is operational, configure your client application to communicate with http://localhost:8000/sse. Alternatively, embed this endpoint definition within your configuration file:
```json
{
  "mcpServers": {
    "web-interaction-mcp-gateway": {
      "url": "http://localhost:8000/sse"
    }
  }
}
```
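The same entry can also be added programmatically. The helper below is a hypothetical convenience, not part of this project: it merges the server into an existing config file without overwriting other servers or settings, and it assumes the file, if present, is plain JSON. The server key and URL are the ones from the configuration snippet.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Values taken from the configuration snippet in this README.
SERVER_NAME = "web-interaction-mcp-gateway"
SERVER_URL = "http://localhost:8000/sse"


def add_sse_server(config_path: Path, name: str = SERVER_NAME, url: str = SERVER_URL) -> Dict[str, Any]:
    """Insert or update an SSE entry under "mcpServers", leaving other entries untouched."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)
    config.setdefault("mcpServers", {})[name] = {"url": url}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For Cursor, for example, this would be called as `add_sse_server(Path(".cursor/mcp.json"))` from the project root.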
Configuration file locations by client:
- Cursor: ./.cursor/mcp.json
- Windsurf: ~/.codeium/windsurf/mcp_config.json
- Claude (macOS): ~/Library/Application Support/Claude/claude_desktop_config.json
- Claude (Windows): %APPDATA%\Claude\claude_desktop_config.json
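Because the Claude Desktop location depends on the operating system, a setup script has to choose between the two paths listed. A minimal sketch, assuming Python's `sys.platform` values `darwin` (macOS) and `win32` (Windows); it refuses other platforms rather than guessing a path the list does not give. The function name is hypothetical.

```python
import os
import sys
from pathlib import PurePosixPath, PureWindowsPath
from typing import Mapping, Optional


def claude_desktop_config(platform: str = sys.platform,
                          env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Claude Desktop config path for macOS ("darwin") or Windows ("win32")."""
    env = os.environ if env is None else env
    if platform == "darwin":
        # ~/Library/Application Support/Claude/claude_desktop_config.json
        home = env.get("HOME", "~")
        return str(PurePosixPath(home, "Library", "Application Support",
                                 "Claude", "claude_desktop_config.json"))
    if platform == "win32":
        # %APPDATA%\Claude\claude_desktop_config.json
        return str(PureWindowsPath(env["APPDATA"], "Claude", "claude_desktop_config.json"))
    raise NotImplementedError(f"no Claude Desktop config location listed for {platform!r}")
```

Taking the platform and environment as parameters keeps the function testable on any machine; in normal use both defaults come from the running system.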
After setup, you can issue complex instructions to your connected LLM, for example:
Navigate to https://news.ycombinator.com and summarize the highest-rated story.
Support Channel
For questions or to report issues, visit https://cobrowser.xyz
Background Context: Headless Browser Technology
WIKIPEDIA: A headless browser is a web browser without a graphical user interface. It allows a web page's rendering to be controlled automatically from a command-line interface or over network protocols. Headless browsers are most useful for testing web pages, because they render pages the same way a standard browser does, including stylesheets, layout, fonts, and the execution of JavaScript and Ajax, which simpler testing methods often cannot reproduce. Since Google Chrome 59 and Firefox 56 added native support for remote control, older tools such as PhantomJS have largely fallen out of use.
== Primary Applications ==
The main reasons to run a browser without a GUI include:
- Automated quality assurance for contemporary web applications (Web Testing).
- Capturing programmatic snapshots (screenshots) of rendered web content.
- Executing automated quality checks for JavaScript libraries.
- Automating interactions with web pages and user interfaces.
=== Ancillary Uses ===
Headless browsers are also useful for more sophisticated web scraping. Google stated as early as 2009 that they could help its search engine index content rendered with Ajax. Headless browsers have also been used for harmful purposes, including:
- Orchestrating Distributed Denial of Service (DDoS) attacks against web targets.
- Artificially inflating advertisement view counts.
- Abusing website logic, such as automated credential testing (credential stuffing).
However, a 2018 traffic analysis suggested that malicious actors do not show a distinct preference for headless agents over conventional browsers when executing activities like DDoS, SQL injection, or Cross-Site Scripting (XSS).
== Implementation Methods ==
Because the major browser vendors now support headless operation natively through dedicated APIs, several frameworks have emerged that provide a common interface for driving them. Notable examples include:
- Selenium WebDriver: Adheres to the W3C WebDriver protocol.
- Playwright: A library for Node.js (with official bindings for Python, Java, and .NET) that automates the Chromium, Firefox, and WebKit engines.
- Puppeteer: A Node.js utility specifically for controlling Chrome or Firefox instances.
=== Role in Quality Assurance ===
Some testing tools use headless browsers as a core part of their test infrastructure:
- Capybara uses headless browsing (via WebKit or Headless Chrome) to simulate real user behavior in its tests.
- Jasmine defaults to Selenium but can be configured for WebKit or Headless Chrome for browser-based tests.
- Cypress: A dedicated framework for frontend testing.
- QF-Test: A tool supporting automated graphical user interface testing, where headless browser execution is an option.
=== Competing Approaches ===
An alternative is software that implements browser APIs directly instead of driving a real browser. Deno, for instance, builds browser APIs into its core design, and in the Node.js ecosystem jsdom offers the most complete simulation. These tools usually support basic browser features such as HTML parsing, cookies, XHR, and basic JavaScript, but they do not render the DOM and have only limited support for DOM events. Because they skip rendering, they generally run faster than a full headless browser.
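The trade-off can be illustrated in Python with the standard library's `html.parser`, which, like the lighter tools described above, reads markup without laying out the page or executing scripts. This is an analogy, not jsdom or Deno themselves, and the class name is hypothetical: a link that a script would add at runtime never appears in the parsed result, whereas a real headless browser would run the script and see it.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkAndTitleParser(HTMLParser):
    """Collect the <title> text and every <a href> target from static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        # <script> bodies arrive here as plain text; they are never executed.
        if self._in_title:
            self.title += data
```

Fed a page whose only static link is `/static` plus a script that appends a second link, this parser reports just `/static`, which is exactly the speed-for-fidelity trade described above.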
