web-agent-interface-mcp
An MCP server that gives AI agents a standardized endpoint for web navigation and data extraction. It lets large language models work with current web content through a browser they control.
Author

co-browser
Essential Dependencies (Prerequisites)
- uv - A fast Python package and environment manager.
- Playwright - The browser automation engine.
- mcp-proxy - Required only for stdio mode.
```bash
# Install required toolchain components
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install mcp-proxy
uv tool update-shell
```
Runtime Configuration (Environment)
Create a file named .env:
```bash
OPENAI_API_KEY=your-secret-key-here
CHROME_PATH=optional/path/to/installed/chrome
# Set to true to make API calls wait for task completion
PATIENT=false
```
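For helper scripts that need the same settings, the `.env` format can be read with a few lines of standard-library Python. This is a minimal sketch, not the server's own loader: `parse_env` and `as_bool` are hypothetical helper names, and the assumption that `PATIENT` takes `true`/`false` comes only from the example values in this README.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as `false  # note`.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        settings[key.strip()] = value
    return settings


def as_bool(value: str) -> bool:
    """Treat common truthy spellings as True; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Example input mirroring the .env file in this README.
example = """
OPENAI_API_KEY=your-secret-key-here
CHROME_PATH=optional/path/to/installed/chrome
PATIENT=false  # wait for task completion when true
"""

config = parse_env(example)
patient = as_bool(config.get("PATIENT", "false"))
print(sorted(config), patient)
```

Keeping the boolean conversion in one place avoids the common mistake of treating the non-empty string `"false"` as truthy.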
Deployment (Installation)
```bash
# Synchronize project dependencies
uv sync
uv pip install playwright
uv run playwright install --with-deps --no-shell chromium
```
Operation Modes
Server-Sent Events (SSE) Transport
```bash
# Launch the service instance directly from source code
uv run server --port 8000
```
Standard Input/Output (stdio) Transport
```bash
# 1. Package the project into a distribution artifact
uv build

# Clean up prior installation if present
uv tool uninstall browser-use-mcp-server 2>/dev/null || true

# Install the local package distribution universally
uv tool install dist/browser_use_mcp_server-*.whl

# 2. Execute the server utilizing stdio communication protocol
browser-use-mcp-server run server --port 8000 --stdio --proxy-port 9000
```
Client Endpoint Setup
SSE Connection Parameters
```json
{
  "mcpServers": {
    "web-agent-interface-mcp": {
      "url": "http://localhost:8000/sse"
    }
  }
}
```
stdio Connection Parameters
```json
{
  "mcpServers": {
    "browser-server": {
      "command": "browser-use-mcp-server",
      "args": [
        "run",
        "server",
        "--port",
        "8000",
        "--stdio",
        "--proxy-port",
        "9000"
      ],
      "env": {
        "OPENAI_API_KEY": "your-secret-key-here"
      }
    }
  }
}
```
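If the client configuration is generated by a script rather than edited by hand, both connection variants can be produced from the same few parameters. The following is a sketch under assumptions: `sse_entry`, `stdio_entry`, and `client_config` are hypothetical helper names, and the ports and server names are simply the ones used in this README.

```python
import json
from typing import Dict, Mapping


def sse_entry(port: int = 8000, host: str = "localhost") -> Dict[str, str]:
    """Entry for a client that connects to a running SSE server."""
    return {"url": f"http://{host}:{port}/sse"}


def stdio_entry(api_key: str, port: int = 8000, proxy_port: int = 9000) -> Dict[str, object]:
    """Entry for a client that launches the server itself over stdio."""
    return {
        "command": "browser-use-mcp-server",
        "args": [
            "run", "server",
            "--port", str(port),
            "--stdio",
            "--proxy-port", str(proxy_port),
        ],
        "env": {"OPENAI_API_KEY": api_key},
    }


def client_config(name: str, entry: Mapping[str, object]) -> str:
    """Wrap one server entry in the top-level mcpServers object."""
    return json.dumps({"mcpServers": {name: entry}}, indent=2)


print(client_config("web-agent-interface-mcp", sse_entry()))
print(client_config("browser-server", stdio_entry("your-secret-key-here")))
```

Note that every element of `args` is a string, including the port numbers, because the client passes them to the command line unchanged.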
Configuration File Locations (Client-Specific)
| Client Application | Configuration File Path |
|---|---|
| Cursor | ./.cursor/mcp.json |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
| Claude (macOS) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
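The table can also be expressed as a small lookup for setup scripts. This is an illustrative sketch only: `mcp_config_path` is a hypothetical helper, the client and platform labels are assumptions chosen for the example, and the environment is passed in explicitly so the function can be exercised on any machine.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Mapping, Union

ConfigPath = Union[PurePosixPath, PureWindowsPath]


def mcp_config_path(client: str, system: str, env: Mapping[str, str]) -> ConfigPath:
    """Return the configuration file path for a client, following the table above."""
    if client == "cursor":
        # Project-relative path, resolved against the current workspace.
        return PurePosixPath(".cursor") / "mcp.json"
    if client == "windsurf":
        return PurePosixPath(env["HOME"]) / ".codeium" / "windsurf" / "mcp_config.json"
    if client == "claude" and system == "Darwin":
        return (PurePosixPath(env["HOME"]) / "Library" / "Application Support"
                / "Claude" / "claude_desktop_config.json")
    if client == "claude" and system == "Windows":
        return PureWindowsPath(env["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise ValueError(f"no known config path for {client!r} on {system!r}")


print(mcp_config_path("claude", "Darwin", {"HOME": "/Users/alex"}))
```

In a real script, `system` could come from `platform.system()` and `env` from `os.environ`.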
Core Capabilities
- [x] Browser Orchestration: AI agent control over web environments.
- [x] Protocol Versatility: Native support for both SSE and stdio communication streams.
- [x] Visual Feedback: Real-time browser session monitoring via VNC streaming.
- [x] Asynchronous Processing: Non-blocking execution capability for web tasks.
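The difference between non-blocking execution and a call that waits for completion (compare the `PATIENT` setting) can be pictured with plain `asyncio`. This is a conceptual sketch, not the server's implementation: `fake_browser_task`, `submit`, and the response fields are all hypothetical, and the browser work is replaced by a short sleep.

```python
import asyncio
import uuid
from typing import Dict

# In-memory registry of running tasks, keyed by a generated id (illustrative only).
tasks: Dict[str, "asyncio.Task[str]"] = {}


async def fake_browser_task(prompt: str) -> str:
    """Stand-in for real browser work."""
    await asyncio.sleep(0.01)
    return f"done: {prompt}"


async def submit(prompt: str, patient: bool) -> Dict[str, str]:
    """Start a task; wait for its result only when patient is True."""
    task_id = uuid.uuid4().hex
    task = asyncio.create_task(fake_browser_task(prompt))
    tasks[task_id] = task
    if patient:
        return {"task_id": task_id, "status": "completed", "result": await task}
    return {"task_id": task_id, "status": "running"}


async def main():
    quick = await submit("open example.com", patient=False)  # returns at once
    slow = await submit("open example.com", patient=True)    # waits for the result
    later = await tasks[quick["task_id"]]                    # collect the first result
    return quick, slow, later


quick, slow, later = asyncio.run(main())
print(quick["status"], slow["status"], later)
```

The non-blocking path hands back an id immediately so the caller can check on the task later, while the patient path trades responsiveness for a single round trip.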
Local Development Workflow
To develop and test the package locally:
- Build a wheel:
  ```bash
  # Run from the project root
  uv build
  ```
- Install the built wheel as a global tool:
  ```bash
  uv tool uninstall browser-use-mcp-server 2>/dev/null || true
  uv tool install dist/browser_use_mcp_server-*.whl
  ```
- Run the server from any directory:
  ```bash
  # Set the required key for the current terminal session
  export OPENAI_API_KEY=your-key-for-this-session

  # Or pass the key inline for a single run
  OPENAI_API_KEY=your-key-for-this-session browser-use-mcp-server run server --port 8000 --stdio --proxy-port 9000
  ```
- After making changes, rebuild and reinstall:
  ```bash
  uv build
  uv tool uninstall browser-use-mcp-server
  uv tool install dist/browser_use_mcp_server-*.whl
  ```
Containerization (Docker)
Leveraging Docker ensures a standardized, isolated runtime environment.
```bash
# Image compilation
docker build -t web-agent-interface-mcp .

# Standard execution (uses default VNC password: "browser-use")
# --rm cleans up the container upon exit
# -p 8000:8000 maps the service port
# -p 5900:5900 maps the VNC remote access port
docker run --rm -p8000:8000 -p5900:5900 web-agent-interface-mcp

# Secure execution with a custom VNC password loaded from a file
# 1. Create the password file (e.g., vnc_secret.txt)
echo "my-highly-secure-vnc-pass" > vnc_secret.txt
# 2. Mount the file as a read-only secret inside the container
docker run --rm -p8000:8000 -p5900:5900 \
  -v $(pwd)/vnc_secret.txt:/run/secrets/vnc_password:ro \
  web-agent-interface-mcp
```
Security Note: The :ro volume flag enforces read-only access for the mounted password file.
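When the container is launched from a script, the password file can be created with owner-only permissions before it is mounted. The sketch below makes several assumptions: `write_secret` and `docker_run_args` are hypothetical helpers, the image name and mount target are copied from the commands in this README, and the code only builds the argument list without invoking Docker.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def write_secret(path: Path, password: str) -> Path:
    """Create the password file readable and writable by the owner only."""
    # Mode 0o600 applies when the file is newly created (POSIX systems).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(password + "\n")
    return path


def docker_run_args(secret: Path, image: str = "web-agent-interface-mcp") -> List[str]:
    """Build the docker run command with the secret mounted read-only."""
    return [
        "docker", "run", "--rm",
        "-p", "8000:8000",
        "-p", "5900:5900",
        "-v", f"{secret.resolve()}:/run/secrets/vnc_password:ro",
        image,
    ]


with tempfile.TemporaryDirectory() as tmp:
    secret = write_secret(Path(tmp) / "vnc_secret.txt", "my-highly-secure-vnc-pass")
    args = docker_run_args(secret)
    print(" ".join(args))
```

Passing the list to `subprocess.run(args, check=True)` would start the container; using an absolute path for the mount avoids depending on the current working directory.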
VNC Client Connection
```bash
# Utilize a browser-accessible VNC client
git clone https://github.com/novnc/noVNC
cd noVNC
./utils/novnc_proxy --vnc localhost:5900
```
Default Authentication Credential: browser-use (unless the custom file method is employed).
Demonstration Scenario
An example prompt to give the connected AI assistant:
```text
Instruct the interface to navigate to https://news.ycombinator.com and extract the title of the highest-scoring story.
```
Support Channel
Report issues or seek assistance at: cobrowser.xyz
WIKIPEDIA DEEP DIVE: A browser operating without a Graphical User Interface (GUI) is termed a headless browser. These tools permit programmatic steering of web pages within an environment that closely mirrors standard browser functionality, but through a command-line or network interface. They are invaluable for rigorous web page validation, as they accurately interpret and render CSS, execute JavaScript, and handle Ajax calls, capabilities often absent in conventional testing frameworks. Since Firefox 56 and Chrome 59 introduced native remote management APIs, older solutions like PhantomJS have largely been superseded.

== Primary Applications ==
The principal use cases for headless execution environments involve:
- Web testing automation for contemporary web applications.
- Automated capture of page screenshots.
- Running unit or integration tests for JavaScript frameworks.
- Automating complex user interactions across web interfaces.

=== Secondary Utility ===
Headless environments are also potent instruments for large-scale data acquisition from the web (web scraping). Google, for instance, acknowledged their utility in 2009 for indexing sites heavily reliant on Ajax. Conversely, misuse scenarios exist:
- Orchestrating Distributed Denial of Service (DDoS) attacks.
- Inflating advertisement view counts.
- Unintended, automated interaction with sites, such as bulk credential testing.

However, contemporary traffic analysis from 2018 suggests malicious actors do not show a statistically significant preference for headless tools over traditional browser variants when launching attacks like SQL injection or XSS.

== Implementation Methods ==
With several major browser vendors now natively supporting headless operation via dedicated interfaces, consolidated control layers have emerged:
- Selenium WebDriver – Adheres to W3C WebDriver standards.
- Playwright – A robust library for automating Chromium, WebKit, and Firefox.
- Puppeteer – Primarily focused on automating Chrome or Firefox instances.

=== Automated Verification Frameworks ===
Numerous testing suites integrate headless browsers into their validation pipelines:
- Capybara employs either Headless Chrome or WebKit emulation for mimicking human interaction.
- Jasmine defaults to Selenium but can be configured for WebKit or Headless Chrome testing.
- Cypress, a dedicated frontend testing ecosystem.
- QF-Test, a GUI testing utility capable of leveraging headless instances.

=== Non-Rendering Alternatives ===
An alternative pathway involves utilizing libraries that emulate browser APIs without rendering the visual layer. Deno incorporates browser APIs directly into its runtime structure. For Node.js environments, jsdom offers the most comprehensive API simulation. While these alternatives manage parsing, cookies, and XHR requests, they typically lack full DOM rendering and event system support, often executing faster than fully rendered solutions.
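The trade-off of a non-rendering approach can be seen with Python's standard-library HTML parser, used here as a stand-in for the libraries named above. This is a minimal sketch with a made-up page: `TitleParser` is a hypothetical class, and because no JavaScript is executed, a script that rewrites the title has no effect on what the parser reports.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the <title> element from static markup."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Script contents also arrive here, but outside <title> they are ignored.
        if self._in_title:
            self.title += data


page = (
    "<html><head><title>Static title</title>"
    "<script>document.title = 'Set by JavaScript';</script>"
    "</head><body></body></html>"
)

parser = TitleParser()
parser.feed(page)
print(parser.title)  # prints "Static title"
```

A headless browser loading the same page would run the script and report the updated title, which is why pages that build their content in JavaScript need a full browser engine.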
