web-agent-interface-mcp

[![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/cobrowser.svg?style=social&label=Follow%20%40cobrowser)](https://x.com/cobrowser)
[![Discord](https://img.shields.io/discord/1351569878116470928?logo=discord&logoColor=white&label=discord&color=white)](https://discord.gg/gw9UpFUhyY)
[![PyPI version](https://badge.fury.io/py/browser-use-mcp-server.svg)](https://badge.fury.io/py/browser-use-mcp-server)

**An MCP gateway for agent-driven browser control, built on the [browser-use](https://github.com/browser-use/browser-use) library.**

> **🌐 Curious about AI-Powered Browsing?** Explore the open-source client: [**Vibe Browser**](https://github.com/co-browser/vibe).
>
> **⚙️ Streamlining Multi-Server Deployments?** Use the orchestration tool [agent-browser](https://github.com/co-browser/agent-browser) to simplify workflows.

Essential Dependencies (Prerequisites)

- uv - The fast Python environment and package manager.
- Playwright - The primary engine for browser automation.
- mcp-proxy - Required only when running in stdio mode.

```bash
# Install required toolchain components
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install mcp-proxy
uv tool update-shell
```

Runtime Configuration (Environment)

Create a configuration file named `.env`:

```bash
OPENAI_API_KEY=your-secret-key-here
CHROME_PATH=optional/path/to/installed/chrome
PATIENT=false  # Toggles synchronous API waiting for task finalization
```
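How the server itself loads these values is not shown here, so the following is only a minimal sketch of reading a `.env` file of this shape and interpreting `PATIENT` as a boolean. The helper names `parse_env_text` and `is_patient`, and the set of strings treated as true, are assumptions for illustration.

```python
# Hypothetical helpers; not part of browser-use-mcp-server.
TRUTHY = {"1", "true", "yes", "on"}  # assumed set of "true" spellings


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "false  # note".
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def is_patient(values: dict[str, str]) -> bool:
    """Interpret PATIENT as a boolean, defaulting to False when unset."""
    return values.get("PATIENT", "false").lower() in TRUTHY
```

For example, `is_patient(parse_env_text(open(".env").read()))` would report whether synchronous waiting is requested under these assumptions.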

Deployment (Installation)

```bash
# Synchronize project dependencies
uv sync
uv pip install playwright
uv run playwright install --with-deps --no-shell chromium
```

Operation Modes

Server-Sent Events (SSE) Transport

```bash
# Launch the service instance directly from source code
uv run server --port 8000
```
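Once the server is started, it can be useful to confirm that something is listening on the chosen port before pointing a client at it. The sketch below uses only the standard library; `is_port_open` and `sse_url` are hypothetical helper names, and the `/sse` path is taken from the client configuration in this README.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def sse_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the SSE endpoint URL a client would be configured with."""
    return f"http://{host}:{port}/sse"


if __name__ == "__main__":
    if is_port_open("localhost", 8000):
        print("Server reachable; configure clients with", sse_url())
    else:
        print("Nothing is listening on port 8000 yet")
```

This only checks that the TCP port accepts connections; it does not verify that the process behind it speaks MCP.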

Standard Input/Output (stdio) Transport

```bash
# 1. Package the project into a distribution artifact
uv build

# Clean up prior installation if present
uv tool uninstall browser-use-mcp-server 2>/dev/null || true

# Install the local package distribution globally
uv tool install dist/browser_use_mcp_server-*.whl

# 2. Run the server using the stdio transport
browser-use-mcp-server run server --port 8000 --stdio --proxy-port 9000
```

Client Endpoint Setup

SSE Connection Parameters

```json
{
  "mcpServers": {
    "web-agent-interface-mcp": {
      "url": "http://localhost:8000/sse"
    }
  }
}
```

stdio Connection Parameters

```json
{
  "mcpServers": {
    "browser-server": {
      "command": "browser-use-mcp-server",
      "args": [
        "run",
        "server",
        "--port",
        "8000",
        "--stdio",
        "--proxy-port",
        "9000"
      ],
      "env": {
        "OPENAI_API_KEY": "your-secret-key-here"
      }
    }
  }
}
```
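If these entries are generated rather than hand-edited, the two shapes above can be produced with the standard `json` module. This is a sketch only: `sse_config` and `stdio_config` are hypothetical helper names, and the keys simply mirror the JSON shown in this section.

```python
import json


def sse_config(name: str, port: int = 8000) -> dict:
    """Build an SSE client entry pointing at a locally running server."""
    return {"mcpServers": {name: {"url": f"http://localhost:{port}/sse"}}}


def stdio_config(name: str, api_key: str, port: int = 8000,
                 proxy_port: int = 9000) -> dict:
    """Build a stdio client entry that launches the installed CLI."""
    return {
        "mcpServers": {
            name: {
                "command": "browser-use-mcp-server",
                # Arguments are strings, exactly as in the README example.
                "args": ["run", "server", "--port", str(port),
                         "--stdio", "--proxy-port", str(proxy_port)],
                "env": {"OPENAI_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(sse_config("web-agent-interface-mcp"), indent=2))
    print(json.dumps(stdio_config("browser-server", "your-secret-key-here"), indent=2))
```

Keeping the port numbers as parameters makes it harder for the `--port` argument and the SSE URL to drift apart when one of them changes.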

Configuration File Locations (Client-Specific)

| Client Application | Configuration File Path |
| --- | --- |
| Cursor | `./.cursor/mcp.json` |
| Windsurf | `~/.codeium/windsurf/mcp_config.json` |
| Claude (macOS) | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Claude (Windows) | `%APPDATA%\Claude\claude_desktop_config.json` |
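A script that installs the configuration needs to expand `~` and `%APPDATA%` in these paths. The sketch below shows one way to do that with the standard library; the dictionary keys and the `resolve_config_path` helper are hypothetical names, and the paths are copied from the table above.

```python
import os
from pathlib import Path

# Paths copied from the table; the short client keys are illustrative.
CLIENT_CONFIG_PATHS = {
    "cursor": "./.cursor/mcp.json",
    "windsurf": "~/.codeium/windsurf/mcp_config.json",
    "claude-macos": "~/Library/Application Support/Claude/claude_desktop_config.json",
    "claude-windows": "%APPDATA%\\Claude\\claude_desktop_config.json",
}


def resolve_config_path(client: str) -> Path:
    """Expand ~ and environment variables in the client's config path.

    Note: os.path.expandvars only expands %VAR% syntax on Windows, so the
    Windows entry is left unexpanded when this runs on other platforms.
    """
    raw = CLIENT_CONFIG_PATHS[client]
    return Path(os.path.expanduser(os.path.expandvars(raw)))


if __name__ == "__main__":
    for client in CLIENT_CONFIG_PATHS:
        print(f"{client}: {resolve_config_path(client)}")
```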

Core Capabilities

- [x] Browser Orchestration: AI agent control over web environments.
- [x] Protocol Versatility: Native support for both SSE and stdio communication streams.
- [x] Visual Feedback: Real-time browser session monitoring via VNC streaming.
- [x] Asynchronous Processing: Non-blocking execution capability for web tasks.

Local Development Workflow

To develop and test the package locally:

1. Build a distributable wheel:

```bash
# Run from the project root directory
uv build
```

2. Install the newly built artifact as a global tool:

```bash
uv tool uninstall browser-use-mcp-server 2>/dev/null || true
uv tool install dist/browser_use_mcp_server-*.whl
```

3. Run the server from any directory:

```bash
# Set the required key for the current terminal session
export OPENAI_API_KEY=your-key-for-this-session

# Or pass the key directly on the command line for an isolated execution
OPENAI_API_KEY=your-key-for-this-session browser-use-mcp-server run server --port 8000 --stdio --proxy-port 9000
```

4. After making changes, repeat the build and reinstall steps:

```bash
uv build
uv tool uninstall browser-use-mcp-server
uv tool install dist/browser_use_mcp_server-*.whl
```

Containerization (Docker)

Leveraging Docker ensures a standardized, isolated runtime environment.

```bash
# Build the image
docker build -t web-agent-interface-mcp .

# Standard execution (uses default VNC password: "browser-use")
# --rm cleans up the container upon exit
# -p 8000:8000 maps the service port
# -p 5900:5900 maps the VNC remote access port
docker run --rm -p8000:8000 -p5900:5900 web-agent-interface-mcp

# Secure execution with a custom VNC password loaded from a file
# 1. Create the password file (e.g., vnc_secret.txt)
echo "my-highly-secure-vnc-pass" > vnc_secret.txt

# 2. Mount the file as a read-only secret inside the container
docker run --rm -p8000:8000 -p5900:5900 \
  -v $(pwd)/vnc_secret.txt:/run/secrets/vnc_password:ro \
  web-agent-interface-mcp
```

Security Note: The :ro volume flag enforces read-only access for the mounted password file.
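The container's startup logic is not shown in this README, so the following is only a sketch of the behavior described above: use the password from the mounted secret file when it exists, otherwise fall back to the default. The function name `load_vnc_password` is hypothetical; the secret path and default value come from the commands in this section.

```python
from pathlib import Path

DEFAULT_VNC_PASSWORD = "browser-use"              # default stated in this README
SECRET_PATH = Path("/run/secrets/vnc_password")   # mount target used above


def load_vnc_password(secret_path: Path = SECRET_PATH) -> str:
    """Return the password from the secret file, or the default if absent."""
    try:
        password = secret_path.read_text().strip()
    except OSError:
        # File missing or unreadable: no custom secret was mounted.
        return DEFAULT_VNC_PASSWORD
    # An empty file is treated the same as no secret (an assumption).
    return password or DEFAULT_VNC_PASSWORD
```

Stripping whitespace matters here because `echo` appends a trailing newline to the password file.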

VNC Client Connection

```bash
# Use a browser-based VNC client
git clone https://github.com/novnc/noVNC
cd noVNC
./utils/novnc_proxy --vnc localhost:5900
```

Default Authentication Credential: browser-use (unless the custom file method is employed).

Demonstration Scenario

An example interaction prompt for an associated AI system:

```text
Instruct the interface to navigate to https://news.ycombinator.com and extract the title of the highest-scoring story.
```

Support Channel

Report issues or seek assistance at: cobrowser.xyz

Repository Popularity (Star History)

WIKIPEDIA DEEP DIVE: A browser operating without a Graphical User Interface (GUI) is termed a headless browser. These tools permit programmatic steering of web pages within an environment that closely mirrors standard browser functionality, but through a command-line or network interface. They are invaluable for rigorous web page validation, as they accurately interpret and render CSS, execute JavaScript, and handle Ajax calls—capabilities often absent in conventional testing frameworks. Since Firefox 56 and Chrome 59 introduced native remote management APIs, older solutions like PhantomJS have largely been superseded.

== Primary Applications ==

The principal use cases for headless execution environments involve:

- Web testing automation for contemporary web applications.
- Automated capture of page screenshots.
- Running unit or integration tests for JavaScript frameworks.
- Automating complex user interactions across web interfaces.

=== Secondary Utility ===

Headless environments are also potent instruments for large-scale data acquisition from the web (web scraping). Google, for instance, acknowledged their utility in 2009 for indexing sites heavily reliant on Ajax. Conversely, misuse scenarios exist:

- Orchestrating Distributed Denial of Service (DDoS) attacks.
- Inflating advertisement view counts.
- Unintended, automated interaction with sites, such as bulk credential testing.

However, contemporary traffic analysis from 2018 suggests malicious actors do not show a statistically significant preference for headless tools over traditional browser variants when launching attacks like SQL injection or XSS.

== Implementation Methods ==

With several major browser vendors now natively supporting headless operation via dedicated interfaces, consolidated control layers have emerged:

- Selenium WebDriver - Adheres to W3C WebDriver standards.
- Playwright - A robust library for automating Chromium, WebKit, and Firefox.
- Puppeteer - Primarily focused on automating Chrome or Firefox instances.

=== Automated Verification Frameworks ===

Numerous testing suites integrate headless browsers into their validation pipelines:

- Capybara employs either Headless Chrome or WebKit emulation for mimicking human interaction.
- Jasmine defaults to Selenium but can be configured for WebKit or Headless Chrome testing.
- Cypress, a dedicated frontend testing ecosystem.
- QF-Test, a GUI testing utility capable of leveraging headless instances.

=== Non-Rendering Alternatives ===

An alternative approach is to use libraries that emulate browser APIs without rendering the visual layer. Deno incorporates browser APIs directly into its runtime. For Node.js environments, jsdom offers the most comprehensive API simulation. These alternatives handle parsing, cookies, and XHR requests, but they typically do not render the DOM and have limited event system support. In exchange, they often run faster than fully rendered solutions.


Author

co-browser
