mcp-web-surfer
Enables sophisticated, non-graphical control over web browser environments and subsequent remote procedure call integration. This tool allows for dynamic navigation, manipulation of the Document Object Model (DOM), and arbitrary JavaScript execution within a web page context, featuring stateful session management and comprehensive request/response logging capabilities for intricate web task orchestration.
Author

imprvhub
MCP Web Surfer Utility
A highly capable Model Context Protocol (MCP) extension designed to equip Claude Desktop with autonomous, scriptable web browsing functionality.
Core Capabilities
- Advanced Web Page Control
  - Direct navigation to specified Uniform Resource Locators (URLs) with adjustable loading synchronization points.
  - Extraction of visual representations (screenshots) of the entire viewport or specific element regions.
  - Fine-grained manipulation of page elements, including simulating user inputs like clicks, text entry, selection choices, and pointer hovering.
  - In-browser script execution environment for running custom JavaScript, with capture of standard console outputs.
- Integrated Remote Procedure Call (RPC) Client
  - Issuance of standard Hypertext Transfer Protocol (HTTP) operations (GET, POST, PUT, PATCH, DELETE).
  - Customizable configuration for request headers and payload bodies.
  - Automated parsing and structuring of incoming response data, supporting JSON formats.
  - Robust exception handling and detailed reporting for failed transactions.
- MCP Resource Provisioning
  - Exposing browser console transcripts as consumable MCP resources.
  - Providing access to captured visual artifacts via the standard resource interface.
  - Maintenance of a continuous, observable browser session context.
- Autonomous Agent Functionality
  - Sequencing multiple discrete browsing or API operations into complex workflows.
  - Intelligent adaptation to multi-stage instructions, incorporating fault tolerance and self-correction mechanisms.
  - Execution of technical mandates using natural language directives.
Demonstration Video
Key Moments:
Jump to the corresponding video segment by clicking the time annotation.
- [**00:00**](https://www.youtube.com/watch?v=0lMsKiTy7TE&t=0s) - **MCP Terminology Search** Demonstrates initiating a Google search for "Model Context Protocol" using Claude Desktop via this integration, and processing the search results.
- [**00:33**](https://www.youtube.com/watch?v=0lMsKiTy7TE&t=33s) - **Visual Capture Example** Shows capturing a snapshot of the search results, saving it with a specified filename, and confirming its location.
- [**01:00**](https://www.youtube.com/watch?v=0lMsKiTy7TE&t=60s) - **Wikipedia Query** Navigating to the main Wikipedia site and executing a search query for "Model Context Protocol," highlighting cross-site interaction.
- [**01:38**](https://www.youtube.com/watch?v=0lMsKiTy7TE&t=98s) - **Form Element Selection I** Interaction with a test page (the-internet.herokuapp.com/dropdown) to select the "Option 1" item from a selection control.
- [**01:56**](https://www.youtube.com/watch?v=0lMsKiTy7TE&t=116s) - **Form Element Selection II** Repeating the interaction on the same selection control to choose "Option 2," confirming iterative element manipulation.
- [**02:09**](https://www.youtube.com/watch?v=0lMsKiTy7TE&t=129s) - **Credential Entry Simulation** Navigating to a simulated login portal (the-internet.herokuapp.com/login) and populating the credentials ("tomsmith" and "SuperSecretPassword!") into their respective input fields.
- [**02:28**](https://www.youtube.com/watch?v=0lMsKiTy7TE&t=148s) - **Authentication Submission** Executing the form submission action following credential input, completing a typical user authentication sequence.
- [**02:36**](https://www.youtube.com/watch?v=0lMsKiTy7TE&t=156s) - **Direct API Endpoint Access** Performing a direct GET request against a public JSONPlaceholder endpoint, demonstrating the agent's ability to bypass the browser UI for direct data retrieval.
Prerequisites
- Runtime Environment: Node.js version 16 or newer.
- Host Application: Claude Desktop client.
- Dependencies: Necessary underlying dependencies managed by Playwright.
Supported Web Engines
npm init playwright@latest
This package initialization installs Playwright and all requisite browser binary drivers. Dependency resolution occurs during npm install. The supported engines are:
- Chromium (the default engine)
- Mozilla Firefox
- WebKit (the engine powering Safari)
- Microsoft Edge
Playwright automatically fetches the required binaries upon the first invocation of a specific browser type. Manual pre-installation is possible via:
npx playwright install chrome
npx playwright install firefox
npx playwright install webkit
npx playwright install msedge
Note on Safari Equivalence: Direct invocation of the Safari application is not supported. Playwright utilizes the WebKit rendering engine, which is the core technology behind Safari, offering highly comparable functionality.
Note on Edge Implementation: When specifying Edge, the agent launches Microsoft Edge. Internally within Playwright, this corresponds to launching a Chromium instance flagged with the 'msedge' channel, reflecting Edge's Chromium base.
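The two notes above amount to a mapping from the agent's browser identifiers to a Playwright engine plus an optional channel. The following is a minimal sketch of that mapping, not the agent's actual code: `LaunchTarget` and `toLaunchTarget` are hypothetical names introduced here for illustration, while `chrome` and `msedge` are channel names that Playwright documents for branded Chromium builds.

```typescript
// Hypothetical types and helper; names are illustrative, not taken from the agent's source.
type Engine = "chromium" | "firefox" | "webkit";

interface LaunchTarget {
  engine: Engine;    // which Playwright browser type to use
  channel?: string;  // optional branded channel, e.g. "chrome" or "msedge"
}

function toLaunchTarget(browserType: string): LaunchTarget {
  switch (browserType.toLowerCase()) {
    case "chrome":
      return { engine: "chromium", channel: "chrome" };
    case "edge":
      // Edge is Chromium-based, so it is a Chromium launch with the msedge channel.
      return { engine: "chromium", channel: "msedge" };
    case "firefox":
      return { engine: "firefox" };
    case "webkit":
      // WebKit is the closest available stand-in for Safari.
      return { engine: "webkit" };
    default:
      throw new Error(`Unsupported browser type: ${browserType}`);
  }
}

// With Playwright installed, a target could then be launched roughly as:
//   import { chromium, firefox, webkit } from "playwright";
//   const engines = { chromium, firefox, webkit };
//   const target = toLaunchTarget("edge");
//   const browser = await engines[target.engine].launch({ channel: target.channel, headless: false });
```

Keeping the mapping in one pure function makes it easy to test without starting a browser.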
Deployment Procedure
Local Manual Setup
- Obtain the source repository (clone or download):
git clone https://github.com/imprvhub/mcp-browser-agent
cd mcp-browser-agent
- Install required packages:
npm install
- Compile the source code:
npm run build
Activating the MCP Service Endpoint
There are two primary methods for initiating the MCP service:
Method 1: Direct Terminal Execution
- Open a command-line interface.
- Change directory to the project root.
- Start the service:
node dist/index.js
This process must remain active (the terminal window open) for Claude Desktop to maintain connectivity.
Method 2: Automated Startup via Claude Desktop Configuration (Recommended)
Claude Desktop can be configured to launch this service automatically upon interaction. Locate and modify the configuration file specific to your operating system:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Insert or update the mcpServers block to include the Browser Agent configuration:
{
  "mcpServers": {
    "browserAgent": {
      "command": "node",
      "args": [
        "ABSOLUTE_PATH_TO_DIRECTORY/mcp-browser-agent/dist/index.js",
        "--browser",
        "chrome"
      ]
    }
  }
}
Critical Step: Substitute ABSOLUTE_PATH_TO_DIRECTORY with the full, absolute path of the directory that contains the cloned mcp-browser-agent repository, so that the argument points at the repository's dist/index.js. Typical repository locations look like this:
- macOS/Linux Example: /Users/yourname/projects/mcp-browser-agent
- Windows Example: C:\Users\yourname\dev\mcp-browser-agent
If other MCP integrations exist, merge the browserAgent definition into the existing mcpServers map, as shown below:
{
  "mcpServers": {
    "existingService1": {
      "command": "...",
      "args": ["..."]
    },
    "browserAgent": {
      "command": "node",
      "args": [
        "ABSOLUTE_PATH_TO_DIRECTORY/mcp-browser-agent/dist/index.js",
        "--browser",
        "chrome"
      ]
    }
  }
}
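If you prefer to script the merge rather than edit the file by hand, the idea can be sketched as a pure function over the parsed JSON. This is an illustrative sketch only: `ClaudeDesktopConfig`, `ServerEntry`, and `addBrowserAgent` are hypothetical names, and the `repoDir` parameter is assumed to be the absolute path of the cloned repository.

```typescript
// Hypothetical shapes mirroring the JSON structure of claude_desktop_config.json.
interface ServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, ServerEntry>;
  [key: string]: unknown; // other top-level settings pass through untouched
}

// Returns a new config with a browserAgent entry added; existing servers are preserved.
function addBrowserAgent(
  config: ClaudeDesktopConfig,
  repoDir: string, // ASSUMPTION: absolute path of the cloned mcp-browser-agent directory
  browser: string = "chrome"
): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      browserAgent: {
        command: "node",
        args: [`${repoDir}/dist/index.js`, "--browser", browser],
      },
    },
  };
}

// Example: serialize the merged result for writing back to the config file.
const mergedExample = JSON.stringify(
  addBrowserAgent(
    { mcpServers: { existingService1: { command: "...", args: ["..."] } } },
    "/Users/yourname/projects/mcp-browser-agent"
  ),
  null,
  2
);
```

Because the function spreads the existing `mcpServers` map before adding `browserAgent`, other integrations are kept, and an older `browserAgent` entry is replaced rather than duplicated.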
Browser Engine Selection Policy
The agent defaults to Chrome (a Chromium-based browser), but the active web engine can be specified via several mechanisms:
Preference 1: Local Configuration File
Create or modify .mcp_browser_agent_config.json within your user's home directory:
{ "browserType": "firefox" }
Acceptable engine identifiers are:
- chrome - Uses the installed Google Chrome instance (default setting).
- firefox - Leverages the Mozilla Firefox browser.
- webkit - Employs the WebKit engine, mirroring Safari's core.
- edge - Targets Microsoft Edge.
Safari Caveat: WebKit is the engine proxy, not the native Safari application interface. Functionality parity is high, but not absolute.
Preference 2: Runtime Command-Line Flag
When launching manually, append the desired engine:
node dist/index.js --browser webkit
Preference 3: Environmental Variable Override
Set the MCP_BROWSER_TYPE environment variable before execution:
export MCP_BROWSER_TYPE=edge; node dist/index.js
Preference 4: Claude Desktop Initialization Parameter
Specify the engine within the claude_desktop_config.json startup arguments:
{
  "mcpServers": {
    "browserAgent": {
      "command": "node",
      "args": [
        "ABSOLUTE_PATH_TO_DIRECTORY/mcp-browser-agent/dist/index.js",
        "--browser",
        "firefox"
      ]
    }
  }
}
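When more than one of these mechanisms is set, some order of precedence has to apply. This README does not state which one wins, so the sketch below simply assumes command-line flag first, then environment variable, then config file, then the default. `resolveBrowserType` and its parameters are hypothetical names used only for illustration.

```typescript
const SUPPORTED = ["chrome", "firefox", "webkit", "edge"] as const;
type BrowserType = (typeof SUPPORTED)[number];

function isSupported(value: string | undefined): value is BrowserType {
  return value !== undefined && (SUPPORTED as readonly string[]).includes(value);
}

// ASSUMPTION: precedence is CLI flag > MCP_BROWSER_TYPE > config file > default.
// The real agent may order these differently.
function resolveBrowserType(
  argv: string[],                               // e.g. process.argv.slice(2)
  env: Record<string, string | undefined>,      // e.g. process.env
  fileConfig: { browserType?: string } | null   // parsed .mcp_browser_agent_config.json, if any
): BrowserType {
  const flagIndex = argv.indexOf("--browser");
  const fromFlag = flagIndex >= 0 ? argv[flagIndex + 1] : undefined;
  const candidates = [fromFlag, env["MCP_BROWSER_TYPE"], fileConfig?.browserType];
  for (const candidate of candidates) {
    // Unknown or missing values are skipped so that a later source can still apply.
    if (isSupported(candidate)) return candidate;
  }
  return "chrome"; // default identifier listed above
}
```

Passing `argv`, `env`, and the parsed file in as arguments keeps the function free of side effects, so each source can be exercised separately.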
Internal Architecture
The MCP Web Surfer is engineered around the Model Context Protocol, interfacing with a visible browser instance via Playwright. The structure is composed of four primary functional modules:
- Service Core (index.ts)
  - Initialization sequence for the MCP service, adhering to protocol specifications.
  - Definition and registration of available toolsets and data access points (resources).
  - Management of the inter-process communication channel with Claude via standard input/output.
- Tool Schema Definition (tools.ts)
  - Formal specification of all browser actions and API methods.
  - Includes validation schemas, parameter descriptions, and semantic explanations for Claude's discovery phase.
  - Registers these capabilities with the main service endpoint.
- Request Management (handlers.ts)
  - Processes incoming MCP requests directed at tool invocation or resource fetching.
  - Handles the retrieval mechanism for log streams and stored visual data.
  - Directs incoming tool execution commands to the dedicated execution layer.
- Execution Engine (executor.ts)
  - Manages the lifecycle of the underlying browser utility and the HTTP client.
  - Implements the actual automation logic utilizing the Playwright library calls.
  - Executes remote data queries, incorporating necessary data transformation and error management.
  - Persists the active browser state between sequential commands.
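The split between request handling and execution described above can be pictured with a heavily simplified sketch. Everything here is a hypothetical stand-in rather than the agent's real code: the `Executor` class only records a URL instead of driving Playwright, and it is synchronous for brevity, whereas real browser calls are asynchronous.

```typescript
// Hypothetical, simplified stand-ins for the handler and executor layers.
type ToolArgs = Record<string, unknown>;
type ToolResult = { ok: true; value: unknown } | { ok: false; error: string };

// "executor": owns state that persists between tool calls.
class Executor {
  private currentUrl: string | null = null;
  readonly logs: string[] = [];

  run(name: string, args: ToolArgs): ToolResult {
    switch (name) {
      case "browser_navigate": {
        const url = args["url"];
        if (typeof url !== "string") return { ok: false, error: "url is required" };
        this.currentUrl = url; // a real executor would call Playwright here
        this.logs.push(`navigated to ${url}`);
        return { ok: true, value: url };
      }
      case "browser_evaluate":
        // Depends on state left behind by an earlier browser_navigate call.
        if (this.currentUrl === null) return { ok: false, error: "no page is open" };
        return { ok: true, value: `evaluated on ${this.currentUrl}` };
      default:
        return { ok: false, error: `unknown tool: ${name}` };
    }
  }
}

// "handlers": routes tool calls to the executor and serves resources from its state.
function createHandlers(executor: Executor) {
  return {
    callTool: (name: string, args: ToolArgs): ToolResult => executor.run(name, args),
    readResource: (uri: string): string | null =>
      uri === "browser://logs" ? executor.logs.join("\n") : null,
  };
}
```

The point of the sketch is the statefulness: a later call succeeds or fails depending on what earlier calls did, and failures come back as structured results rather than thrown exceptions.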
Agentic Enhancements
Unlike a basic one-shot integration, this component behaves as an agent by:
- Retaining operational context (browser state) across subsequent instructions.
- Providing access to actionable console output for diagnostic purposes.
- Persisting visual records (screenshots) for later reference or verification.
- Orchestrating complex sequences of operations.
- Delivering precise failure reports to facilitate intelligent workflow recovery.
- Facilitating the execution of compound, multi-step automated procedures.
Registered Functionality (Tools)
Browser Manipulation Functions
| Function Name | Purpose | Required Arguments | Optional Arguments |
|---|---|---|---|
| browser_navigate | Change the active page URL | url | timeout, waitUntil |
| browser_screenshot | Capture the current display state | name | selector, fullPage, mask, savePath |
| browser_click | Simulate a mouse click on a target | selector | None |
| browser_fill | Input text into a form field | selector, value | None |
| browser_select | Choose an option from a <select> element | selector, value | None |
| browser_hover | Move the pointer over an element | selector | None |
| browser_evaluate | Execute custom JavaScript code | script | None |
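Claude Desktop normally builds tool invocations for you, but it can help to see what one looks like on the wire. In MCP, a tool is invoked with a JSON-RPC `tools/call` request whose `params` carry the tool name and its arguments. The sketch below constructs such requests for a few of the tools in the table; the `toolCall` helper is a hypothetical name, and the argument values (the `load` wait condition, the boolean `fullPage`, the `#username` selector) are illustrative assumptions rather than documented defaults.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper that wraps a tool name and arguments in a request envelope.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Required arguments come from the table above; optional values are assumptions.
const navigate = toolCall("browser_navigate", { url: "https://example.com", waitUntil: "load" });
const shot = toolCall("browser_screenshot", { name: "example-home", fullPage: true });
const fill = toolCall("browser_fill", { selector: "#username", value: "tomsmith" });
```

Serializing any of these with `JSON.stringify` gives the message a client would write to the server's standard input.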
Network Interaction Functions
| Function Name | Purpose | Required Arguments | Optional Arguments |
|---|---|---|---|
| api_get | Perform an HTTP GET operation | url | headers |
| api_post | Perform an HTTP POST operation with data | url, data | headers |
| api_put | Perform an HTTP PUT operation with data | url, data | headers |
| api_patch | Perform an HTTP PATCH operation with data | url, data | headers |
| api_delete | Perform an HTTP DELETE operation | url | headers |
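The behavior these functions share (send a request with optional headers and data, parse JSON responses, report failures) can be sketched in a few lines. This is not the agent's implementation: `apiRequest`, `FetchLike`, and `HttpResponse` are hypothetical names, and the fetch function is passed in as a parameter so the sketch does not depend on any particular HTTP client.

```typescript
// Minimal structural types so the sketch works with any fetch-compatible function.
interface HttpResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}
interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body?: string;
}
type FetchLike = (url: string, init: RequestInitLike) => Promise<HttpResponse>;
type Method = "GET" | "POST" | "PUT" | "PATCH" | "DELETE";

async function apiRequest(
  fetchImpl: FetchLike,
  method: Method,
  url: string,
  options: { headers?: Record<string, string>; data?: unknown } = {}
): Promise<{ status: number; body: unknown }> {
  const headers: Record<string, string> = {};
  const init: RequestInitLike = { method, headers };
  if (options.data !== undefined) {
    // ASSUMPTION: request data is always sent as JSON.
    headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(options.data);
  }
  Object.assign(headers, options.headers ?? {}); // caller-supplied headers take priority

  const response = await fetchImpl(url, init);
  const raw = await response.text();
  if (!response.ok) {
    throw new Error(`${method} ${url} failed with status ${response.status}`);
  }
  // Parse JSON responses; return anything else as plain text.
  const contentType = response.headers.get("content-type") ?? "";
  const isJson = contentType.includes("application/json") && raw.length > 0;
  return { status: response.status, body: isJson ? JSON.parse(raw) : raw };
}
```

On Node.js 18 or newer, the built-in `fetch` could be passed as `fetchImpl`, for example `apiRequest(fetch, "GET", "https://jsonplaceholder.typicode.com/todos/1")`. Node.js 16 has no global `fetch`, so a compatible client would need to be supplied there.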
Data Access (Resources)
The agent makes the following system artifacts available through the MCP resource path scheme:
- browser://logs - Yields the aggregated console output from the active browser session.
- screenshot://[name] - Retrieves the visual artifact previously saved under the specified unique name.
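A resource read therefore comes down to recognizing one of these two URI forms and looking up the matching data. The sketch below illustrates that lookup. `parseResourceUri` and `readResource` are hypothetical names, and storing screenshots in memory as base64 strings keyed by name is an assumption made for this example.

```typescript
type ResourceRef = { kind: "logs" } | { kind: "screenshot"; name: string };

// Recognizes the two URI forms listed above; anything else yields null.
function parseResourceUri(uri: string): ResourceRef | null {
  if (uri === "browser://logs") return { kind: "logs" };
  const prefix = "screenshot://";
  if (uri.startsWith(prefix) && uri.length > prefix.length) {
    return { kind: "screenshot", name: uri.slice(prefix.length) };
  }
  return null;
}

// ASSUMPTION: screenshots are held as base64 strings keyed by their saved name.
function readResource(
  uri: string,
  logs: string[],
  screenshots: Map<string, string>
): string | null {
  const ref = parseResourceUri(uri);
  if (ref === null) return null;
  if (ref.kind === "logs") return logs.join("\n");
  return screenshots.get(ref.name) ?? null;
}
```

For instance, after a screenshot is saved as "google-splash-view", reading `screenshot://google-splash-view` would return its stored data, while an unknown name returns `null`.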
Practical Usage Scenarios
Below are illustrative examples demonstrating the agent's utility when interacting with Claude:
Fundamental Page Operations
Initiate navigation to the Google primary portal: https://www.google.com
Generate a visual capture of the current page state and designate it as "google-splash-view"
Inject the text "current weather report" into the identified search input element
Contextual Interactions
Route to https://www.wikipedia.org and query the encyclopedia entry for "Model Context Protocol"
Access https://the-internet.herokuapp.com/dropdown and programmatically set the visible value of the selection box to "Option 1"
Data Entry Automation
Navigate to https://the-internet.herokuapp.com/login, populate the username field with "tomsmith" and the corresponding password field with "SuperSecretPassword!"
Proceed to https://the-internet.herokuapp.com/login, input credentials into both fields, and subsequently trigger the authentication submission control
Embedded Script Invocation
Load https://example.com and execute a sandbox script designed to extract and return the document's primary title property
Navigate to https://www.google.com and run a JavaScript snippet to count and report the total number of hyperlinked elements present on the visible surface
Simple Data Retrieval
Execute a GET request targeting the endpoint https://jsonplaceholder.typicode.com/todos/1
Construct and dispatch a POST request to https://jsonplaceholder.typicode.com/posts, including a standardized JSON payload structure
These command examples mirror the actual functional scope achievable by the MCP Web Surfer utility in its current operational state.
Troubleshooting Guide
Issue: Connection Loss Reported as "Server disconnected"
If Claude Desktop reports a failure to communicate with the agent service:
- Service Status Verification:
  - Confirm the manual execution (node dist/index.js) successfully initiated the server in your terminal.
  - If successful, ensure that terminal session remains open.
- Configuration Path Integrity:
  - Check that the path specified within claude_desktop_config.json is correct.
  - On Windows systems, ensure all backslashes are correctly escaped (i.e., use \\).
  - Verify the path is absolute, i.e. that it starts from the file system root.
- Restart Sequence:
  - Terminate all stray node processes associated with the server.
  - Relaunch Claude Desktop to force a clean reconnection.
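The path checks above are easy to get wrong by eye, so a small script can help. The following is a rough sketch under stated assumptions: `checkServerPath` is a hypothetical helper, not part of the agent, and it only inspects the string (it does not check that the file exists).

```typescript
// Hypothetical helper: reports likely problems with the script path used in "args".
function checkServerPath(scriptPath: string): string[] {
  const problems: string[] = [];
  const isPosixAbsolute = scriptPath.startsWith("/");
  const isWindowsAbsolute = /^[A-Za-z]:[\\/]/.test(scriptPath);
  if (!isPosixAbsolute && !isWindowsAbsolute) {
    problems.push("path is not absolute");
  }
  if (scriptPath.includes("ABSOLUTE_PATH_TO_DIRECTORY")) {
    problems.push("placeholder ABSOLUTE_PATH_TO_DIRECTORY was not replaced");
  }
  if (!/[\\/]dist[\\/]index\.js$/.test(scriptPath)) {
    problems.push("path does not end in dist/index.js");
  }
  return problems;
}

// JSON.stringify doubles each backslash, which is the escaping the config file needs on Windows.
const windowsPath = "C:\\Users\\yourname\\dev\\mcp-browser-agent\\dist\\index.js";
const windowsPathAsJson = JSON.stringify(windowsPath);
```

An empty result from `checkServerPath` means none of these three common mistakes was detected. It does not guarantee that the server will start.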
Issue: Web Browser Fails to Launch
If the browser window does not materialize:
- Engine Installation Check:
  - Confirm that the selected browser application (Chrome, Firefox, etc.) is installed on the host system.
  - Playwright generally manages driver installation, but pre-existing environment issues can interfere.
- Restart:
  - Attempt a full restart of both the service and the Claude client.
Issue: Lingering Browser Processes Post-Use
Sometimes, especially with Chromium-based engines, the browser child process fails to exit cleanly upon service termination:
- Manual Process Termination:
  - Windows: Utilize Task Manager (Ctrl+Shift+Esc) to locate and force-close all related Chrome/Chromium entries.
  - macOS: Employ Activity Monitor (located in Utilities) to identify and terminate the rogue process.
  - Linux: Use process inspection tools (ps aux | grep chrome) to find the Process ID (PID) and apply the kill <PID> command.
- Engine Variation Observation:
  - This persistence anomaly is frequently noted with Chrome/Chromium.
  - Firefox and WebKit drivers typically exhibit cleaner shutdown behavior.
[!CAUTION] As this integration is fundamentally reliant upon the Playwright library, users may encounter platform-specific bugs inherent to Playwright itself. All such environment-related defects should be directed to the Playwright GitHub repository for resolution. This agent provides the interfacing mechanism, but Playwright handles the underlying browser control.
Development Environment
Project Layout
- src/index.ts: Main entry point; initiates and manages the MCP service.
- src/tools.ts: Defines the structure and metadata for all exposed tool schemas.
- src/handlers.ts: Logic responsible for processing incoming MCP requests for tools and resources.
- src/executor.ts: Contains the core implementation of browser actions and API calls using Playwright primitives.
Compilation
npm run build
Continuous Rebuild Mode
npm run watch
Validation Suite
The repository includes unit and integration tests to ensure stability.
npm run test            # Execute full test suite
npm run test:watch      # Run tests reactively upon file changes
npm run test:coverage   # Generate code coverage report
Tests specifically validate configuration structure, the reliability of browser invocations, operational fidelity under error conditions, and the critical task of ensuring proper process cleanup, especially concerning headless engine termination.
Security Directives
[!IMPORTANT] Granting Claude autonomous control over a web browser carries significant operational risk. Review the comprehensive Security Policy document for mandatory usage guidelines and known security implications.
The Web Surfer tool is engineered for constructive automation. All deployments must strictly adhere to prevailing legal statutes, relevant service agreements, and established ethical parameters for automated interaction. Refer to Security Policy for in-depth compliance details.
Collaborative Contributions
We welcome community input to enhance the MCP Web Surfer:
- Expanding the repertoire of supported browser actions.
- Fortifying error detection and automated recovery strategies.
- Refining the efficiency of screenshot capture and resource retrieval.
- Developing novel, high-value workflow examples.
- Performance tuning for high-volume operational loads.
Licensing
This software is distributed under the terms of the Mozilla Public License, Version 2.0 (MPL 2.0). Consult the accompanying LICENSE file for full legal text.
Related Resources
- Model Context Protocol Specification
- Download Claude Desktop Client
- Official Playwright Documentation
- MCP Project Ecosystem
WIKIPEDIA: A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using network communication. They are particularly useful for testing web pages as they are able to render and understand HTML the same way a browser would, including styling elements such as page layout, color, font selection and execution of JavaScript and Ajax which are usually not available when using other testing methods. Since version 59 of Google Chrome and version 56 of Firefox, there is native support for remote control of the browser. This made earlier efforts obsolete, notably PhantomJS.
== Use cases ==
The main use cases for headless browsers are:
- Test automation in modern web applications (web testing)
- Taking screenshots of web pages.
- Running automated tests for JavaScript libraries.
- Automating interaction of web pages.
=== Other uses ===
Headless browsers are also useful for web scraping. Google stated in 2009 that using a headless browser could help their search engine index content from websites that use Ajax. Headless browsers have also been misused in various ways:
- Perform DDoS attacks on web sites.
- Increase advertisement impressions.
- Automate web sites in unintended ways e.g. for credential stuffing.
However, a study of browser traffic in 2018 found no preference by malicious actors for headless browsers. There is no indication that headless browsers are used more frequently than non-headless browsers for malicious purposes, like DDoS attacks, SQL injections or cross-site scripting attacks.
== Usage ==
As several major browsers natively support headless mode through APIs, some software exists to perform browser automation through a unified interface. These include:
- Selenium WebDriver – a W3C compliant implementation of WebDriver
- Playwright – a Node.js library to automate Chromium, Firefox and WebKit
- Puppeteer – a Node.js library to automate Chrome or Firefox
=== Test automation ===
Some test automation software and frameworks include headless browsers as part of their testing apparati.
- Capybara uses headless browsing, either via WebKit or Headless Chrome to mimic user behavior in its testing protocols.
- Jasmine uses Selenium by default, but can use WebKit or Headless Chrome, to run browser tests.
- Cypress, a frontend testing framework
- QF-Test, a software tool for automated testing of programs via the graphical user interface where a headless browser can also be used for testing.
=== Alternatives ===
Another approach is to use software that provides browser APIs. For example, Deno provides browser APIs as part of its design. For Node.js, jsdom is the most complete provider. While most are able to support common browser features (HTML parsing, cookies, XHR, some JavaScript, etc.), they do not render the DOM and have limited support for DOM events. They usually perform faster than
