mcp-browser-engine
A robust server component for the Model Context Protocol (MCP) that facilitates dynamic web interaction via a non-visual browser environment, supporting content retrieval, interactive form manipulation, and session/tab orchestration.
Author

random-robbie
MCP Browser Engine Service
This specialized server implements advanced web navigation and manipulation capabilities for the Model Context Protocol (MCP), leveraging the Playwright automation framework to offer a secure, scriptable interface for interacting with live web resources in a headless mode.
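As a rough illustration of what a headless Playwright session of this kind involves, the sketch below loads a page with certificate errors ignored and returns the rendered HTML. It calls Playwright's synchronous API directly and is not the server's actual code; the function name `fetch_rendered_html` and the 30-second default timeout are assumptions made for this example.

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30000) -> str:
    """Load `url` in headless Chromium and return the post-JavaScript HTML."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # ignore_https_errors lets navigation proceed on invalid certificates.
            context = browser.new_context(ignore_https_errors=True)
            page = context.new_page()
            page.goto(url, timeout=timeout_ms, wait_until="load")
            # page.content() serializes the DOM as it stands after scripts have run.
            return page.content()
        finally:
            browser.close()

# Example call (needs Playwright plus its browser binaries):
#   html = fetch_rendered_html("https://www.example.com")
```

Opening a fresh browser per call keeps the sketch short; a long-running server would more plausibly keep one browser alive and reuse contexts.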
🌟 Core Capabilities
- Automated Headless Navigation: Traverse web pages, including handling sites with compromised or invalid SSL certificates.
- Complete Document Capture: Obtain the full source code (HTML) of a page, accounting for elements rendered post-JavaScript execution.
- Parallel Session Management: Capability to spawn, maintain context across, and toggle between multiple independent browser windows (tabs).
- Intelligent Web Element Manipulation: A comprehensive toolkit for:
  - Extracting textual data from the viewport or specific DOM nodes.
  - Simulating user clicks on designated interactive components.
  - Injecting specified strings into input fields and form controls.
  - Generating visual snapshots (screenshots) of the entire page or targeted regions.
  - Retrieving all hyperlink references, optionally filtered by string criteria.
  - Controlling the viewport scroll position (up, down, left, right).
  - Executing arbitrary JavaScript code within the page context.
  - Re-initiating the page load process.
  - Pausing execution until a page transition concludes.
- Resource Hygiene: Automated garbage collection for detached browser contexts after periods of inactivity.
- Contextual Page Metrics: Access to detailed meta-data pertaining to the currently active document.
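The Resource Hygiene item above describes cleanup after periods of inactivity. One way such bookkeeping could work is sketched below: record when each session was last used and periodically collect the ones that have exceeded an idle limit. The `IdleTracker` class, its method names, and the injectable clock are hypothetical and do not come from the server's source.

```python
import time
from typing import Callable, Dict, List


class IdleTracker:
    """Tracks last-use times and reports which sessions have gone idle."""

    def __init__(self, idle_limit_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_limit_s = idle_limit_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._last_used: Dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Mark a session as used right now."""
        self._last_used[session_id] = self.clock()

    def reap(self) -> List[str]:
        """Forget and return every session idle for at least `idle_limit_s`."""
        now = self.clock()
        expired = [sid for sid, used in self._last_used.items()
                   if now - used >= self.idle_limit_s]
        for sid in expired:
            # A real server would also close the matching browser context here.
            del self._last_used[sid]
        return expired
```

Calling `touch` from every tool invocation and `reap` from a background timer would be one way to wire this in.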
⚙️ Deployment Guide
Prerequisites
Ensure the following dependencies are satisfied:
- A system running Python version 3.10 or newer.
- The core MCP SDK library installed.
- The Playwright library and its requisite browser binaries.
Installation Steps
bash
# Install necessary Python packages
pip install mcp playwright
# Download and set up browser binaries (Chromium, Firefox, WebKit)
playwright install
Integration with Claude Desktop Environment
Insert the following configuration block into your claude_desktop_config.json manifest:
{
  "mcpServers": {
    "web-browser": {
      "command": "python",
      "args": [
        "/path/to/your/server.py"
      ]
    }
  }
}
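If the configuration file already lists other servers, the entry can be merged in programmatically rather than pasted by hand. The sketch below is one way to do that with the standard library; the helper names `add_server_entry` and `update_config_file` are invented for this example, and the location of `claude_desktop_config.json` is left to the caller because it is not stated here.

```python
import json
from pathlib import Path


def add_server_entry(config: dict, name: str, server_path: str,
                     python_cmd: str = "python") -> dict:
    """Return a copy of `config` with an MCP server entry added under `mcpServers`."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep any existing servers
    servers[name] = {"command": python_cmd, "args": [server_path]}
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_file: Path, name: str, server_path: str) -> None:
    """Read the JSON config (if present), add the entry, and write it back."""
    existing = json.loads(config_file.read_text()) if config_file.exists() else {}
    merged = add_server_entry(existing, name, server_path)
    config_file.write_text(json.dumps(merged, indent=2))
```

For instance, `add_server_entry({}, "web-browser", "/path/to/your/server.py")` produces the same structure as the block above.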
💡 Operational Examples
Initial Web Access
python
# Initiate navigation to a target Uniform Resource Locator
page_data = navigate_to_uri("https://www.example.com")
# Retrieve all visible text content
plain_text = extract_document_text()
# Obtain text from a specific h1 element, chosen by CSS selector
main_heading = extract_document_text("h1.main-header")
Interactive Form Submission
python
# Land on the login portal
navigate_to_uri("https://secure.app/login")
# Populate credential fields
populate_field("#user-input", "my_credentials")
populate_field("#pass-input", "secret_key")
# Trigger the submission sequence
activate_element("#submit-button")
Visual Artifact Generation
python
# Produce a screenshot of the entire page, including content below the visible viewport
full_capture = capture_viewport_image(capture_full_document=True)
# Produce a screenshot of a single, targeted component
component_snapshot = capture_viewport_image(target_selector=".data-table")
Hyperlink Enumeration
python
# Dump all discovered URLs on the current page
all_urls = enumerate_page_links()
# Retrieve URLs containing the word 'support'
relevant_urls = enumerate_page_links(link_substring_match="support")
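To make the filtering behaviour concrete, here is a standalone sketch of link enumeration with an optional substring filter, using only the standard library on a static HTML string. It approximates what `enumerate_page_links` is described as doing, not how the server implements it; `links_from_html` is a hypothetical name, and the case-sensitive match on the resolved URL is an assumption.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def links_from_html(html: str, base_url: str,
                    substring: Optional[str] = None) -> List[str]:
    """Return every link in `html`, optionally keeping only URLs containing `substring`."""
    collector = _LinkCollector(base_url)
    collector.feed(html)
    if substring is None:
        return collector.links
    return [url for url in collector.links if substring in url]
```

Unlike the browser-backed tool, this parser sees only the markup it is given, so links injected by JavaScript would be missed.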
Multi-Session Handling
python
# Open a new browser window/tab pointing to a specific site
session_one_id = spawn_new_session("https://service-a.net")
# Open a second, distinct session
session_two_id = spawn_new_session("https://service-b.net")
# Inventory of active sessions
active_sessions = list_all_sessions()
# Move focus to the first session
focus_session(session_one_id)
# Terminate the second session
terminate_session(session_two_id)
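The bookkeeping behind calls like these can be pictured as a small registry that maps session identifiers to pages and remembers which one has focus. The sketch below models only that bookkeeping, with no browser attached. The class name, the `tab-N` identifier format, and the rule that a newly spawned session takes focus are all assumptions for illustration.

```python
import itertools
from typing import Dict, List, Optional


class SessionRegistry:
    """Minimal in-memory model of spawn / list / focus / terminate semantics."""

    def __init__(self) -> None:
        self._urls: Dict[str, Optional[str]] = {}
        self._ids = itertools.count(1)
        self.active_id: Optional[str] = None

    def spawn(self, url: Optional[str] = None) -> str:
        session_id = f"tab-{next(self._ids)}"
        self._urls[session_id] = url
        self.active_id = session_id  # assumption: a new session takes focus
        return session_id

    def list_all(self) -> List[str]:
        return list(self._urls)

    def focus(self, session_id: str) -> None:
        if session_id not in self._urls:
            raise KeyError(f"unknown session: {session_id}")
        self.active_id = session_id

    def terminate(self, session_id: Optional[str] = None) -> None:
        # With no id given, close the active session.
        target = session_id if session_id is not None else self.active_id
        if target is None or target not in self._urls:
            raise KeyError(f"unknown session: {target}")
        del self._urls[target]
        if self.active_id == target:
            # Fall back to the most recently opened remaining session, if any.
            self.active_id = next(reversed(self._urls), None)
```

Closing a session other than the focused one leaves focus untouched, which matches the order of calls in the example above.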
Low-Level Page Operations
python
# Scroll the view downward by one screen height
adjust_scroll_position(direction="down", magnitude="page")
# Execute custom inline script
script_output = run_in_context("return (function() { return 1 + 1; })()")
# Fetch current page metadata
metadata = retrieve_page_metadata()
# Reload the active page
reload_current_document()
# Wait up to 8 seconds for the next navigation event to resolve
await_document_resolution(max_wait_ms=8000)
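The `max_wait_ms` argument implies a bounded wait rather than an open-ended one. A generic version of that pattern is sketched below: poll a condition until it holds or a deadline passes. The server presumably relies on Playwright's own waiting primitives, so treat the helper `wait_until` and its polling interval as illustrative assumptions.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], max_wait_ms: int = 10000,
               poll_ms: int = 50) -> bool:
    """Poll `condition` until it returns True or `max_wait_ms` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + max_wait_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000)
```

Returning a boolean instead of raising lets the caller decide whether a timeout is an error or merely a slow page.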
🔒 Protective Measures
- Integrated handling of SSL/TLS certificate errors (suppressed by the engine configuration so that navigation can proceed).
- Robust context isolation for secure session operations.
- Support for overriding the default user agent string.
- Extensive error reporting and diagnostic logging.
- Fine-grained control over request timeouts.
- Mechanisms to manage Content Security Policy (CSP) restrictions.
- Safeguards against common browser-based data exfiltration vectors.
🛠️ Diagnostics and Support
Known Anomalies
- TLS Errors: These are automatically suppressed by the engine configuration.
- Latency in Loading: Fine-tune the duration settings in the navigate_to_uri function.
- Selector Misalignment: Double-check the Document Object Model (DOM) query syntax.
- Resource Hogging: The system attempts autonomous cleanup; manual intervention may occasionally be needed.
Operational Logging
Detailed records of all significant service activities are generated for debugging and auditing purposes.
📋 Interface Specifications
navigate_to_uri(url: str, context: Optional[Any] = None)
- url: The target web address for loading.
- context: Reserved for future state persistence (currently inert).
extract_document_text(selector: Optional[str] = None, context: Optional[Any] = None)
- selector: Optional CSS path to pinpoint content extraction.
- context: Reserved slot.
activate_element(selector: str, context: Optional[Any] = None)
- selector: CSS identifier for the component intended for clicking.
- context: Reserved slot.
capture_viewport_image(capture_full_document: bool = False, target_selector: Optional[str] = None, context: Optional[Any] = None)
- capture_full_document: Boolean flag to capture beyond the visible screen.
- target_selector: Specific element selector for a localized image.
- context: Reserved slot.
enumerate_page_links(link_substring_match: Optional[str] = None, context: Optional[Any] = None)
- link_substring_match: Optional filter string for returned URLs.
- context: Reserved slot.
populate_field(selector: str, text: str, context: Optional[Any] = None)
- selector: CSS identifier for the data entry field.
- text: The string data to be entered.
- context: Reserved slot.
spawn_new_session(url: Optional[str] = None, context: Optional[Any] = None)
- url: Initial page to load in the new session, if any.
- context: Reserved slot.
focus_session(tab_id: str, context: Optional[Any] = None)
- tab_id: Identifier designating the session to bring to the foreground.
- context: Reserved slot.
list_all_sessions(context: Optional[Any] = None)
- context: Reserved slot.
terminate_session(tab_id: Optional[str] = None, context: Optional[Any] = None)
- tab_id: Identifier for the session to close (defaults to the currently active one).
- context: Reserved slot.
reload_current_document(context: Optional[Any] = None)
- context: Reserved slot.
retrieve_page_metadata(context: Optional[Any] = None)
- context: Reserved slot.
adjust_scroll_position(direction: str = "down", magnitude: str = "page", context: Optional[Any] = None)
- direction: Scroll axis ('up', 'down', 'left', 'right').
- magnitude: Scroll distance ('page', 'half', or pixel value).
- context: Reserved slot.
await_document_resolution(max_wait_ms: int = 10000, context: Optional[Any] = None)
- max_wait_ms: Upper boundary for waiting time, in milliseconds.
- context: Reserved slot.
run_in_context(script: str, context: Optional[Any] = None)
- script: The JavaScript payload to be executed.
- context: Reserved slot.
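Of the parameters listed, the `direction` and `magnitude` pair for `adjust_scroll_position` is the least self-explanatory. The sketch below shows one plausible way to turn them into a pixel offset for a given viewport size. The helper name `scroll_delta`, the interpretation of 'page' and 'half' as fractions of the viewport, and the acceptance of numeric strings for pixel values are assumptions, not documented behaviour.

```python
from typing import Tuple, Union


def scroll_delta(direction: str, magnitude: Union[str, int],
                 viewport_w: int, viewport_h: int) -> Tuple[int, int]:
    """Translate (direction, magnitude) into a (dx, dy) pixel offset."""
    axes = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    if direction not in axes:
        raise ValueError(f"unsupported direction: {direction!r}")
    x_sign, y_sign = axes[direction]
    # Horizontal scrolls are measured against width, vertical ones against height.
    extent = viewport_w if x_sign else viewport_h
    if magnitude == "page":
        pixels = extent
    elif magnitude == "half":
        pixels = extent // 2
    else:
        pixels = int(magnitude)  # a pixel count, given as an int or a numeric string
    return (x_sign * pixels, y_sign * pixels)
```

The resulting pair could then be passed to something like `window.scrollBy(dx, dy)` in the page.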
🤝 Collaboration Model
Contributions are welcome! Please submit your changes as a Pull Request.
Development Environment Setup
bash
# Clone the source repository
git clone https://github.com/random-robbie/mcp-web-browser.git
# Establish isolated environment
python -m venv venv
. venv/bin/activate # Use venv\Scripts\activate on Windows OS
# Install primary and development dependencies (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"
📄 Legal Framework
Licensed under the MIT License.
💬 Support Channel
For bug reports or technical questions, please open an issue on the GitHub Issue Tracker.
BACKGROUND ON HEADLESS BROWSING: A headless browser operates without a visible graphical user interface. It provides programmatic control over web pages and mimics standard browser behavior (DOM rendering, CSS application, and JavaScript execution), but is driven through command-line or network interfaces. This makes it well suited to automated quality assurance (QA), high-fidelity page snapshots, and scripted web interaction. Modern browser engines (Chrome, Firefox) natively support headless operation via remote control APIs, superseding older standalone solutions like PhantomJS. While valuable for legitimate purposes like web scraping (e.g., indexing AJAX-heavy sites), headless tools can be misused for activities such as generating artificial traffic or unauthorized automation. However, traffic analysis suggests that headless usage is a poor indicator of malicious activity when compared with standard browser usage.
AUTOMATION FRAMEWORKS: Several established tools standardize the headless control experience: Selenium WebDriver (W3C standard), Playwright (multi-engine support), and Puppeteer (Chromium/Firefox focused). Testing frameworks like Capybara, Jasmine, and Cypress often incorporate these headless capabilities into their test execution pipelines. Alternatives to full browser emulation include DOM parsing libraries like jsdom (Node.js) or Deno's built-in browser APIs, which are faster but lack true visual rendering and full event simulation.
