mcp-browser-engine

A robust server component for the Model Context Protocol (MCP) that facilitates dynamic web interaction via a non-visual browser environment, supporting content retrieval, interactive form manipulation, and session/tab orchestration.

Author

random-robbie

MIT License

Quick Info

GitHub Stars: 23
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 2026-02-19

Tags

browser, automation, web, browser automation, automation web, advanced web

MCP Browser Engine Service

This specialized server implements advanced web navigation and manipulation capabilities for the Model Context Protocol (MCP), leveraging the Playwright automation framework to offer a secure, scriptable interface for interacting with live web resources in a headless mode.

🌟 Core Capabilities

  • Automated Headless Navigation: Traverse web pages, including handling sites with compromised or invalid SSL certificates.
  • Complete Document Capture: Obtain the full source code (HTML) of a page, accounting for elements rendered post-JavaScript execution.
  • Parallel Session Management: Capability to spawn, maintain context across, and toggle between multiple independent browser windows (tabs).
  • Intelligent Web Element Manipulation: A comprehensive toolkit for:
      • Extracting textual data from the viewport or specific DOM nodes.
      • Simulating user clicks on designated interactive components.
      • Injecting specified strings into input fields and form controls.
      • Generating visual snapshots (screenshots) of the entire page or targeted regions.
      • Retrieving all hyperlink references, optionally filtered by string criteria.
      • Controlling the viewport scroll position (up, down, left, right).
      • Executing arbitrary JavaScript code within the page context.
      • Re-initiating the page load process.
      • Pausing execution until a page transition concludes.
  • Resource Hygiene: Automated garbage collection for detached browser contexts after periods of inactivity.
  • Contextual Page Metrics: Access to detailed meta-data pertaining to the currently active document.
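
The Resource Hygiene behaviour above can be pictured as a last-used timestamp per session plus a periodic sweep. The following is a minimal sketch under stated assumptions, not the server's actual implementation: `IdleTracker`, `touch`, `reap`, and the `idle_limit_s` parameter are hypothetical names, and a real server would also close the underlying browser context for every identifier that `reap` returns.

```python
import time
from typing import Callable, Dict, List


class IdleTracker:
    """Hypothetical helper: remember when each session was last used."""

    def __init__(self, idle_limit_s: float, clock: Callable[[], float] = time.monotonic):
        self.idle_limit_s = idle_limit_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_used: Dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Record activity on a session (call this on every tool invocation)."""
        self._last_used[session_id] = self._clock()

    def reap(self) -> List[str]:
        """Forget and return the sessions idle for longer than the limit."""
        now = self._clock()
        expired = [
            sid for sid, seen in self._last_used.items()
            if now - seen > self.idle_limit_s
        ]
        for sid in expired:
            del self._last_used[sid]
        return expired
```

Injecting the clock keeps the sweep deterministic in tests; in production the default `time.monotonic` avoids surprises from wall-clock adjustments.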

⚙️ Deployment Guide

Prerequisites

Ensure the following dependencies are satisfied:

  • A system running Python version 3.10 or newer.
  • The core MCP SDK library installed.
  • The Playwright library and its requisite browser binaries.
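
As a rough way to confirm these prerequisites before starting the server, the sketch below checks the interpreter version and whether the `mcp` and `playwright` packages can be located. The helper names are hypothetical, and the check does not cover the browser binaries, which are installed separately.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)
REQUIRED_MODULES = ("mcp", "playwright")


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.10 or newer is required")
    for name in missing_modules():
        print(f"Missing package: {name} (try: pip install {name})")
```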

Installation Steps

    # Install the required Python packages
    pip install mcp playwright

    # Download and set up the browser binaries (Chromium, Firefox, WebKit)
    playwright install

Integration with Claude Desktop Environment

Add the following configuration block to your claude_desktop_config.json file:

    {
      "mcpServers": {
        "web-browser": {
          "command": "python",
          "args": ["/path/to/your/server.py"]
        }
      }
    }
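
If the configuration file already lists other servers, the entry has to be merged rather than pasted over the whole file. The sketch below shows one way to do that with the standard library. The function names are hypothetical, and the location of claude_desktop_config.json is left to the caller because it differs between platforms.

```python
import json
from pathlib import Path


def add_server_entry(config: dict, name: str, script_path: str) -> dict:
    """Return a copy of config with an MCP server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": "python", "args": [script_path]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_file: Path, name: str, script_path: str) -> None:
    """Read the JSON file if it exists, merge the entry, and write it back."""
    existing = json.loads(config_file.read_text()) if config_file.exists() else {}
    updated = add_server_entry(existing, name, script_path)
    config_file.write_text(json.dumps(updated, indent=2))
```

Calling `add_server_entry(existing, "web-browser", "/path/to/your/server.py")` leaves any other configured servers and unrelated settings untouched.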

💡 Operational Examples

Initial Web Access

    # Navigate to a target URL
    page_data = navigate_to_uri("https://www.example.com")

    # Retrieve all visible text content
    plain_text = extract_document_text()

    # Retrieve text from a specific element (here, an h1 with class "main-header")
    main_heading = extract_document_text("h1.main-header")

Interactive Form Submission

    # Open the login page
    navigate_to_uri("https://secure.app/login")

    # Fill in the credential fields
    populate_field("#user-input", "my_credentials")
    populate_field("#pass-input", "secret_key")

    # Click the submit button
    activate_element("#submit-button")

Visual Artifact Generation

    # Capture a screenshot of the entire document, including content below the visible viewport
    full_capture = capture_viewport_image(capture_full_document=True)

    # Capture a screenshot of a single, targeted element
    component_snapshot = capture_viewport_image(target_selector=".data-table")

Link Extraction

    # List all URLs discovered on the current page
    all_urls = enumerate_page_links()

    # Retrieve only the URLs matching the filter string 'support'
    relevant_urls = enumerate_page_links(link_substring_match="support")
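
To make the filtering idea concrete, here is a browser-free sketch that collects `href` values from static HTML and keeps those containing a substring. It is an illustration only: `filter_links` is a hypothetical helper, it sees no JavaScript-rendered links, and whether the server matches against the URL, the link text, or both (and whether matching is case-sensitive) is not stated in this document.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every anchor tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def filter_links(html: str, substring: Optional[str] = None) -> List[str]:
    """Return all hrefs in document order, optionally keeping only those containing substring."""
    collector = _LinkCollector()
    collector.feed(html)
    if substring is None:
        return collector.links
    return [link for link in collector.links if substring in link]
```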

Multi-Session Handling

    # Open a new browser tab pointing to a specific site
    session_one_id = spawn_new_session("https://service-a.net")

    # Open a second, distinct session
    session_two_id = spawn_new_session("https://service-b.net")

    # List the active sessions
    active_sessions = list_all_sessions()

    # Move focus to the first session
    focus_session(session_one_id)

    # Terminate the second session
    terminate_session(session_two_id)
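
The bookkeeping behind these calls can be sketched as a small registry that hands out identifiers and tracks which session is active. Everything here is an assumption made for illustration: the `SessionRegistry` class, the `tab-N` identifier format, and the choice to fall back to the oldest remaining session after the active one is closed are not taken from the server's code.

```python
import itertools
from typing import Dict, List, Optional


class SessionRegistry:
    """Hypothetical in-memory model of spawn / focus / list / terminate."""

    def __init__(self) -> None:
        self._urls: Dict[str, Optional[str]] = {}
        self._counter = itertools.count(1)
        self.active_id: Optional[str] = None

    def spawn(self, url: Optional[str] = None) -> str:
        """Register a new session, make it active, and return its identifier."""
        session_id = f"tab-{next(self._counter)}"
        self._urls[session_id] = url
        self.active_id = session_id
        return session_id

    def focus(self, session_id: str) -> None:
        """Make an existing session the active one."""
        if session_id not in self._urls:
            raise KeyError(f"unknown session: {session_id}")
        self.active_id = session_id

    def list_all(self) -> List[str]:
        """Return session identifiers in creation order."""
        return list(self._urls)

    def terminate(self, session_id: Optional[str] = None) -> None:
        """Close the given session, or the active one when no identifier is passed."""
        target = session_id if session_id is not None else self.active_id
        if target is None or target not in self._urls:
            raise KeyError(f"unknown session: {target}")
        del self._urls[target]
        if self.active_id == target:
            # Assumed policy: fall back to the oldest remaining session, if any.
            self.active_id = next(iter(self._urls), None)
```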

Low-Level Page Operations

    # Scroll the view downward by one screen height
    adjust_scroll_position(direction="down", magnitude="page")

    # Execute a custom inline script
    script_output = run_in_context("return (function() { return 1 + 1; })()")

    # Fetch current page metadata
    metadata = retrieve_page_metadata()

    # Reload the active page
    reload_current_document()

    # Wait up to 8 seconds (8000 ms) for the pending navigation to finish
    await_document_resolution(max_wait_ms=8000)

🔒 Protective Measures

  • Built-in handling of SSL/TLS certificate errors, so navigation can proceed on sites with invalid certificates.
  • Robust context isolation for secure session operations.
  • Support for overriding the default user agent string.
  • Extensive error reporting and diagnostic logging.
  • Fine-grained control over request timeouts.
  • Mechanisms to manage Content Security Policy (CSP) restrictions.
  • Safeguards against common browser-based data exfiltration vectors.

🛠️ Diagnostics and Support

Known Anomalies

  • TLS Errors: These are automatically suppressed by the engine configuration.
  • Latency in Loading: Fine-tune the duration settings in the navigate_to_uri function.
  • Selector Misalignment: Double-check the Document Object Model (DOM) query syntax.
  • Resource Hogging: The system attempts autonomous cleanup; manual intervention may occasionally be needed.

Operational Logging

Detailed records of all significant service activities are generated for debugging and auditing purposes.

📋 Interface Specifications

navigate_to_uri(url: str, context: Optional[Any] = None)

  • url: The target web address for loading.
  • context: Reserved for future state persistence (currently inert).

extract_document_text(selector: Optional[str] = None, context: Optional[Any] = None)

  • selector: Optional CSS path to pinpoint content extraction.
  • context: Reserved slot.

activate_element(selector: str, context: Optional[Any] = None)

  • selector: CSS identifier for the component intended for clicking.
  • context: Reserved slot.

capture_viewport_image(capture_full_document: bool = False, target_selector: Optional[str] = None, context: Optional[Any] = None)

  • capture_full_document: Boolean flag to capture beyond the visible screen.
  • target_selector: Specific element selector for a localized image.
  • context: Reserved slot.

enumerate_page_links(link_substring_match: Optional[str] = None, context: Optional[Any] = None)

  • link_substring_match: Optional filter string for returned URLs.
  • context: Reserved slot.

populate_field(selector: str, text: str, context: Optional[Any] = None)

  • selector: CSS identifier for the data entry field.
  • text: The string data to be entered.
  • context: Reserved slot.

spawn_new_session(url: Optional[str] = None, context: Optional[Any] = None)

  • url: Initial page to load in the new session, if any.
  • context: Reserved slot.

focus_session(tab_id: str, context: Optional[Any] = None)

  • tab_id: Identifier designating the session to bring to the foreground.
  • context: Reserved slot.

list_all_sessions(context: Optional[Any] = None)

  • context: Reserved slot.

terminate_session(tab_id: Optional[str] = None, context: Optional[Any] = None)

  • tab_id: Identifier for the session to close (defaults to the currently active one).
  • context: Reserved slot.

reload_current_document(context: Optional[Any] = None)

  • context: Reserved slot.

retrieve_page_metadata(context: Optional[Any] = None)

  • context: Reserved slot.

adjust_scroll_position(direction: str = "down", magnitude: str = "page", context: Optional[Any] = None)

  • direction: Scroll direction ('up', 'down', 'left', 'right').
  • magnitude: Scroll distance ('page', 'half', or pixel value).
  • context: Reserved slot.
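
One plausible reading of these two parameters is that they resolve to a pixel offset relative to the viewport size. The sketch below shows that mapping under stated assumptions: `scroll_delta` is a hypothetical helper, 'page' is taken to mean one full viewport along the scroll axis, 'half' means half of it, and any other value is parsed as a pixel count.

```python
from typing import Tuple, Union

# Unit step per direction as (x, y); positive y scrolls down the page.
_UNIT_VECTORS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def scroll_delta(direction: str, magnitude: Union[str, int], viewport: Tuple[int, int]) -> Tuple[int, int]:
    """Translate a direction and magnitude into an (dx, dy) pixel offset."""
    if direction not in _UNIT_VECTORS:
        raise ValueError(f"unsupported direction: {direction!r}")
    width, height = viewport
    span = width if direction in ("left", "right") else height
    if magnitude == "page":
        distance = span
    elif magnitude == "half":
        distance = span // 2
    else:
        distance = int(magnitude)  # raises ValueError for non-numeric input
    ux, uy = _UNIT_VECTORS[direction]
    return (ux * distance, uy * distance)
```

For a 1280x720 viewport, `scroll_delta("down", "page", (1280, 720))` yields `(0, 720)`, an offset that could then be passed to a scrolling call in the page.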

await_document_resolution(max_wait_ms: int = 10000, context: Optional[Any] = None)

  • max_wait_ms: Upper boundary for waiting time, in milliseconds.
  • context: Reserved slot.
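
The meaning of an upper bound in milliseconds can be illustrated with a generic polling loop that gives up once the deadline passes. This is a sketch of bounded waiting in general, not of how the server waits for navigation: `wait_until` and `poll_ms` are hypothetical, and the server presumably relies on Playwright's own waiting facilities.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], max_wait_ms: int = 10000, poll_ms: int = 50) -> bool:
    """Poll is_ready until it returns True or max_wait_ms elapses.

    Returns True when the condition was met, False when the wait timed out.
    """
    deadline = time.monotonic() + max_wait_ms / 1000
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000)
```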

run_in_context(script: str, context: Optional[Any] = None)

  • script: The JavaScript payload to be executed.
  • context: Reserved slot.

🤝 Collaboration Model

We welcome external enhancements! Kindly submit a request for changes via a Pull Request.

Development Environment Setup

    # Clone the source repository and enter it
    git clone https://github.com/random-robbie/mcp-web-browser.git
    cd mcp-web-browser

    # Create and activate an isolated environment
    python -m venv venv
    . venv/bin/activate  # Use venv\Scripts\activate on Windows

    # Install primary and development dependencies
    pip install -e ".[dev]"

Licensed under the MIT License.

🔗 Associated Resources

💬 Support Channel

For bug reports or technical questions, please open an issue on the GitHub Issue Tracker.

BACKGROUND ON HEADLESS BROWSING: A headless browser operates without a visible graphical user interface. It facilitates programmatic control over web pages, mimicking standard browser behavior—including DOM rendering, CSS application, and JavaScript execution—but executes through command-line or network interfaces. This utility is crucial for automated quality assurance (QA) processes, capturing high-fidelity page snapshots, and scripted web interaction. Modern browser engines (Chrome, Firefox) now natively support headless operations via remote control APIs, superseding older standalone solutions like PhantomJS. Key applications include web testing, automated visual capture, and scripting complex user flows. While valuable for legitimate purposes like web scraping (e.g., indexing AJAX-heavy sites), headless tools can be misused for activities such as generating artificial traffic or unauthorized automation. However, traffic analysis suggests headless usage correlates poorly with identifiable malicious patterns compared to standard browser usage.

AUTOMATION FRAMEWORKS: Several established tools standardize the headless control experience: Selenium WebDriver (W3C standard), Playwright (multi-engine support), and Puppeteer (Chromium/Firefox focused). Testing frameworks like Capybara, Jasmine, and Cypress often incorporate these headless capabilities into their test execution pipelines. Alternatives to full browser emulation include DOM parsing libraries like jsdom (Node.js) or Deno's built-in browser APIs, which are faster but lack true visual rendering and full event simulation.
