Chromium DevTools Protocol Management Server

This MCP server exposes functionality to interface directly with a running Google Chrome instance using its native Remote Debugging Protocol (CDP). It serves as a robust bridge for advanced browser manipulation tasks.

Rationale for Use

This utility is essential when precise, programmatic configuration of a browser environment is required prior to automated interactions (e.g., by an AI agent like Cline). Furthermore, it allows for deep introspection, enabling the capture and contextualization of detailed network transaction data.

Core Capabilities

Enumeration of active browser sessions (tabs).
Remote execution of arbitrary JavaScript within a target context.
Generation of visual snapshots from specified tabs.
Real-time monitoring and logging of network activities.
Direct navigation commands for specific tabs.
Inspection and retrieval of Document Object Model (DOM) structure based on CSS selectors.
Simulated user interaction (clicks) with associated console feedback logging.

Deployment Prerequisites

Installation via Node Package Manager is required:

bash npm install @nicholmikey/chrome-tools

Configuration Schema

The server instance is configured via environment variables defined in the MCP manifest:

{ "chromium-tools-interface": { "command": "node", "args": ["path/to/chrome-tools/dist/index.js"], "env": { "CDP_ENDPOINT_URL": "http://localhost:9222", "LINKAGE_MODE": "direct", "FAULT_RECOVERY_HINT": "custom diagnostic message" } } }

Environmental Parameters

CDP_ENDPOINT_URL: Specifies the network location where Chrome's remote debugging service is accessible (Default: http://localhost:9222).
LINKAGE_MODE: Categorical identifier for the connection methodology used (e.g., "direct", "ssh-tunnel", "docker").
FAULT_RECOVERY_HINT: Customized advisory text presented upon connection failure.

Operational Setup Guide

Standard Local Initialization (Windows/Mac/Linux)

Initiate Chrome with the remote debugging flag enabled: bash # Example for Linux google-chrome --remote-debugging-port=9222
Update MCP configuration with connection details:

{ "env": { "CDP_ENDPOINT_URL": "http://localhost:9222", "LINKAGE_MODE": "direct" } }

WSL Interoperability

Requires an intermediary SSH forward to bridge the WSL environment to the host Windows Chrome instance.

Ensure Chrome is running remotely (on Windows) with debugging active.
Establish the port forwarding tunnel: bash ssh -N -L 9222:localhost:9222 windowsuser@host
Configure MCP to reflect the tunneled connection:

{ "env": { "CDP_ENDPOINT_URL": "http://localhost:9222", "LINKAGE_MODE": "ssh-tunnel", "FAULT_RECOVERY_HINT": "Verify the SSH port forwarding session is active: ssh -N -L 9222:localhost:9222 windowsuser@host" } }

Containerized Execution (Docker)

If Chrome is provisioned within a container:

Start the necessary container (using headless shell as an example): bash docker run -d --name chrome_instance -p 9222:9222 chromedp/headless-shell
Configure MCP for container linkage:

{ "env": { "CDP_ENDPOINT_URL": "http://localhost:9222", "LINKAGE_MODE": "docker" } }

Exposed Toolset

list_sessions

Retrieves a manifest of all currently open browser sessions.

inject_script

Executes specified JavaScript code block within the context of a designated session. Parameters: - sessionId: Identifier of the target Chrome session. - script_payload: The string containing JavaScript code.

render_snapshot

Generates a high-fidelity image of the current tab view, applying post-capture optimization tailored for visual processing models. Parameters: - sessionId: Identifier of the target Chrome session. - encoding: Preferred output format (e.g., jpeg/png) - Note: Final processed output favors WebP. - compression_level: Quality metric (1-100) applicable only to initial JPEG encoding. - full_viewport: Flag to capture the entire scrollable document extent.

Image Post-Processing Pipeline: 1. Primary Target (WebP): Attempts encoding at quality 80/high effort. Reverts to quality 60/near-lossless if artifact size exceeds 1MB. 2. Secondary Fallback (PNG): Utilized exclusively if WebP encoding fails; applies maximum lossless compression and palette reduction. 3. Dimensional Constraints: Resulting dimensions are clamped to a maximum of 900x600 (maintaining aspect ratio). File size capped at 1MB, with aggressive downscaling if necessary.

monitor_network

Initiates interception and logging of network requests and responses for a session over a defined time. Parameters: - sessionId: Identifier of the target Chrome session. - capture_time_seconds: Duration for data collection. - criteria: Optional JSON object for filtering by resource type or URL patterns.

navigate_to_resource

Instructs a session to load a new Uniform Resource Locator (URL). Parameters: - sessionId: Identifier of the target Chrome session. - target_url: The destination URI.

query_dom_structure

Searches the Document Object Model using a CSS selector and returns detailed node metadata. Parameters: - sessionId: Identifier of the target Chrome session. - selector: The CSS path used for element matching. Returns: - An array of structured DOM node representations, including: - nodeId: Unique internal node reference. - tagName: The HTML element tag. - textContent: Extracted text content. - attributes: A map of all element properties. - boundingBox: Coordinates and dimensions on the viewport. - isVisible: Boolean indicating rendered visibility. - ariaAttributes: Accessibility properties pertinent to the node.

activate_element

Simulates a mouse click on an element identified by its selector, capturing any synchronous console logging generated during the interaction. Parameters: - sessionId: Identifier of the target Chrome session. - selector: The CSS path identifying the target element. Returns: - A status object containing: - status_message: Confirmation of success or failure. - consoleOutput: Collected messages from the developer console during the click event.

Licensing

Distributed under the terms of the MIT License.

WIKIPEDIA: A headless browser operates without a visible graphical user interface. This mode facilitates automated control over web pages through command-line interfaces or network endpoints, mirroring the rendering capabilities of standard browsers (layout, styling, JavaScript execution). Native support for remote browser control was introduced in Chrome 59 and Firefox 56, superseding earlier methodologies like PhantomJS.

== Principal Applications == The primary utility domains for headless execution include:

Automated testing workflows for contemporary web architectures.
Programmatic generation of web page snapshots.
Execution of unit tests for JavaScript libraries requiring DOM rendering.
Automated interaction sequences on web interfaces.

=== Supplementary Uses === Headless environments are also valuable for efficient web data harvesting. Google acknowledged in 2009 that these tools aid in indexing content heavily reliant on Ajax. Conversely, they present risks such as enabling distributed denial-of-service (DDoS) campaigns, inflating impression counts, or unauthorized site interaction (e.g., credential stuffing). However, traffic analyses from 2018 suggest malicious actors do not disproportionately favor headless systems over traditional browsers for such activities.

== Automation Frameworks == Due to native API support across major browsers, several unified interfaces streamline browser automation:

Selenium WebDriver: Adheres to W3C WebDriver standards.
Playwright: Node.js library supporting Chromium, Firefox, and WebKit.
Puppeteer: Node.js library dedicated to controlling Chrome/Firefox.

=== Testing Integration === Many testing infrastructures incorporate headless capabilities:

Capybara: Leverages Headless Chrome or WebKit for behavior simulation.
Jasmine: Defaults to Selenium but permits configuration for Headless Chrome.
Cypress: A dedicated frontend testing framework.
QF-Test: GUI testing software employing headless capability.

=== Alternative Abstractions === An alternative strategy involves utilizing software that furnishes browser-like APIs without full rendering. Examples include Deno's integrated browser APIs or jsdom for Node.js. While these often support core features (parsing, XHR), their DOM rendering fidelity and event handling are typically more constrained and faster than full browser instances.