AI-Driven Chrome Controller (MCP Server)

This Model Context Protocol (MCP) server lets AI agents control and orchestrate the Chrome web browser for complex web automation tasks.

Deployment Instructions

Essential Prerequisites

Runtime environment: Python version 3.12 or newer
Companion software: the accompanying Google Chrome extension (a WebSocket client) must be installed
Package management: uv for Python dependencies, or Docker as an alternative

Installation via Smithery Platform

To install the Chrome Web Automation Server for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @dlwjdtn535/mcp-chrome-integration --client claude

Configuration Setup

Choose the configuration that matches your operating system:

1. Recommended Method: Using uv

Windows configuration (JSON):

{
  "mcpServers": {
    "ai-driven-chrome-controller": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "%LOCALAPPDATA%\\Programs\\mcp-chrome-integration\\src",
        "mcp-server"
      ],
      "env": {
        "WEBSOCKET_PORT": "8012"
      }
    }
  }
}

macOS configuration (JSON):

{
  "mcpServers": {
    "ai-driven-chrome-controller": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/usr/local/bin/mcp-chrome-integration/src",
        "mcp-server"
      ],
      "env": {
        "WEBSOCKET_PORT": "8012"
      }
    }
  }
}

Linux configuration (JSON):

{
  "mcpServers": {
    "ai-driven-chrome-controller": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/usr/local/bin/mcp-chrome-integration/src",
        "mcp-server"
      ],
      "env": {
        "WEBSOCKET_PORT": "8012"
      }
    }
  }
}
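The three configurations differ only in the --directory path. A minimal sketch that builds the matching entry for the current OS and prints it as JSON, assuming the install paths shown above (copied verbatim, including the literal %LOCALAPPDATA%); build_config is a hypothetical helper, not part of the project:

```python
import json
import sys

# Install directories copied verbatim from the configuration examples above;
# they are assumptions about where the project lives, so adjust them as needed.
WINDOWS_DIR = r"%LOCALAPPDATA%\Programs\mcp-chrome-integration\src"
POSIX_DIR = "/usr/local/bin/mcp-chrome-integration/src"


def build_config(platform: str = sys.platform, port: int = 8012) -> dict:
    """Return the uv-based mcpServers entry for a platform (hypothetical helper)."""
    # sys.platform is "win32" on Windows, "darwin" on macOS, "linux" on Linux.
    directory = WINDOWS_DIR if platform.startswith("win") else POSIX_DIR
    return {
        "mcpServers": {
            "ai-driven-chrome-controller": {
                "command": "uv",
                "args": ["run", "--directory", directory, "mcp-server"],
                # Environment variable values are strings in the JSON config.
                "env": {"WEBSOCKET_PORT": str(port)},
            }
        }
    }


if __name__ == "__main__":
    # json.dumps escapes the Windows backslashes, matching the JSON above.
    print(json.dumps(build_config(), indent=2))
```

The printed object can be pasted into the MCP client's configuration file, or its ai-driven-chrome-controller entry merged into an existing mcpServers object.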

Core Capabilities

1. Basic Browser Control

Navigating to a URL
Clicking elements
Typing text into form fields
Submitting forms
Scrolling the viewport
Extracting structured data from tables
Executing JavaScript in the page

2. Element Inspection and Modification

Querying element information (dimensions, styles, visibility)
Waiting for specific DOM elements to appear
Changing an element's background color
Capturing a snapshot of the current page state

3. Content Auditing and Parsing

Fetching the full page HTML (DOM)
Counting all links on the page
Extracting meta tags
Retrieving image information
Analyzing form structure
Streaming page content incrementally
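How the server streams content is not described here. As a generic sketch, assuming content is sent as a sequence of numbered text chunks, the splitting step could look like this; iter_chunks and the 64,000-character default are hypothetical choices:

```python
from collections.abc import Iterator


def iter_chunks(content: str, chunk_size: int = 64_000) -> Iterator[tuple[int, int, str]]:
    """Yield (index, total, piece) tuples that cover `content` in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; an empty page still yields one (empty) chunk.
    total = max(1, -(-len(content) // chunk_size))
    for index in range(total):
        start = index * chunk_size
        yield index, total, content[start:start + chunk_size]
```

Each tuple carries its position and the total count, so a receiver can reassemble the page in order. The split is by characters; a transport with a byte limit would need to measure the UTF-8 encoded length instead.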

4. Browser Feature Access

Managing and editing bookmarks
Querying browsing history
Controlling file downloads
Displaying system notifications
Clipboard access
Cookie management

5. System Interaction

Querying host system information
Simulating geolocation data
Monitoring device power status/battery levels
Capturing high-resolution screen images

Operational Examples (Python)

# Navigate to a URL ("active_session_id" is a placeholder; list real tab IDs with tool_tab_list())
tool_navigate_to(url="https://example.com", tab_id="active_session_id")

# Click an element by CSS selector
tool_click_element(selector="#submit-button", tab_id="active_session_id")

# Type text into an input field
tool_type_text(selector="#search", text="search query", tab_id="active_session_id")

# Capture a snapshot of the current page state
tool_state(tab_id="active_session_id")

# Run arbitrary JavaScript in the page
tool_execute_script(script="console.log('Process initiated')", tab_id="active_session_id")

# Extract data from a table element
tool_extract_table(selector=".data-table", tab_id="active_session_id")

# Inspect element properties
tool_get_element_info(selector=".my-element", tab_id="active_session_id")
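In normal use these calls are issued by the AI agent through its MCP client rather than imported as Python functions. Under MCP, each one reaches the server as a JSON-RPC 2.0 tools/call request, sent after the client's initialize handshake. A minimal sketch of building that message, assuming the tool names shown above are exactly the names the server registers; tools_call_request is a hypothetical helper, and real clients such as Claude Desktop construct these messages themselves:

```python
import json
from itertools import count

# JSON-RPC 2.0 requires each request id to be unique within a session.
_request_ids = count(1)


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request (hypothetical helper)."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Equivalent of the first example above; "active_session_id" is a placeholder.
print(tools_call_request(
    "tool_navigate_to",
    {"url": "https://example.com", "tab_id": "active_session_id"},
))
```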

Critical Operational Considerations

1. Chrome Security Constraints

Operations are blocked on internal chrome:// URLs
Only regular web pages (http:// or https://) are supported
Some websites' Content Security Policies (CSPs) may block script execution
Take the target site's CSP into account when running JavaScript
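Because only http:// and https:// pages can be automated, a client can reject other URLs before sending a command. A minimal standard-library sketch; is_automatable is a hypothetical pre-check that does not replace the extension's own enforcement and cannot detect CSP restrictions:

```python
from urllib.parse import urlparse

# Schemes the controller can act on, per the constraints listed above.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def is_automatable(url: str) -> bool:
    """Return True if the URL is a regular web page (hypothetical pre-check)."""
    # urlparse lower-cases the scheme, so "HTTPS://..." is accepted too.
    return urlparse(url).scheme in ALLOWED_SCHEMES


for candidate in ("https://example.com", "chrome://extensions/", "about:blank"):
    print(candidate, is_automatable(candidate))
```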

2. Tab Context Management

Every command requires a tab_id identifying the target tab
Use tool_tab_list() to discover available tabs
Verify that the target tab is ready before acting on it
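A sketch of choosing a target tab from a tab listing. The response shape is an assumption: each entry is treated as a dict with id, url, and active keys, mirroring Chrome's tabs.Tab object, so check the actual tool_tab_list() output before relying on it. pick_target_tab is hypothetical:

```python
from typing import Optional

WEB_PREFIXES = ("http://", "https://")


def pick_target_tab(tabs: list[dict]) -> Optional[str]:
    """Return the id of the active regular web tab, or None (hypothetical helper).

    Assumes each entry has "id", "url" and "active" keys, like Chrome's
    tabs.Tab object; the real tool_tab_list() output may differ.
    """
    for tab in tabs:
        if tab.get("active") and tab.get("url", "").startswith(WEB_PREFIXES):
            # Chrome tab ids are integers; the tool examples pass strings.
            return str(tab["id"])
    return None
```

Skipping chrome:// tabs here avoids sending commands the extension would refuse anyway.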

3. Failure Mitigation

Always check return values to confirm that an operation succeeded
Set timeouts when waiting for elements
Allow for delays while pages finish loading
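Checking results and waiting with a timeout can be combined in one polling helper. A minimal sketch, assuming each tool result is a dict with a boolean success key (the real response format is not documented here); wait_for is hypothetical:

```python
import time
from collections.abc import Callable
from typing import Optional


def wait_for(
    check: Callable[[], Optional[dict]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> dict:
    """Poll check() until it reports success or the timeout expires.

    Hypothetical helper: assumes results are dicts with a boolean
    "success" key, which may not match the server's actual format.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result and result.get("success"):
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout} s")
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

In practice check would wrap a call such as tool_get_element_info(selector=..., tab_id=...), and a TimeoutError signals that the element never appeared.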

Deployment and Setup Procedures

1. Companion Extension Setup

Prepare the client files:

# Navigate to the client extension directory
cd mcp-client

Install the extension in Chrome:

Launch Google Chrome
Navigate to chrome://extensions/
Turn on the 'Developer mode' toggle in the upper-right corner
Click the 'Load unpacked' button (upper left)
Select the mcp-client folder

Configure the extension:

Click the MCP extension icon in the Chrome toolbar
Enter the server address (default: ws://localhost:8012)
Click 'Connect'
Confirm that the connection status changes to "Connected"

Ongoing usage:

The controller works automatically in connected tabs
For newly opened tabs, click the extension icon and connect again
Monitor activity in the extension's log view
Click 'Disconnect' to close the connection

2. Backend Server Initialization

# Move into the server application directory
cd mcp-server

# Install required server dependencies
pip install -r requirements.txt

# Initiate the server process
python src/server.py
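The port in the extension's server address must match the port the server listens on. Assuming the server honors the WEBSOCKET_PORT variable set in the MCP configuration (the manual start command above does not state this explicitly), a hypothetical helper can derive the address to enter in the extension:

```python
import os
from collections.abc import Mapping

DEFAULT_PORT = 8012  # default from the configuration and extension setup above


def websocket_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Return the ws:// address the extension should use (hypothetical helper)."""
    raw = env.get("WEBSOCKET_PORT", str(DEFAULT_PORT))
    port = int(raw)  # raises ValueError for non-numeric values
    if not 1 <= port <= 65535:
        raise ValueError(f"WEBSOCKET_PORT out of range: {raw}")
    return f"ws://localhost:{port}"


print(websocket_endpoint())
```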

Extension Operational Features

1. User Interface Overlay

Field for setting the communication server address
Connect/Disconnect toggle for the WebSocket link
Display showing the current tab's operational status
Integrated viewer for real-time log outputs

2. Background Processes

Management of all active browsing contexts (tabs)
Maintenance of the persistent WebSocket link
Automatic reconnection attempts upon link failure
Basic error recovery
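The extension's actual reconnection policy is not documented. A common approach is exponential backoff with jitter, sketched here in Python for consistency with the other examples even though the extension itself runs in the browser; backoff_delays and its defaults are assumptions:

```python
import random
from collections.abc import Iterator
from itertools import islice
from typing import Optional


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 30.0,
    jitter: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield reconnection delays in seconds: exponential growth, capped, with jitter."""
    if rng is None:
        rng = random.Random()
    delay = base
    while True:
        # Spread each delay by up to +/- jitter (a fraction of the delay)
        # so that many clients do not reconnect in lockstep.
        spread = delay * jitter
        yield max(0.0, delay + rng.uniform(-spread, spread))
        delay = min(delay * factor, cap)


print([round(d, 2) for d in islice(backoff_delays(), 6)])
```

The cap keeps a long outage from pushing retries minutes apart, while the jitter avoids a burst of simultaneous reconnects when the server comes back.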

3. Security Mechanisms

HTTPS connection support
Content Security Policy compliance
Safeguards for script execution
Permission handling

4. Diagnostics and Tracing

Detailed logs visible in Chrome Developer Tools
Descriptive error messages
Network traffic monitoring
Execution state tracking

Common Issue Resolution

1. Connectivity Problems

Double-check the specified server endpoint URL
Ensure the backend server application is operational
Check local firewall settings
Confirm that the WebSocket port (8012 by default) is not already in use by another process
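To check whether something is already listening on the WebSocket port, a plain TCP connection attempt is enough. A minimal standard-library sketch; port_in_use is a hypothetical helper, and it probes IPv4 127.0.0.1 only, whereas localhost in the extension's address may resolve to IPv6 ::1:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


print("port 8012 in use:", port_in_use(8012))
```

Before the server starts, True means another process holds the port; after it starts, True confirms the server is accepting connections.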

2. Execution Failures

Check whether a Content Security Policy is blocking the script
Verify that the required permissions have been granted
Check the console for JavaScript errors
Confirm that the tab_id is correct

3. Performance Degradation

Monitor the system's memory utilization
Disconnect tabs that are not in use
Adjust how often the server sends status updates
Optimize the handling of very large data payloads
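How the server's status update frequency is configured is not described here. As a generic illustration, a rate limiter like the following lets at most one update through per interval; Throttle is a hypothetical helper, and the injectable clock exists only to make it testable:

```python
import time
from collections.abc import Callable


class Throttle:
    """Allow an action at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self._clock = clock
        self._last: float | None = None

    def ready(self) -> bool:
        """Return True, and record the time, if enough time has passed."""
        now = self._clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

A sender would call ready() before each update and skip the update when it returns False.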

Licensing

Distributed under the MIT License

Background: Headless Browsers (from Wikipedia)

A headless browser is a web browser without a graphical user interface. It lets a script control a web page in an environment similar to a regular browser, through a command-line interface or over a network connection. Headless browsers are especially useful for testing, because they render and interpret HTML the same way a regular browser does, including CSS layout and styling and the execution of JavaScript and Ajax. Native headless and remote-control support in modern browsers (Chrome 59+, Firefox 56+) has superseded older tools such as PhantomJS.

Primary Applications

The principal uses for headless browsers include:

Automated testing of contemporary web applications.
Generating static images (screenshots) of web content.
Executing automated test suites for JavaScript frameworks.
Programmatic interaction with web document interfaces.
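As an illustration of screenshot generation, Chrome's headless mode can render a page and save a PNG from the command line. A sketch that builds such a command; the executable names in CANDIDATES are common Linux names and an assumption (on macOS and Windows, Chrome is usually not on PATH), and both helpers are hypothetical:

```python
import shutil
from typing import Optional

# Common executable names on Linux; other platforms usually need a full path.
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_chrome() -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def screenshot_command(url: str, chrome: str = "google-chrome", out: str = "screenshot.png",
                       width: int = 1280, height: int = 720) -> list[str]:
    """Build a command that renders `url` without a window and saves a PNG."""
    return [
        chrome,
        "--headless",                       # run without a visible UI
        f"--screenshot={out}",              # write the rendered page to `out`
        f"--window-size={width},{height}",  # viewport size in pixels
        url,
    ]
```

If find_chrome() returns a path, passing screenshot_command(url, chrome=path) to subprocess.run(..., check=True) should write screenshot.png to the working directory.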

Secondary Applications

Headless browsers are also used for web scraping. Google stated in 2009 that using a headless browser could help its search engine index content from websites that rely on Ajax. They can also be misused, for example to carry out distributed denial-of-service (DDoS) attacks, inflate ad impressions, or automate websites in unintended ways (e.g., brute-forcing credentials). However, a 2018 analysis of browser traffic found no preference among malicious actors for headless browsers over regular browsers for activities such as DDoS, SQL injection, or cross-site scripting (XSS) attacks.

Implementation Landscape

Because major browsers now support headless mode natively through APIs, several frameworks provide a unified interface for browser automation:

Selenium WebDriver – Compliant with W3C WebDriver specifications.
Playwright – A library supporting automation across Chromium, Firefox, and WebKit.
Puppeteer – A Node.js utility for controlling Chrome or Firefox.

Test Automation Frameworks

Several testing frameworks use headless browsers as part of their test execution:

Capybara utilizes either Headless Chrome or WebKit to simulate user actions.
Jasmine defaults to Selenium but can be configured for WebKit or Headless Chrome.
Cypress, a dedicated frontend testing framework.
QF-Test, a GUI-based testing tool supporting headless modes.

Alternative Approaches

Another approach is to use software that provides browser APIs without a full browser. For example, Deno includes browser APIs as part of its design, and for Node.js, jsdom is the most complete provider. Although most such tools support common browser features (HTML parsing, cookies, XHR, some JavaScript), they do not render the DOM and have limited support for DOM events. They usually run faster than a full browser.

ai-driven-chrome-controller

Author

dlwjdtn535
