ai-driven-chrome-controller
Programmatic interface for controlling Google Chrome in complex web automation workflows: navigation, interaction with page elements, and page content retrieval, plus management of user data such as browsing history and bookmarks. Supports secure script execution and fine-grained page data extraction for use in automation pipelines.
Author: dlwjdtn535
AI-Driven Chrome Controller (MCP Server)
This Model Context Protocol (MCP) server lets AI agents control and orchestrate the Chrome web browser for complex web automation tasks.
Deployment Instructions
Essential Prerequisites
- Runtime environment: Python version 3.12 or newer
- Companion software: Required Google Chrome extension (websocket communication client) must be installed
- Dependency manager: Utilize 'uv' for Python package handling, or alternatively, leverage Docker containers
Installation via Smithery Platform
To automatically deploy the Chrome Web Automation Server component using Smithery for Claude Desktop clients:
npx -y @smithery/cli install @dlwjdtn535/mcp-chrome-integration --client claude
Configuration Setup
Select the appropriate configuration structure for your operating system environment:
1. Recommended Method: Using uv
Windows Initialization JSON:
{
  "mcpServers": {
    "ai-driven-chrome-controller": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "%LOCALAPPDATA%\\Programs\\mcp-chrome-integration\\src",
        "mcp-server"
      ],
      "env": {
        "WEBSOCKET_PORT": "8012"
      }
    }
  }
}
macOS Initialization JSON:
{
  "mcpServers": {
    "ai-driven-chrome-controller": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/usr/local/bin/mcp-chrome-integration/src",
        "mcp-server"
      ],
      "env": {
        "WEBSOCKET_PORT": "8012"
      }
    }
  }
}
Linux Initialization JSON:
{
  "mcpServers": {
    "ai-driven-chrome-controller": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/usr/local/bin/mcp-chrome-integration/src",
        "mcp-server"
      ],
      "env": {
        "WEBSOCKET_PORT": "8012"
      }
    }
  }
}
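The three configurations above differ only in the --directory path. As a rough sketch, the entry could be generated rather than copied by hand. The helper name build_server_config and its parameters are hypothetical and not part of the project; the install path shown is simply the one from the examples above.

```python
import json


def build_server_config(src_dir: str, port: int = 8012) -> dict:
    """Build an mcpServers entry shaped like the JSON shown above.

    src_dir is the server's src directory (adapt it to your install path);
    port is passed to the server through WEBSOCKET_PORT as a string.
    """
    return {
        "mcpServers": {
            "ai-driven-chrome-controller": {
                "command": "uv",
                "args": ["run", "--directory", src_dir, "mcp-server"],
                "env": {"WEBSOCKET_PORT": str(port)},
            }
        }
    }


if __name__ == "__main__":
    # Path taken from the macOS/Linux example; change it for your machine.
    config = build_server_config("/usr/local/bin/mcp-chrome-integration/src")
    print(json.dumps(config, indent=2))
```

The printed JSON can then be merged into the client's configuration file.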
Core Capabilities
1. Web Page Navigation & User Input
- Direct URL traversal
- Simulated element activation (clicks)
- Text entry into form fields
- Complete form submission
- Viewport scrolling control
- Structured data extraction from tabular elements
- In-browser JavaScript code execution
2. Element Inspection and Modification
- Query element metadata (spatial dimensions, applied styles, visibility status)
- Implement waiting mechanisms until specific DOM elements appear
- Dynamically alter element background colors
- Retrieve current page structure and status snapshot
3. Content Auditing and Parsing
- Fetch complete Document Object Model (DOM) source code
- Tally all hyperlinks present
- Extract metadata tags
- Obtain details associated with image assets
- Analyze and parse form structures
- Stream page content incrementally
4. Browser Feature Access
- Management and modification of browser bookmarks
- Accessing and querying browsing history logs
- Control over file download processes
- Displaying system notifications
- Interaction with the system clipboard
- Handling browser cookies
5. System Interaction
- Querying host system information
- Simulating geolocation data
- Monitoring device power status/battery levels
- Capturing high-resolution screen images
Operational Examples (Python)
# Initiate navigation
tool_navigate_to(url="https://example.com", tab_id="active_session_id")
# Activate an element via CSS selector
tool_click_element(selector="#submit-button", tab_id="active_session_id")
# Inject text
tool_type_text(selector="#search", text="search query", tab_id="active_session_id")
# Snapshot current page condition
tool_state(tab_id="active_session_id")
# Run arbitrary JavaScript
tool_execute_script(script="console.log('Process initiated')", tab_id="active_session_id")
# Pull data from a table element
tool_extract_table(selector=".data-table", tab_id="active_session_id")
# Inspect element properties
tool_get_element_info(selector=".my-element", tab_id="active_session_id")
Critical Operational Considerations
1. Chrome Security Constraints
- Operations are blocked on internal chrome:// uniform resource identifiers (URIs)
- Functionality is restricted to standard web content (http:// or https:// protocols)
- Certain website Content Security Policies (CSPs) might prevent script execution
- Always factor in the target website's CSP during JavaScript operations
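Because only http:// and https:// pages can be automated, it may help to reject other URLs before sending a navigation command. The following is a minimal sketch using only the Python standard library; is_automatable_url is a hypothetical helper, not a function provided by this server.

```python
from urllib.parse import urlparse

# Schemes the constraints above describe as automatable.
ALLOWED_SCHEMES = {"http", "https"}


def is_automatable_url(url: str) -> bool:
    """Return True only for http:// or https:// URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ("https://example.com", "chrome://extensions/"):
        print(candidate, is_automatable_url(candidate))
```

A check like this cannot detect CSP restrictions, which only show up when a script is actually executed on the page.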
2. Tab Context Management
- A unique tab_id is mandatory for every command execution
- Use the tool_tab_list() function to discover active tabs
- Verify the target tab's readiness status prior to executing actions
3. Failure Mitigation
- Systematically inspect function return values for success indicators
- Implement timeout handling logic for element waiting operations
- Account for potential latency related to page loading completion
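The failure-mitigation points above can be sketched as a small polling helper with a timeout. Both wait_until and succeeded are hypothetical names, and the {"success": True} result shape is an assumption; the actual return format of the tools is not documented here and should be checked before relying on it.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.25) -> bool:
    """Poll check() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def succeeded(result: object) -> bool:
    """Interpret a tool result. ASSUMPTION: results are dicts carrying a
    boolean "success" key; adapt this to the real return format."""
    return isinstance(result, dict) and result.get("success") is True
```

Under those assumptions, a call such as wait_until(lambda: succeeded(tool_get_element_info(selector="#submit-button", tab_id=tab_id)), timeout=15) would keep retrying until the element is reported or the timeout expires, which also absorbs page-load latency.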
Deployment and Setup Procedures
1. Companion Extension Setup
- Prepare Client Artifacts
  # Navigate to the client extension directory
  cd mcp-client
- Install Extension in Chrome
  - Launch the Google Chrome browser
  - Navigate to the address: chrome://extensions/
  - Activate the 'Developer mode' toggle, typically located in the upper right corner
  - Select the 'Load unpacked' button (upper left)
  - Point this action to the mcp-client folder location
- Extension Parameter Configuration
  - Click the newly installed MCP extension icon within the Chrome toolbar
  - Input the server endpoint address (default is ws://localhost:8012)
  - Click the 'Connect' confirmation button
  - Confirm the connection status indicator changes to "Connected"
- Ongoing Usage Protocols
  - The controller functions automatically within connected browsing sessions
  - For newly opened tabs, manually access the extension icon and establish the websocket link
  - Monitor operational messages in the extension's dedicated log view
  - Use 'Disconnect' to terminate the active communication channel
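The endpoint entered in the extension has to match the port the server was started with. As a sketch, the address could be derived from the same WEBSOCKET_PORT variable used in the configuration; websocket_endpoint is a hypothetical helper, and the assumption that the server listens on localhost comes from the default address above.

```python
import os


def websocket_endpoint(host: str = "localhost", default_port: int = 8012) -> str:
    """Build the ws:// address to enter in the extension popup.

    Reads WEBSOCKET_PORT from the environment and falls back to
    default_port when the variable is not set.
    """
    port = int(os.environ.get("WEBSOCKET_PORT", default_port))
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"ws://{host}:{port}"


if __name__ == "__main__":
    print(websocket_endpoint())
```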
2. Backend Server Initialization
# Move into the server application directory
cd mcp-server
# Install required server dependencies
pip install -r requirements.txt
# Initiate the server process
python src/server.py
Extension Operational Features
1. User Interface Overlay
- Field for setting the communication server address
- Toggle for establishing/breaking the websocket link
- Display showing the current tab's operational status
- Integrated viewer for real-time log outputs
2. Background Processes
- Management of all active browsing contexts (tabs)
- Maintenance of the persistent WebSocket link
- Automatic reconnection attempts upon link failure
- Mechanisms for rudimentary error recovery
3. Security Mechanisms
- Capability to support HTTPS connections
- Adherence to Content Security Policies
- Safeguards for executing embedded scripts
- System for handling necessary operational permissions
4. Diagnostics and Tracing
- Detailed log visibility within Chrome Developer Tools
- Provision of descriptive error reporting
- Tools for monitoring network traffic exchange
- Tracking mechanism for execution state transitions
Common Issue Resolution
1. Connectivity Problems
- Double-check the specified server endpoint URL
- Ensure the backend server application is operational
- Investigate local firewall exceptions
- Confirm that the designated WebSocket port (8012) is not occupied
2. Execution Failures
- Review content security policy blockages
- Verify that necessary operational permissions have been granted
- Examine console output for JavaScript errors
- Validate that the provided Tab Identifier is correct
3. Performance Degradation
- Monitor the system's memory utilization
- Terminate connections for non-utilized browser sessions
- Tweak the frequency of status updates transmitted by the server
- Optimize routines for handling extremely large data payloads
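For the connectivity checks listed above, a quick way to see whether anything is listening on the WebSocket port is a plain TCP connection attempt. This is a minimal standard-library sketch; is_port_accepting is a hypothetical helper and it only tests TCP reachability, not the WebSocket handshake.

```python
import socket


def is_port_accepting(host: str = "localhost", port: int = 8012,
                      timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("port 8012 accepting connections:", is_port_accepting())
```

A True result before the server is started suggests another process already occupies the port; a False result while the server is supposedly running points to a startup failure or a firewall rule.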
Licensing
Distributed under the MIT License
WIKIPEDIA CONTEXT: A headless browser operates without a visible graphical user interface. These tools allow for scripted manipulation of a web page within an environment functionally equivalent to a standard browser, executed via command-line or network interface. They are invaluable for quality assurance, as they render pages identically to a visual browser, executing all CSS, JavaScript, and Ajax. Native remote control capabilities in modern browsers (Chrome 59+, Firefox 56+) have superseded older tools like PhantomJS.
== Primary Applications ==
The principal uses for headless browsing include:
- Automated testing of contemporary web applications.
- Generating static images (screenshots) of web content.
- Executing automated test suites for JavaScript frameworks.
- Programmatic interaction with web document interfaces.
=== Secondary Applications ===
Headless environments are also leveraged for web data acquisition (scraping). Google noted in 2009 that using them aids in indexing content relying on Ajax. Conversely, they face misuse potential, such as facilitating distributed denial-of-service (DDoS) attacks, inflating ad impressions, or automating unintended site interactions (e.g., brute-forcing credentials). However, a 2018 traffic analysis indicated no disproportionate preference for headless browsers among malicious actors over standard browsers for activities like DDoS, SQL injection, or XSS attacks.
== Implementation Landscape ==
Because major browser engines now natively expose headless mode via APIs, several software frameworks offer unified control interfaces:
- Selenium WebDriver – Compliant with W3C WebDriver specifications.
- Playwright – A library supporting automation across Chromium, Firefox, and WebKit.
- Puppeteer – A Node.js utility for controlling Chrome or Firefox.
=== Test Automation Frameworks ===
Several testing suites integrate headless browsers into their execution apparatus:
- Capybara utilizes either Headless Chrome or WebKit to simulate user actions.
- Jasmine defaults to Selenium but can be configured for WebKit or Headless Chrome.
- Cypress, a dedicated frontend testing framework.
- QF-Test, a GUI-based testing tool supporting headless modes.
=== Alternative Approaches ===
An alternative path involves utilizing software that exposes browser-like APIs directly. For instance, Deno integrates browser APIs natively. For Node.js environments, jsdom is the most comprehensive provider. While these alternatives generally support core browser functions (HTML parsing, cookies, XHR, partial JavaScript), they typically lack actual DOM rendering and have limited event handling, often resulting in faster execution than full browser emulation.
