AI-Controlled-Web-Interface-Manager

Facilitates interaction between advanced AI agents and the Google Chrome environment via a specialized WebSocket messaging framework, enabling sophisticated web navigation, data acquisition, and manipulation operations.

Author

buyitsydney

No License

Quick Info

GitHub Stars: 24
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 2026-02-19

Tags

automation, browser, websocket, browser automation, automation web, codingbaby browser

Synopsis

This utility lets AI agents, such as Claude 3.7 Sonnet running in the Cursor IDE, control the Chrome browser in fine detail to automate workflows. It works through a Model Context Protocol (MCP) server that relays commands to a Chrome extension over a bidirectional WebSocket connection.

Key Capabilities

  • Programmatic Browser Manipulation: Execute commands to direct Chrome's state, including site navigation, element interaction, and data input.
  • Visual State Capture: Acquire high-fidelity raster images of the entire viewport or designated regions.
  • Session Management: Comprehensive controls for creating, enumerating, switching focus to, and terminating browsing tabs.
  • Interactive Form Processing: Simulate user input sequences such as text entry, key presses, and control selection within web forms.
  • Sequential Command Execution: Define and run complex sequences of browser actions.
  • Rendering Surface Configuration: Dynamically adjust the browser window dimensions, crucial for responsive design validation.

System Topology

The implementation consists of two components:

  1. MCP Tool Host: A Node.js server application that implements the Model Context Protocol (MCP) to communicate with AI models integrated into Cursor.
  2. Browser Agent Extension: A Chrome extension that receives serialized instructions from the MCP Host and translates them into browser actions.

The two components exchange data over a WebSocket connection (default: TCP port 9876), which provides real-time, two-way communication between the server and the extension.
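The request and reply flow between the MCP Host and the extension can be sketched as follows. This is a minimal illustration, not the project's actual wire format: the envelope fields (id, command, params, ok, result, error) and the PendingCommands class are assumptions made for this example, and the transport is any object with a send method, such as a WebSocket.

```javascript
// Hypothetical envelope: { id, command, params } out, { id, ok, result | error } back.
class PendingCommands {
  constructor() {
    this.nextId = 1;
    this.pending = new Map(); // id -> { resolve, reject }
  }

  // Serialize a command and remember how to settle it when the reply arrives.
  send(transport, command, params = {}) {
    const id = this.nextId++;
    const promise = new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    transport.send(JSON.stringify({ id, command, params }));
    return promise;
  }

  // Feed every raw message received from the extension through here.
  handleMessage(raw) {
    const msg = JSON.parse(raw);
    const entry = this.pending.get(msg.id);
    if (!entry) return false; // unknown or duplicate reply
    this.pending.delete(msg.id);
    if (msg.ok) entry.resolve(msg.result);
    else entry.reject(new Error(msg.error || "command failed"));
    return true;
  }
}
```

Correlating replies by id lets several commands be in flight at once over a single connection, which a two-way channel like WebSocket makes possible.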

Deployment Instructions

Prerequisites

  • Node.js (version 14 or newer).
  • The Google Chrome browser.
  • The Cursor IDE with the Claude 3.7 Sonnet model enabled.

MCP Server Initialization

  1. Access the Configuration panel for MCP within Cursor (Settings → MCP).
  2. Provision a new global MCP endpoint using the following specification:

    {
      "mcpServers": {
        "AI-Controlled-Web-Interface-Manager": {
          "command": "npx",
          "args": ["@sydneyassistent/codingbaby-browser-mcp"]
        }
      }
    }

Browser Extension Integration

  1. Install the CodingBaby Extension from the Chrome Web Store.
  2. Verify that the extension is enabled and has been granted the permissions it requests.

Operational Example

Interaction is initiated by prompting the Claude 3.7 model in Cursor to invoke the tool's functionality:

Instruct the browser interface to load the resource located at https://example.com
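A prompt like this would lead the model to issue an MCP tool call. The sketch below builds such a request, assuming the JSON-RPC 2.0 tools/call message shape defined by MCP. The tool name "browser" and the argument names "command" and "url" are hypothetical placeholders, since the server's actual tool schema is not documented here.

```javascript
// Build an MCP "tools/call" request (JSON-RPC 2.0).
// The tool name and argument names passed in below are hypothetical.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const request = buildToolCall(1, "browser", {
  command: "navigate",
  url: "https://example.com",
});
console.log(JSON.stringify(request, null, 2));
```

In normal use Cursor constructs this message itself; it is shown only to make the path from prompt to browser action concrete.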

Command Inventory

  • navigate: Directs the browser instance to a specified Uniform Resource Locator (URL).
  • click: Triggers a simulated mouse interaction on a targeted DOM element.
  • type: Injects textual data into form input fields.
  • pressKey: Emulates physical keystroke events.
  • scroll: Modifies the visible portion of the document along specified axes.
  • takeScreenshot: Captures the current visual representation of the browser content.
  • wait: Pauses command processing.
  • setViewport: Alters the dimensions of the browser's rendering window.
  • tabNew, tabList, tabSelect, tabClose: Operations dedicated to lifecycle management of browser tabs.
  • batch: Orchestrates the execution of multiple subordinate commands in sequence.
  • close: Terminates the active browser session managed by the tool.
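To illustrate how a batch might be assembled from the commands above, here is a small sketch that checks each step against the listed command names before sending it. The step shape ({ command, ...params }), the parameter names (url, width, height), and the rule against nested batches are assumptions for illustration only; the tool's real schema may differ.

```javascript
// Command names as listed in the inventory above.
const KNOWN_COMMANDS = new Set([
  "navigate", "click", "type", "pressKey", "scroll", "takeScreenshot",
  "wait", "setViewport", "tabNew", "tabList", "tabSelect", "tabClose",
  "batch", "close",
]);

// Returns a list of problems; an empty list means the batch looks well-formed.
function validateBatch(steps) {
  if (!Array.isArray(steps)) return ["batch must be an array of steps"];
  const problems = [];
  steps.forEach((step, i) => {
    if (!step || typeof step.command !== "string") {
      problems.push(`step ${i}: missing command name`);
    } else if (!KNOWN_COMMANDS.has(step.command)) {
      problems.push(`step ${i}: unknown command "${step.command}"`);
    } else if (step.command === "batch") {
      problems.push(`step ${i}: nested batch is not allowed in this sketch`);
    }
  });
  return problems;
}

// Hypothetical batch: parameter names are illustrative only.
const exampleBatch = [
  { command: "setViewport", width: 1280, height: 800 },
  { command: "navigate", url: "https://example.com" },
  { command: "takeScreenshot" },
];
console.log(validateBatch(exampleBatch)); // prints []
```

Validating names up front means a typo such as "screenshot" is caught before any step has changed the browser's state.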

Development and Diagnostic Procedures

Instructions for developers seeking to modify or debug the underlying source code.

Extension Debugging Workflow

To debug the Chrome extension payload loaded directly from source files:

  1. Navigate to chrome://extensions/ within the Chrome browser.
  2. Engage the "Developer mode" toggle, typically situated in the upper-right corner of the interface.
  3. Select the "Load unpacked" control.
  4. Pinpoint and select the chrome-extension subdirectory within the project repository.
  5. The extension will now be active in a development state.
  6. Inspect runtime output by right-clicking the extension icon, choosing "Inspect," and reviewing the Console tab.
  7. Code modifications necessitate refreshing the extension card via its designated refresh icon to propagate changes.

MCP Server Local Execution Debugging

To utilize the locally cloned repository for MCP server debugging:

  1. Obtain the repository source code via cloning or direct download.
  2. Navigate the terminal into the main project directory.
  3. Resolve dependencies using the package manager:

     cd Browser-MCP
     npm install

  4. Update the MCP configuration within Cursor Settings → MCP to point to the local execution path:

    {
      "mcpServers": {
        "AI-Controlled-Web-Interface-Manager-Dev": {
          "command": "node",
          "args": ["/absolute/path/to/your/Browser-MCP/index.js"]
        }
      }
    }

Substitute /absolute/path/to/your/ with the verified absolute directory of the cloned project.

  5. Initiate a reload of the MCP endpoint via the "Refresh" mechanism in Cursor's settings.
  6. Debugging insights can be gathered by monitoring the Cursor MCP status icon logs, augmenting code with verbose console.error() calls, or running the server process directly in an external terminal for complete stdout/stderr visibility.
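The console.error() suggestion can be wrapped in a small helper. The sketch below is an illustration, not code from the repository: the function names and the "ws" scope label are hypothetical. It rests on the assumption that Cursor talks to a command-launched MCP server over stdio, in which case stdout carries protocol messages and diagnostics belong on stderr.

```javascript
// Format one diagnostic line; kept separate so it can be checked without I/O.
function formatDebugLine(scope, message, now = new Date()) {
  return `[${now.toISOString()}] [${scope}] ${message}`;
}

// console.error writes to stderr, leaving stdout untouched.
function debugLog(scope, message) {
  console.error(formatDebugLine(scope, message));
}

debugLog("ws", "waiting for the extension to connect");
```

Timestamps and a scope label make it easier to line these messages up with what the extension's own console shows.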

Issue Resolution

  • Port Allocation Contention: If the default port (9876) is already in use, the server automatically attempts to reclaim it.
  • Connectivity Failures: Verify the integrity of the Chrome extension installation and its enabled status.
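When diagnosing either issue, it can help to check by hand whether port 9876 is taken. The sketch below uses Node's built-in net module to probe the port. It is a diagnostic aid written for this document, not the project's own reclamation routine, and the choice of 127.0.0.1 as the bind address is an assumption.

```javascript
const net = require("net");

// Resolves true if a TCP server can bind the port on the host, false if it is taken.
function isPortFree(port, host = "127.0.0.1") {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once("error", (err) => {
      if (err.code === "EADDRINUSE") resolve(false);
      else reject(err);
    });
    probe.once("listening", () => probe.close(() => resolve(true)));
    probe.listen(port, host);
  });
}

isPortFree(9876)
  .then((free) => {
    console.error(free ? "port 9876 is free" : "port 9876 is already in use");
  })
  .catch((err) => console.error("could not probe port:", err.message));
```

If the port is reported as in use while the MCP server is not running, another process is holding it and the extension will not be able to connect to the intended server.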

License: MIT

External References

Contextual Note: A headless browser is defined as a web browsing application devoid of a visible graphical user interface. These environments facilitate the automated direction of web pages, mimicking standard browser rendering capabilities (including CSS styling and JavaScript execution) but accessed via programmatic interfaces. Modern browser engines (Chrome >= 59, Firefox >= 56) offer native remote control capabilities, superseding earlier reliance on frameworks like PhantomJS.

Primary Applications

The central utility derived from such automated rendering environments includes:

  • Automated validation and quality assurance procedures for contemporary web applications.
  • Generation of static visual representations of web documents.
  • Execution of iterative tests for client-side scripting frameworks.
  • Orchestration of user interactions across web interfaces.

Secondary Functions

Headless environments are also instrumental in sophisticated web data extraction processes. For instance, Google has leveraged such mechanisms to index content reliant on Asynchronous JavaScript and XML (Ajax).

Potential Malicious Exploitation:

  • Facilitating Distributed Denial of Service (DDoS) campaigns against web properties.
  • Inflating advertising impression metrics.
  • Unauthorized automation of site functions, such as credential testing.

Note: A 2018 traffic analysis study indicated no inherent bias among malicious actors toward utilizing headless environments over conventional browsers for activities like DDoS or injection attacks.

Operational Implementations

As mainstream browsers now natively expose headless mode APIs, several software suites provide a homogenized interface for browser automation workflows:

  • Selenium WebDriver: Adheres to W3C WebDriver specifications.
  • Playwright: A versatile library supporting Chromium, Firefox, and WebKit automation via Node.js.
  • Puppeteer: Primarily focused on automating Chrome or Firefox instances using Node.js.

Test Harness Integration

Numerous testing frameworks integrate headless browser capabilities into their execution apparatus:

  • Capybara: Leverages Headless Chrome or WebKit to mirror user actions.
  • Jasmine: Defaults to Selenium but permits configuration for WebKit or Headless Chrome.
  • Cypress: A dedicated framework for frontend testing.
  • QF-Test: A tool for GUI-based automated program verification that supports headless browsing modes.

Non-Rendering Alternatives

An alternative methodology involves utilizing software that exposes browser-like APIs without full graphical rendering. Deno natively integrates these within its core design. For the Node.js ecosystem, jsdom offers the most comprehensive implementation. While these alternatives often support core features (HTML parsing, cookie handling, basic scripting), they typically lack full DOM event simulation or visual output, often resulting in faster performance metrics compared to full browser emulation.
