Advanced Browser Interaction Utilities (MCP)

Elevate your AI agents' situational awareness and browser control capabilities by a factor of ten.

This utility suite monitors and interacts with the browser through a dedicated Chrome extension, and uses Anthropic's Model Context Protocol (MCP) to make the captured browser data and analysis available to AI clients.

Consult our detailed documentation for setup procedures, rapid deployment instructions, and collaboration guides.

Development Trajectory

Review our planned feature set and milestones here: GitHub Roadmap / Project Board

Recent Enhancements

v1.2.0 is live! Key highlights:
- Introduction of the "Allow Auto-Paste into Cursor" toggle in the DevTools interface. Screenshots are now automatically pasted into the Cursor input field (the Agent input field must be focused for this to work).
- Integration of a full suite of quality assessment tools powered by Lighthouse, covering SEO, performance, accessibility, and best practices.
- Deployment of a framework-specific prompt optimized for improving the SEO of NextJS projects.
- Addition of "Debugger Mode," which executes all diagnostic tools sequentially, accompanied by an improved prompt set for better reasoning.
- Implementation of "Audit Mode," which executes the entire collection of auditing tools in a predefined sequence.
- Resolution of connectivity failures observed on Windows operating systems.
- Significant refinement of network communication among the BrowserTools backend server, the extension, and the MCP server, featuring automatic host/port detection, automatic reconnection, and graceful shutdown.
- Simpler termination of the Browser Tools backend process via Ctrl+C.

Initial Setup Workflow

Operation requires the successful deployment of three distinct components:

Acquire and install our Chrome browser extension here: v1.2.0 BrowserToolsMCP Chrome Extension
Deploy the MCP server component within your Integrated Development Environment (IDE) using this terminal command: npx @agentdeskai/browser-tools-mcp@latest
Initiate the supporting backend service in a separate terminal session: npx @agentdeskai/browser-tools-server@latest

Configuration specifics vary by IDE; refer to your respective IDE documentation for the correct integration method.

CRITICAL CLARIFICATION - Two separate server instances must be operational:
- browser-tools-server: A local Node.js service acting as the intermediary for log aggregation.
- browser-tools-mcp: The MCP server installed in your IDE, which communicates with the extension through browser-tools-server.

npx @agentdeskai/browser-tools-mcp@latest is the command for your IDE environment. npx @agentdeskai/browser-tools-server@latest is for the dedicated terminal process.
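
Many MCP clients register servers through a JSON file containing an `mcpServers` map. The TypeScript sketch below builds such an entry for the IDE-side command. The server key `browser-tools` is a hypothetical label, and the exact schema and file location are assumptions to verify against your IDE's documentation.

```typescript
// Shape of one entry in an "mcpServers" map (assumed; verify against your IDE's docs).
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Build a config that launches the BrowserTools MCP server through npx.
// "browser-tools" is a hypothetical key chosen for this example.
function buildMcpConfig(serverKey: string = "browser-tools"): McpConfig {
  return {
    mcpServers: {
      [serverKey]: {
        command: "npx",
        args: ["@agentdeskai/browser-tools-mcp@latest"],
      },
    },
  };
}

// Serialize for pasting into the IDE's MCP configuration file.
const mcpConfigJson: string = JSON.stringify(buildMcpConfig(), null, 2);
```

Note that only the MCP server is registered this way; browser-tools-server still runs in its own terminal session.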

Following these three steps, activate the BrowserToolsMCP pane within your Chrome Developer Tools.

If operational difficulties persist, attempt the following troubleshooting sequence:
- Completely close the browser application (ensure all Chrome processes are terminated).
- Relaunch the local Node.js server (browser-tools-server).
- Verify that only a single instance of the Chrome DevTools panel is open.

If resolution is not achieved, please report the issue so further diagnostic steps and log collection can be initiated.

Should you encounter any difficulties or have suggestions for enhancement, kindly file an issue report! For feature proposals, use the enhancement tag or contact me directly via @tedx_ai on X.

Comprehensive Change Log Details:

Coding assistants, such as Cursor, can now run SEO, performance, accessibility, and best-practices audits on the currently visible webpage without interruption. By leveraging Puppeteer alongside the Lighthouse npm package, BrowserTools MCP now facilitates:

Compliance verification against WCAG accessibility standards
Identification of performance degradation factors
Detection of on-page Search Engine Optimization deficiencies
Review against established web development standards
Specific diagnostics for NextJS SEO implementation

...all executable without exiting the IDE environment 🎉

🔑 Core Feature Inventory

Audit Discipline	Feature Summary
Accessibility	WCAG conformance checks: color contrast ratios, mandatory alt text presence, keyboard navigation traps, ARIA attribute validation, and more.
Performance	Lighthouse-derived analysis focused on render-blocking assets, excessive Document Object Model (DOM) depth, image optimization status, and other speed determinants.
SEO	Evaluation of crucial on-page elements (metadata, header hierarchy, link structure) with suggested actions for improved search engine discoverability.
Best Practices	Verification against generalized, accepted standards in modern web engineering.
NextJS Audit	Specialized invocation of a prompt designed for NextJS-specific architectural and SEO review.
Audit Mode	Sequential execution of every available auditing function.
Debugger Mode	Sequential execution of all available diagnostic/debugging functions.

🛠️ Executing Quality Assurance Routines

✅ Prerequisites

Confirm you have:

An active browser viewport selected
The BrowserTools extension activated

▶️ Initiating Quality Checks

Automation via Headless Rendering:
Puppeteer orchestrates a hidden instance of the Chrome browser to render the target page and gather metrics, ensuring faithful data capture even for Single Page Applications (SPAs) or dynamically loaded content.

The headless browser instance maintains an active state for 60 seconds post-last audit command to optimize handling of rapid, successive requests.
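
The 60-second keep-alive described above can be sketched as simple idle bookkeeping. This is a minimal illustration, not the project's implementation: the class name `IdleTracker` and its methods are hypothetical, and launching or closing the actual browser is left to the caller.

```typescript
// Tracks when a shared headless browser may be shut down.
// Pure bookkeeping: timestamps are passed in, so no real timers are involved.
class IdleTracker {
  private lastUsedMs: number | null = null;
  private readonly idleLimitMs: number;

  constructor(idleLimitMs: number = 60000) {
    this.idleLimitMs = idleLimitMs;
  }

  // Call whenever an audit command arrives; this restarts the idle window.
  touch(nowMs: number): void {
    this.lastUsedMs = nowMs;
  }

  // True once the full idle window has elapsed since the last audit.
  shouldClose(nowMs: number): boolean {
    return this.lastUsedMs !== null && nowMs - this.lastUsedMs >= this.idleLimitMs;
  }
}
```

Because each audit resets the window, a burst of successive requests reuses one browser instance instead of paying the launch cost every time.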

Standardized Output Format:
Every audit yields results encapsulated in a predictable JSON structure, including aggregate scores and granular issue reports. This format is inherently compatible with MCP clients seeking to parse findings into actionable intelligence.
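
To illustrate how a client might consume such a report, here is a hedged TypeScript sketch. The field names (`category`, `score`, `issues`, `severity`) are illustrative assumptions and not the tool's documented schema; inspect a real audit response before relying on any of them.

```typescript
// Hypothetical report shape: every field name here is an assumption.
type Severity = "critical" | "serious" | "moderate" | "minor";

interface AuditIssue {
  id: string;
  title: string;
  severity: Severity;
}

interface AuditReport {
  category: string; // e.g. "accessibility"
  score: number; // aggregate score, assumed to be 0-100
  issues: AuditIssue[];
}

const severityRank: Record<Severity, number> = {
  critical: 0,
  serious: 1,
  moderate: 2,
  minor: 3,
};

// Return issue summaries ordered from most to least severe.
function prioritizeIssues(report: AuditReport): string[] {
  return [...report.issues]
    .sort((a, b) => severityRank[a.severity] - severityRank[b.severity])
    .map((issue) => `[${issue.severity}] ${issue.title}`);
}
```

Sorting on a copy keeps the original report untouched, which matters if the same object is also passed on to another consumer.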

The MCP layer exposes dedicated functions for initiating these quality checks against the active page. Below are representative command syntax examples for triggering these audits:

Accessibility Assessment (`runAccessibilityAudit`)

Verifies compliance with established accessibility protocols (WCAG).

Sample Invocation Phrases:

"Scan this viewport for accessibility violations."

"Execute an accessibility compliance review."

"Validate WCAG adherence on the current view."

Performance Evaluation (`runPerformanceAudit`)

Pinpoints constraints limiting page rendering speed and load times.

Sample Invocation Phrases:

"Diagnose the cause of poor page responsiveness."

"Assess the loading metrics for this URL."

"Trigger a performance diagnostic routine."

Search Engine Optimization Review (`runSEOAudit`)

Analyzes the page's configuration relative to search engine ranking factors.

Sample Invocation Phrases:

"Provide recommendations for search visibility enhancement."

"Initiate an SEO assessment."

"Check current on-page SEO status."

Development Standards Check (`runBestPracticesAudit`)

Verifies adherence to widely accepted modern web development conventions.

Sample Invocation Phrases:

"Run the best practices verification scan."

"Review adherence to development standards."

"Are there any deviations from standard practices evident?"

Comprehensive Audit Mode (`runAuditMode`)

Executes the complete set of auditing tools sequentially. If NextJS is detected as the project's framework, a NextJS-specific evaluation is included.

Sample Invocation Phrases:

"Start the full audit sequence."

"Engage audit operational mode."

NextJS Specific Diagnostics (`runNextJSAudit`)

Scrutinizes NextJS projects for optimal configuration and SEO potential.

Sample Invocation Phrases:

"Execute the NextJS specific quality check."

"Run NextJS evaluation, targeting the app router structure."

"Run NextJS evaluation, targeting the page router structure."

Debugger Mode (`runDebuggerMode`)

Executes all available diagnostic instruments in a predefined operational chain.

Sample Invocation Phrases:

"Activate debugger operational sequencing."

System Architecture Overview

Three primary interconnected elements collaborate to capture and interpret browser operational data:

Browser Extension: The client-side component responsible for snapshotting visuals, capturing console output, tracking network transactions, and recording Document Object Model (DOM) states.
Node.js Backend: A server acting as the communication broker between the Chrome add-on and any running MCP server instance.
MCP Server: The Model Context Protocol implementation that standardizes the toolset interface for consumption by AI clients.

┌─────────────┐     ┌──────────────┐     ┌───────────────┐     ┌─────────────┐
│ MCP Client  │ ──► │ MCP Server   │ ──► │ Node Server   │ ──► │ Chrome      │
│ (e.g.       │ ◄── │ (Protocol    │ ◄── │ (Middleware)  │ ◄── │ Extension   │
│  Cursor)    │     │  Handler)    │     │               │     │             │
└─────────────┘     └──────────────┘     └───────────────┘     └─────────────┘

Model Context Protocol (MCP) is an open protocol introduced by Anthropic that allows you to create custom tools for any compatible client. MCP clients such as Claude Desktop, Cursor, Cline, or Zed can run an MCP server, which "teaches" them about a new tool they can use.

These tools can call out to external APIs but in our case, all logs are stored locally on your machine and NEVER sent out to any third-party service or API. BrowserTools MCP runs a local instance of a NodeJS API server which communicates with the BrowserTools Chrome Extension.

All consumers of the BrowserTools MCP Server interface with the same NodeJS API and Chrome extension.

Browser Extension Responsibilities

Intercepts and logs XHR requests/responses alongside console messages.
Monitors activity on user-selected DOM nodes.
Relays all captured telemetry and the current focal element to the BrowserTools Connector service.
Establishes a WebSocket link to receive and transmit screen captures.
Permits user configuration of token limits, truncation thresholds, and the local path for screenshot storage.

Node Server (Middleware) Role

Manages the relay channel between the extension and the MCP server endpoint.
Ingests telemetry (logs, current element focus) from the Chrome extension.
Executes requests originating from the MCP server (e.g., fetching logs, capturing screens, querying the selected element).
Issues WebSocket instructions to the extension for screen capture operations.
Performs intelligent data reduction, truncating overly long strings and removing duplicate log entries to conserve token budget.
Sanitizes outgoing data by scrubbing cookies and sensitive HTTP headers prior to transmission to LLM clients via MCP.
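
The data reduction and sanitization steps in this list can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual code: the `LogEntry` shape, the function names, the truncation marker, and the list of sensitive header names are all hypothetical.

```typescript
interface LogEntry {
  level: string;
  message: string;
}

// Cut overly long strings and mark the cut so readers know data was dropped.
function truncate(text: string, maxLength: number): string {
  return text.length <= maxLength ? text : text.slice(0, maxLength) + "...[truncated]";
}

// Drop entries whose level and message repeat an earlier entry.
function dedupeLogs(logs: LogEntry[]): LogEntry[] {
  const seen = new Set<string>();
  return logs.filter((entry) => {
    const key = `${entry.level}\u0000${entry.message}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Header names treated as sensitive here are an assumed list.
const SENSITIVE_HEADERS = new Set<string>(["cookie", "set-cookie", "authorization"]);

// Remove sensitive headers (case-insensitive) before data reaches an LLM client.
function scrubHeaders(headers: Record<string, string>): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    if (!SENSITIVE_HEADERS.has(name.toLowerCase())) {
      clean[name] = headers[name];
    }
  }
  return clean;
}
```

Deduplicating before truncating preserves the most information, since two long messages that differ only near the end would otherwise collapse into one.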

MCP Server Functionality

Implements the formalized Model Context Protocol specification.
Exposes a standardized toolset interface.
Ensures interoperability across diverse MCP-compliant applications (e.g., Cursor, Cline, Zed, Claude Desktop).

Deployment Instructions

Detailed deployment procedures are accessible via our official documentation:

BrowserTools MCP Docs

Operational Use Cases

Once the system is initialized and configured, any client supporting MCP gains the ability to:

Monitor the browser's console output stream.
Intercept and review network payload data.
Capture high-resolution screen images.
Analyze the properties of currently highlighted DOM elements.
Issue commands to clear stored data on the local MCP server.
Execute accessibility, performance, SEO, and best practices assessment routines.

Interoperability

Compatible with any client adhering to the MCP specification.
Primarily optimized for integration within the Cursor IDE environment.
Extensible support for other AI-driven editors and MCP consumers.


browser-interaction-toolkit-mcp

Author

Dbillionaer
