browser-automation-gateway-server
A service implementing the Model Context Protocol (MCP) to remotely control a web browser environment via Playwright, enabling content acquisition and simulated user interactions.
Author: Kotelberg
Playwright Browser Automation Gateway
This component is a backend server that exposes Playwright's browser automation capabilities through the standardized interface defined by the Model Context Protocol (MCP).
Key Capabilities
- Navigate to a URL and load the page.
- Retrieve the complete content of the rendered document.
- Extract only the text that is visibly rendered.
- Locate interactive GUI elements (buttons, inputs, etc.) and report their coordinates.
- Simulate mouse actions: cursor movement, clicks, scrolling, and drag-and-drop.
- Provide an 'echo' utility for testing the connection.
Deployment Instructions
Automated Installation via Smithery
For rapid integration into Claude Desktop workflows using Smithery:
    npx -y @smithery/cli install @showfive/playwright-mcp-server --client claude
Local Setup
    npm install
Operational Guide
Launching the Backend Service
Execute these commands in sequence:
    npm run build
    npm start
Available MCP Endpoints (Tools)
Interaction is managed via the following distinct callable functions:
- navigate
  - Purpose: Directs the browser instance to a specified Uniform Resource Locator.
  - Parameters: { url: string }
  - Output: Status of the navigation attempt.
- get_all_content
  - Purpose: Fetches the entirety of the page's text content.
  - Parameters: None
  - Output: The accumulated textual payload of the document.
- get_visible_content
  - Purpose: Gathers text content that is presently observable within the viewport.
  - Parameters: { minVisiblePercentage?: number }
  - Output: Text extracted only from visible regions.
- get_interactive_elements
  - Purpose: Locates and returns spatial data for actionable UI components.
  - Parameters: None
  - Output: Bounding boxes and positioning for interactive controls.
- move_mouse
  - Purpose: Moves the virtual cursor to specified screen coordinates.
  - Parameters: { x: number, y: number }
  - Output: Confirmation of cursor repositioning.
- mouse_click
  - Purpose: Simulates a physical mouse button press and release at a set location.
  - Parameters: { x: number, y: number, button?: "left" | "right" | "middle", clickCount?: number }
  - Output: Result of the simulated click action.
- mouse_wheel
  - Purpose: Triggers a scroll event analogous to using a mouse wheel.
  - Parameters: { deltaY: number, deltaX?: number }
  - Output: Acknowledgment of the scroll operation.
- drag_and_drop
  - Purpose: Executes a sequence simulating pressing down, moving, and releasing the mouse button between two points.
  - Parameters: { sourceX: number, sourceY: number, targetX: number, targetY: number }
  - Output: Status of the completed drag-and-drop sequence.
- echo
  - Purpose: Diagnostic utility to return input data immediately.
  - Parameters: { message: string }
  - Output: The original input string.
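The tools above are invoked through MCP's JSON-RPC `tools/call` method. The following is a minimal sketch of how a client might build those request bodies with typed arguments. The parameter shapes are copied from the list above; `buildToolCall`, `centerOf`, and the `Box` shape are hypothetical helpers introduced only for illustration, and the actual output format of `get_interactive_elements` is not documented here, so the `{ x, y, width, height }` layout is an assumption.

```typescript
// Parameter shapes taken from the tool list in this README.
type ToolArgs = {
  navigate: { url: string };
  get_all_content: Record<string, never>;
  get_visible_content: { minVisiblePercentage?: number };
  get_interactive_elements: Record<string, never>;
  move_mouse: { x: number; y: number };
  mouse_click: {
    x: number;
    y: number;
    button?: "left" | "right" | "middle";
    clickCount?: number;
  };
  mouse_wheel: { deltaY: number; deltaX?: number };
  drag_and_drop: { sourceX: number; sourceY: number; targetX: number; targetY: number };
  echo: { message: string };
};

// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest<K extends keyof ToolArgs> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: K; arguments: ToolArgs[K] };
}

// Hypothetical helper: builds the request body for one tool call.
function buildToolCall<K extends keyof ToolArgs>(
  id: number,
  name: K,
  args: ToolArgs[K]
): ToolCallRequest<K> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// ASSUMPTION: elements are reported as x/y/width/height boxes.
// The server's real output shape may differ.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Click targets are usually the center of an element's box.
function centerOf(box: Box): { x: number; y: number } {
  return { x: box.x + box.width / 2, y: box.y + box.height / 2 };
}

const target = centerOf({ x: 100, y: 40, width: 80, height: 20 });
const requests = [
  buildToolCall(1, "navigate", { url: "https://example.com" }),
  buildToolCall(2, "get_interactive_elements", {}),
  buildToolCall(3, "mouse_click", { x: target.x, y: target.y, button: "left" }),
];
console.log(JSON.stringify(requests, null, 2));
```

Typing the arguments per tool name lets the compiler reject a call such as `buildToolCall(4, "move_mouse", { url: "..." })` before it reaches the server. Transport (stdio or otherwise) is left out of the sketch.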
Development Cycle
Executing Verification Suites
    # Execute the full suite of automated checks
    npm test

    # Start tests in persistent watch mode for iterative development
    npm run test:watch

    # Generate a detailed report on code coverage metrics
    npm run test:coverage
Test File Organization
- Verification routines for individual tools reside in: tools/*.test.ts
- System-level integration tests for the MCP server core are in: mcp-server.test.ts
Core System Attributes
- Data Acquisition
  - Capability to capture the entire DOM structure.
  - Capability to capture only visually rendered text.
  - Robust internal HTML document parsing routines.
- User Simulation & Element Discovery
  - Identification and spatial mapping of interactive DOM nodes.
  - Execution of nuanced mouse behaviors (positioning, actuation, scrolling).
  - Support for simulated file manipulation via drag/drop actions.
- Resilience and Fault Tolerance
  - Graceful error management during URL transitions.
  - Handling of operation timeouts.
  - Validation against syntactically incorrect or invalid URIs.
- Configuration Adaptability
  - Toggle between running the browser in visible (headful) or invisible (headless) mode.
  - Customization of the HTTP User-Agent string.
  - Control over the browser viewport dimensions.
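As an illustration of the URI validation mentioned under Resilience and Fault Tolerance, a client could pre-check a URL before passing it to `navigate`. This is only a sketch built on the standard `URL` constructor; `checkNavigableUrl` is a hypothetical name, and restricting the scheme to `http:`/`https:` is an assumption rather than documented server behavior.

```typescript
type UrlCheck = { ok: true; href: string } | { ok: false; reason: string };

// Hypothetical pre-flight check; the server's own validation may differ.
function checkNavigableUrl(input: string): UrlCheck {
  let parsed: URL;
  try {
    // The URL constructor throws on strings that are not absolute URLs.
    parsed = new URL(input);
  } catch {
    return { ok: false, reason: "not a syntactically valid absolute URL" };
  }
  // ASSUMPTION: only web schemes are meaningful navigation targets.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: "unsupported scheme " + parsed.protocol };
  }
  // href is the normalized form of the input.
  return { ok: true, href: parsed.href };
}

for (const candidate of ["https://example.com", "example.com", "ftp://example.com"]) {
  console.log(candidate, checkNavigableUrl(candidate));
}
```

Rejecting malformed input on the client side avoids a round trip, but the server's error response should still be handled, since it remains the authority on what it accepts.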
Critical Operational Advisories
- Set any required environment variables before starting the server.
- Comply with the usage policies of every external website you access.
- Leave appropriate delays between successive operations against remote sites.
- Space simulated mouse events so that their timing resembles realistic human interaction.
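One way to follow the last two advisories is to pause for a slightly randomized interval after each operation. The sketch below shows the idea on the client side; `jitteredDelayMs`, `sleep`, and `runPaced` are hypothetical helpers, and the default values (400 ms base, 150 ms spread) are arbitrary placeholders, not recommendations from the project.

```typescript
// Returns baseMs shifted by a random offset in [-spreadMs, +spreadMs),
// never below zero. `rand` is injectable so the function can be tested.
function jitteredDelayMs(
  baseMs: number,
  spreadMs: number,
  rand: () => number = Math.random
): number {
  const offset = (rand() * 2 - 1) * spreadMs;
  return Math.max(0, Math.round(baseMs + offset));
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Runs each step in order, pausing for a jittered interval after each one.
// A step would typically wrap one tool call (move_mouse, mouse_click, ...).
async function runPaced(
  steps: Array<() => Promise<void>>,
  baseMs = 400,
  spreadMs = 150
): Promise<void> {
  for (const step of steps) {
    await step();
    await sleep(jitteredDelayMs(baseMs, spreadMs));
  }
}

// With a fixed random source the result is deterministic.
console.log(jitteredDelayMs(400, 150, () => 0.5)); // 400
```

Keeping the delay calculation separate from the waiting makes it easy to tune or test the pacing without actually sleeping.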
Licensing
ISC
WIKIPEDIA: A search engine is a software system that indexes and retrieves Uniform Resource Locators (URLs) and associated web content in response to user queries. Typically, a user enters a query in a web interface or mobile application, and the system returns results, generally as a ranked list of hyperlinks accompanied by descriptive snippets and images. Users can restrict a search to particular media types, such as visual media, audio, or news. For the operator, the engine is part of a globally distributed computing system spanning many data centers worldwide. The speed and accuracy of its results depend on an index that is systematically refreshed by automated web crawlers. Crawling extracts data from publicly accessible files and databases hosted on web servers, though some proprietary or restricted content remains inaccessible to these tools. Many search engines have emerged since the World Wide Web began in the 1990s; nevertheless, Google Search became dominant during the 2000s and has remained so. As of May 2025, StatCounter data indicates that Google holds approximately 89–90% of the global search market, with its main competitors far behind: Bing (~4%), Yandex (~2.5%), Yahoo! (~1.3%), DuckDuckGo (~0.8%), and Baidu (~0.7%). Notably, this is the first time in over a decade that Google's share has fallen below 90%.
Consequently, the industry dedicated to enhancing website visibility within search rankings—a practice known as search engine optimization (SEO)—has historically been overwhelmingly concentrated on optimizing for Google.
== Historical Precursors ==
=== Antecedents to the Nineties ===
In 1945, Vannevar Bush conceptualized an advanced information management utility, designed to grant an individual access to an enormous repository of knowledge from a singular workstation, which he christened the 'memex.' This concept was detailed in his seminal article, "As We May Think," published in The Atlantic Monthly. The memex's objective was to overcome the growing challenge of data retrieval from increasingly centralized indexes of scientific literature. Bush envisioned interconnected, user-annotated research libraries, bearing a striking resemblance to the modern hyperlink structure. Linkage evaluation methodologies eventually became indispensable to search engine operation through the application of algorithms like Hyper Search and PageRank.
=== The 1990s: Emergence of Indexing Systems ===
The earliest digital search tools predated the formal introduction of the Web in December 1990: the WHOIS user locator system originated in 1982, and the multi-network user lookup service known as Knowbot Information Service was operational by 1989. The inaugural documented search utility capable of indexing file content, specifically FTP archives, was Archie, launched on September 10, 1990. Before September 1993, the entirety of the World Wide Web was indexed manually. Tim Berners-Lee maintained a curated directory of web servers hosted on CERN's infrastructure. While a record from 1992 persists, the exponential proliferation of web servers rendered this centralized directory obsolete. On the NCSA platform, newly established servers were announced under the heading "What's New!". The first
