Playwright Browser Automation Gateway


This server exposes Playwright's browser automation capabilities through the Model Context Protocol (MCP), so that MCP clients can drive a browser via a standard set of tools.

Key Capabilities

Navigating to URLs and loading pages.
Retrieving the full content of a rendered page.
Extracting only the text that is visibly rendered.
Locating interactive elements (buttons, inputs, etc.) and returning their coordinates.
Emulating mouse movement, clicks, scrolling, and drag-and-drop.
Providing an 'echo' tool for testing the connection.

Deployment Instructions

Automated Installation via Smithery

For rapid integration into Claude Desktop workflows using Smithery:

    npx -y @smithery/cli install @showfive/playwright-mcp-server --client claude

Local Setup

    npm install

Operational Guide

Launching the Backend Service

Execute these commands in sequence:

    npm run build
    npm start

Available MCP Endpoints (Tools)

Interaction is managed via the following distinct callable functions:

navigate
    Purpose: Directs the browser instance to a specified Uniform Resource Locator.
    Parameters: { url: string }
    Output: Status of the navigation attempt.

get_all_content
    Purpose: Fetches the entirety of the page's text content.
    Parameters: None
    Output: The accumulated textual payload of the document.

get_visible_content
    Purpose: Gathers text content that is presently observable within the viewport.
    Parameters: { minVisiblePercentage?: number }
    Output: Text extracted only from visible regions.

get_interactive_elements
    Purpose: Locates and returns spatial data for actionable UI components.
    Parameters: None
    Output: Bounding boxes and positioning for interactive controls.

move_mouse
    Purpose: Moves the virtual cursor to specified screen coordinates.
    Parameters: { x: number, y: number }
    Output: Confirmation of cursor repositioning.

mouse_click
    Purpose: Simulates a physical mouse button press and release at a set location.
    Parameters: { x: number, y: number, button?: "left" | "right" | "middle", clickCount?: number }
    Output: Result of the simulated click action.

mouse_wheel
    Purpose: Triggers a scroll event analogous to using a mouse wheel.
    Parameters: { deltaY: number, deltaX?: number }
    Output: Acknowledgment of the scroll operation.

drag_and_drop
    Purpose: Executes a sequence simulating pressing down, moving, and releasing the mouse button between two points.
    Parameters: { sourceX: number, sourceY: number, targetX: number, targetY: number }
    Output: Status of the completed drag-and-drop sequence.

echo
    Purpose: Diagnostic utility to return input data immediately.
    Parameters: { message: string }
    Output: The original input string.
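
As a rough illustration of how a client might address these tools, the sketch below types the parameter shapes listed above and wraps them in the JSON-RPC 2.0 `tools/call` envelope that MCP uses. The `ToolArgs` map and the `buildToolCall` helper are hypothetical names introduced here for illustration; they are not part of this server, and a real client would normally rely on an MCP SDK rather than building requests by hand.

```typescript
// Argument shapes copied from the tool list above.
type MouseButton = "left" | "right" | "middle";

interface ToolArgs {
  navigate: { url: string };
  get_all_content: Record<string, never>;
  get_visible_content: { minVisiblePercentage?: number };
  get_interactive_elements: Record<string, never>;
  move_mouse: { x: number; y: number };
  mouse_click: { x: number; y: number; button?: MouseButton; clickCount?: number };
  mouse_wheel: { deltaY: number; deltaX?: number };
  drag_and_drop: { sourceX: number; sourceY: number; targetX: number; targetY: number };
  echo: { message: string };
}

// JSON-RPC 2.0 envelope for an MCP "tools/call" request.
interface ToolCallRequest<K extends keyof ToolArgs> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: K; arguments: ToolArgs[K] };
}

// Hypothetical helper: pairs a tool name with arguments of the matching shape,
// so a mismatched call (e.g. echo without `message`) fails at compile time.
function buildToolCall<K extends keyof ToolArgs>(
  id: number,
  name: K,
  args: ToolArgs[K],
): ToolCallRequest<K> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const echoRequest = buildToolCall(1, "echo", { message: "ping" });
const clickRequest = buildToolCall(2, "mouse_click", { x: 120, y: 48, button: "left" });

console.log(JSON.stringify(echoRequest));
console.log(JSON.stringify(clickRequest));
```

Starting with `echo` is a cheap way to confirm the connection works before issuing calls that actually drive the browser.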

Development Cycle

Executing Verification Suites

    # Execute the full suite of automated checks
    npm test

    # Start tests in persistent watch mode for iterative development
    npm run test:watch

    # Generate a detailed report on code coverage metrics
    npm run test:coverage

Test File Organization

Verification routines for individual tools reside in: tools/*.test.ts
System-level integration tests for the MCP server core are in: mcp-server.test.ts

Core System Attributes

Data Acquisition
    Capability to capture the entire DOM structure.
    Capability to capture only visually rendered text.
    Robust internal HTML document parsing routines.

User Simulation & Element Discovery
    Identification and spatial mapping of interactive DOM nodes.
    Execution of nuanced mouse behaviors (positioning, actuation, scrolling).
    Support for simulated file manipulation via drag/drop actions.

Resilience and Fault Tolerance
    Graceful error management during URL transitions.
    Handling of operation timeouts.
    Validation against syntactically incorrect or invalid URIs.

Configuration Adaptability
    Toggle between running the browser in visible (headful) or invisible (headless) mode.
    Customization of the HTTP User-Agent string.
    Control over the browser viewport dimensions.
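
The URI validation mentioned under Resilience and Fault Tolerance could look roughly like the sketch below. It is not taken from the server's source: `validateNavigationUrl` is a hypothetical helper, and restricting targets to `http:` and `https:` is an assumption made here for illustration. It relies only on the standard WHATWG `URL` constructor, which throws on input that cannot be parsed as an absolute URL.

```typescript
// Result of checking a URL before handing it to the `navigate` tool.
type UrlCheck = { ok: true; url: string } | { ok: false; reason: string };

// Hypothetical helper; the server's own validation may differ.
function validateNavigationUrl(input: string): UrlCheck {
  let parsed: URL;
  try {
    // The URL constructor throws a TypeError on malformed or relative input.
    parsed = new URL(input.trim());
  } catch {
    return { ok: false, reason: "not a syntactically valid absolute URL" };
  }
  // Assumption: only http(s) pages are accepted as navigation targets.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: `unsupported scheme: ${parsed.protocol}` };
  }
  // `href` is the normalized form of the URL.
  return { ok: true, url: parsed.href };
}

console.log(validateNavigationUrl("https://example.com"));
console.log(validateNavigationUrl("not a url"));
console.log(validateNavigationUrl("ftp://example.com/file"));
```

Returning a tagged result instead of throwing lets the caller report a readable error to the MCP client without a try/catch at every call site.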

Critical Operational Advisories

Set any required environment variables before starting the server.
Comply with the terms of use of every website you access.
Leave appropriate delays between successive requests to remote sites.
Space simulated mouse events so that their timing resembles realistic human interaction.
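
One possible way to follow the timing advice above is to break a cursor move into small steps with slightly varied pauses. The sketch below is an illustration only: `planMousePath`, `replay`, and the default delay values are assumptions introduced here, and `moveMouse` stands in for whatever function issues the `move_mouse` tool call in a given client.

```typescript
interface Point { x: number; y: number }
interface TimedMove extends Point { delayMs: number }

// Hypothetical helper: linear interpolation from `from` to `to` in `steps`
// moves, each followed by a base delay plus random jitter.
function planMousePath(
  from: Point,
  to: Point,
  steps: number,
  baseDelayMs = 15,
  jitterMs = 10,
  random: () => number = Math.random,
): TimedMove[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  const path: TimedMove[] = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps; // fraction of the distance covered after step i
    path.push({
      x: Math.round(from.x + (to.x - from.x) * t),
      y: Math.round(from.y + (to.y - from.y) * t),
      delayMs: baseDelayMs + Math.round(random() * jitterMs),
    });
  }
  return path;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Issues each move in order, pausing after every step.
async function replay(path: TimedMove[], moveMouse: (p: Point) => Promise<void>): Promise<void> {
  for (const step of path) {
    await moveMouse({ x: step.x, y: step.y });
    await sleep(step.delayMs);
  }
}

const plannedPath = planMousePath({ x: 0, y: 0 }, { x: 100, y: 50 }, 4);
console.log(plannedPath);
```

Passing the random source as a parameter keeps the plan reproducible in tests while still allowing jitter in normal use.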

Licensing

ISC


browser-automation-gateway-server

Author

Kotelberg
