Playwright-backed Automation Protocol (MCP)

This Model Context Protocol (MCP) server uses Playwright to drive web browser sessions. By default it works from structured accessibility snapshots of the page, obtained through Playwright's accessibility layer, rather than from screenshots, so no image analysis or computer vision is needed to decide on and execute actions. An optional vision mode, described below, is available for screenshot-based interaction.

Core Attributes

Performance & Efficiency: Working from the accessibility tree makes interactions faster and less resource-intensive than pixel-based processing.
Predictability: Actions target structured node references, which avoids the ambiguity common to screenshot-based approaches and makes results repeatable.
LLM Native: Designed for consumption by language models; the default mode needs no vision model or image-processing pipeline.

Application Scenarios

Automated navigation sequences and intricate form population.
Systematic retrieval and scraping of organized web content.
Agent-driven quality assurance and functional verification.
General-purpose web interfacing for autonomous software entities.

Configuration Snippet

js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

Installation in VS Code Environment

To install the Playwright MCP server in VS Code, register it from the command line:

bash
# For standard VS Code
code --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/mcp@latest"]}'

bash
# For VS Code Insiders build
code-insiders --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/mcp@latest"]}'

Once registered, the Playwright automation service becomes accessible for Copilot agent operations within the IDE.

Server Command-Line Parameters

The Playwright MCP service accepts the following runtime arguments to customize its behavior:

--browser <engine>: Specifies the web engine instance. Acceptable values include chrome, firefox, webkit, and msedge. Specialized Chrome channels (chrome-beta, chrome-canary, chrome-dev) and Edge channels (msedge-beta, msedge-canary, msedge-dev) are also supported. Default selection is chrome.
--caps <features>: A comma-delimited list defining functional capabilities to enable (e.g., tabs, pdf, history, wait, files, install). All are enabled by default.
--cdp-endpoint <url>: The DevTools Protocol endpoint to attach to, if connecting to an already running instance.
--executable-path <file_path>: Directs the server to utilize a specific browser executable binary.
--headless: Configures the browser to run without a graphical user interface (GUI). Default is headed mode.
--port <number>: Defines the TCP port for Server-Sent Events (SSE) communication.
--user-data-dir <path>: Specifies the location for persistent browser profile data.
--vision: Activates vision mode, forcing interaction via screenshot analysis (Aria snapshots are the default mechanism).
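
As a rough illustration of how these flags combine, the sketch below assembles a client configuration entry from a small options object. The helper name buildPlaywrightMcpConfig and its option names are hypothetical, chosen for this example only; the flags themselves are the ones listed above.

```javascript
// Hypothetical helper (not part of @playwright/mcp): turns an options object
// into the "args" array used in the client configuration.
function buildPlaywrightMcpConfig(options = {}) {
  const args = ['@playwright/mcp@latest'];
  if (options.browser) args.push('--browser', options.browser);
  if (options.caps && options.caps.length > 0) args.push('--caps', options.caps.join(','));
  if (options.headless) args.push('--headless');
  if (options.vision) args.push('--vision');
  if (options.userDataDir) args.push('--user-data-dir', options.userDataDir);
  return { mcpServers: { playwright: { command: 'npx', args } } };
}

// Example: headless Firefox with only the tabs and pdf capabilities enabled.
const config = buildPlaywrightMcpConfig({ browser: 'firefox', headless: true, caps: ['tabs', 'pdf'] });
console.log(JSON.stringify(config, null, 2));
```

Omitting every option yields the same minimal entry shown in the Configuration Snippet section.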

Persistent User Profile Locations

Playwright MCP initializes a dedicated browser profile directory for state persistence:

Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
Linux: ~/.cache/ms-playwright/mcp-chrome-profile

This directory retains session artifacts like login states. Users may purge this location between runs to enforce a clean state.
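
The locations above can be expressed as a small lookup, which may be handy in a cleanup script that purges the profile between runs. This is a sketch that only mirrors the three documented paths; the function name defaultProfileDir is hypothetical and this is not the server's own lookup code.

```javascript
// Sketch only: maps a Node.js platform name to the documented profile location.
function defaultProfileDir(platform, env) {
  switch (platform) {
    case 'win32':
      return env.USERPROFILE + '\\AppData\\Local\\ms-playwright\\mcp-chrome-profile';
    case 'darwin':
      return env.HOME + '/Library/Caches/ms-playwright/mcp-chrome-profile';
    case 'linux':
      return env.HOME + '/.cache/ms-playwright/mcp-chrome-profile';
    default:
      throw new Error('No documented profile location for platform: ' + platform);
  }
}

console.log(defaultProfileDir('linux', { HOME: '/home/alice' }));
// prints /home/alice/.cache/ms-playwright/mcp-chrome-profile
```

In a real script the arguments would typically be process.platform and process.env.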

Operating in Headless Mode (GUI Suppressed)

This configuration is ideal for background processing or automated batch tasks:

js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  }
}

Headed Mode on Linux Without a Display or from IDE Worker Processes

When a headed browser is needed on a system that has no display, or when the client runs inside an IDE worker process, start the MCP server separately from an environment where DISPLAY is set and pass the --port flag to enable the SSE transport:

bash
npx @playwright/mcp@latest --port 8931

Subsequently, the client configuration must explicitly target this SSE endpoint:

js
{
  "mcpServers": {
    "playwright": {
      "url": "http://localhost:8931/sse"
    }
  }
}

Interaction Modalities

The service supports two primary operational modes:

Snapshot Mode (Standard): Leverages deep structural information (accessibility tree) for highly reliable and performant operations.
Vision Mode (Optional): Relies on visual input (screenshots) for interaction, suitable for models optimized for spatial reasoning.

To engage Vision Mode, append the --vision argument during server initialization:

js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--vision"]
    }
  }
}

Vision Mode is best suited for language models capable of mapping screen coordinates to actionable elements derived from the captured image.

Custom Transport Implementation

For scenarios requiring fine-grained control over data transmission:

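js
import { createServer } from '@playwright/mcp';
// SSEServerTransport is provided by the MCP TypeScript SDK.
import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';

// ... setup logic (`res` is the HTTP response object of the incoming SSE request)

const server = createServer({ launchOptions: { headless: true } });
const transport = new SSEServerTransport('/messages', res);
server.connect(transport);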

Snapshot-Driven Operations (Accessibility Focus)

browser_click
Function: Executes a mouse click on a specified webpage element.
Arguments:
- element (string): Descriptive text used to identify the element target.
- ref (string): The unique, structural identifier obtained from the page snapshot.
browser_hover
Function: Positions the cursor over a target element.
Arguments:
- element (string): Descriptive text for element identification.
- ref (string): Exact structural reference token.
browser_drag
Function: Performs a drag-and-drop operation between two DOM elements.
Arguments:
- startElement (string): Description of the origin element.
- startRef (string): Structural reference for the source.
- endElement (string): Description of the destination element.
- endRef (string): Structural reference for the target.
browser_type
Function: Inputs specified text into an interactive field.
Arguments:
- element (string): Descriptor for the input field.
- ref (string): Target element's structural ID.
- text (string): The character sequence to input.
- submit (boolean, optional): If true, simulates pressing the 'Enter' key post-typing.
- slowly (boolean, optional): If enabled, types characters sequentially to permit event handlers to process input incrementally.
browser_select_option
Function: Selects one or more options within an HTML <select> element.
Arguments:
- element (string): Description of the dropdown control.
- ref (string): The element's reference identifier.
- values (array): A list containing the values or labels of the options to be selected.
browser_snapshot
Function: Generates and returns the page's current accessibility tree snapshot (preferred over visual capture).
Parameters: None
browser_take_screenshot
Function: Captures a raster image representation of the current viewport.
Parameters:
- raw (boolean, optional): If true, returns raw PNG data; otherwise, returns a compressed JPEG (default).
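
To make the element/ref pairing concrete, here is a minimal sketch of the requests a client might send for a snapshot-driven sequence. It assumes the standard MCP JSON-RPC "tools/call" message shape; the toolCall helper, the element descriptions, and the ref values ("e12", "e15") are hypothetical. Real refs come from the most recent browser_snapshot result.

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request, the message shape MCP clients
// use to invoke a tool by name with an arguments object.
function toolCall(id, name, args = {}) {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// Hypothetical flow: take a snapshot, type into a field, then click a link.
// The ref values are placeholders for identifiers read from the snapshot.
const requests = [
  toolCall(1, 'browser_snapshot'),
  toolCall(2, 'browser_type', { element: 'Search box', ref: 'e12', text: 'playwright mcp', submit: true }),
  toolCall(3, 'browser_click', { element: 'First result link', ref: 'e15' }),
];

for (const request of requests) console.log(JSON.stringify(request));
```

The element string is a human-readable description, while ref is what pins the action to one node of the snapshot.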

Vision-Based Coordinate Operations

browser_screen_move_mouse
Function: Translates the cursor to a specific screen location.
Arguments:
- element (string): Descriptive label for context (though interaction is coordinate-based).
- x (number): Horizontal screen coordinate.
- y (number): Vertical screen coordinate.
browser_screen_capture
Function: Generates a visual image of the current page state.
Parameters: None
browser_screen_click
Function: Simulates a left mouse click at an explicit screen coordinate.
Arguments:
- element (string): Descriptive text for context.
- x (number): Target X coordinate.
- y (number): Target Y coordinate.
browser_screen_drag
Function: Simulates a mouse drag action defined by start and end screen points.
Arguments:
- element (string): Contextual description.
- startX (number): Initial horizontal position.
- startY (number): Initial vertical position.
- endX (number): Final horizontal position.
- endY (number): Final vertical position.
browser_screen_type
Function: Inputs text, targeting the currently focused element based on visual context.
Arguments:
- text (string): The textual content to input.
- submit (boolean, optional): Triggers an 'Enter' key press after input completion.
browser_press_key
Function: Presses a key on the keyboard.
Arguments:
- key (string): The name of the key to press (e.g., Tab, F5) or a single character to generate (e.g., a).
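
In vision mode the model has to turn something it sees in the screenshot into x and y values. One simple approach, sketched below, is to click the center of a bounding box. The box format (left, top, width, height in screenshot pixels) and the helper name screenClickArgs are assumptions made for this example and are not something the server defines.

```javascript
// Hypothetical helper: converts a bounding box located in a screenshot into
// the arguments object for browser_screen_click (element, x, y).
function screenClickArgs(element, box) {
  const x = Math.round(box.left + box.width / 2);
  const y = Math.round(box.top + box.height / 2);
  return { element, x, y };
}

// Example: a "Submit" button the model located at (100, 40), sized 80 x 20.
const clickArgs = screenClickArgs('Submit button', { left: 100, top: 40, width: 80, height: 20 });
console.log(clickArgs.x, clickArgs.y); // prints 140 50
```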

Tab Organization Functions

browser_tab_list
Function: Retrieves a list of all currently open browser tabs.
Parameters: None
browser_tab_new
Function: Instantiates a fresh browser tab.
Parameters:
- url (string, optional): The initial Uniform Resource Locator to load. If omitted, an empty tab is opened.
browser_tab_select
Function: Switches focus to a tab identified by its positional index.
Parameters:
- index (number): Zero-based index of the desired tab.
browser_tab_close
Function: Terminates a specified browser tab.
Parameters:
- index (number, optional): Index of the tab to close. If omitted, the active tab is closed.

Navigation Functions

browser_navigate
Function: Directs the active tab to a new web address.
Parameters:
- url (string): The destination Uniform Resource Locator.
browser_navigate_back
Function: Reverts to the preceding page in the browsing history.
Parameters: None
browser_navigate_forward
Function: Moves to the subsequent page in the browsing history, if available.
Parameters: None

System Input Emulation

browser_press_key
Function: Generates a low-level keyboard event.
Parameters:
- key (string): The name of the key code or the character intended for input.

Diagnostic Output

browser_console_messages
Function: Fetches all captured messages logged to the browser's JavaScript console.
Parameters: None

File Handling Utilities

browser_file_upload
Function: Simulates user selection of one or more files for upload.
Parameters:
- paths (array): An array containing absolute file system paths to the materials being uploaded.
browser_pdf_save
Function: Renders the current web document and saves it as a Portable Document Format (PDF) file.
Parameters: None
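
Because browser_file_upload expects absolute paths, a client may want to check them before sending the call. The sketch below uses a rough pattern for POSIX and Windows absolute paths; both helper names are hypothetical, and the check is an approximation rather than a full path parser.

```javascript
// Rough test for POSIX ("/...") and Windows ("C:\..." or "\\server\...") absolute paths.
function isAbsolutePath(p) {
  return /^(\/|[A-Za-z]:[\\\/]|\\\\)/.test(p);
}

// Hypothetical helper: returns the arguments object for browser_file_upload,
// rejecting any path that does not look absolute.
function fileUploadArgs(paths) {
  const relative = paths.filter((p) => !isAbsolutePath(p));
  if (relative.length > 0) throw new Error('Not absolute: ' + relative.join(', '));
  return { paths };
}

console.log(fileUploadArgs(['/tmp/report.pdf', 'C:\\Users\\me\\photo.png']));
```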

Auxiliary Functions

browser_wait
Function: Pauses execution for a defined duration.
Parameters:
- time (number): Duration to pause, measured in seconds (maximum allowable pause is ten seconds).
browser_close
Function: Shuts down and cleans up the managed browser page/context.
Parameters: None
browser_install
Function: Initiates the download and installation of the required browser binary, useful when initial launch fails due to missing components.
Parameters: None
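
Since a single browser_wait call is capped at ten seconds, a longer pause has to be issued as several calls. A minimal sketch of that split follows; the helper name splitWait is hypothetical and the cap is taken from the parameter description above.

```javascript
const MAX_WAIT_SECONDS = 10; // cap stated in the browser_wait description

// Splits a total pause into a list of browser_wait argument objects,
// none of which exceeds the cap.
function splitWait(totalSeconds) {
  if (!Number.isFinite(totalSeconds) || totalSeconds < 0) {
    throw new RangeError('totalSeconds must be a non-negative finite number');
  }
  const calls = [];
  let remaining = totalSeconds;
  while (remaining > 0) {
    const chunk = Math.min(remaining, MAX_WAIT_SECONDS);
    calls.push({ time: chunk });
    remaining -= chunk;
  }
  return calls;
}

// A 25-second pause becomes three calls of 10, 10, and 5 seconds.
console.log(JSON.stringify(splitWait(25)));
```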

browserAutomationProtocol

Author

markbustamante77
