browserAutomationProtocol
A server for structured browser interaction. It uses Playwright to carry out web automation through the page's accessibility tree, so it needs no visual perception model and gives deterministic control.
Author: markbustamante77
Playwright-backed Automation Protocol (MCP)
This Model Context Protocol (MCP) server uses Playwright to manage web browser sessions. It works from the structured accessibility data that Playwright exposes for the page, and relies on no image analysis or computer vision to decide on or carry out actions.
Core Attributes
- Performance and efficiency: Working from the accessibility tree is faster and less resource-intensive than pixel-based processing.
- Predictability: Actions target structured node references instead of guessed coordinates, which avoids the ambiguity common to screenshot-based approaches.
- LLM-native: The output is structured text that a language model can consume directly, with no visual processing pipeline.
Application Scenarios
- Automated navigation sequences and intricate form population.
- Systematic retrieval and scraping of organized web content.
- Agent-driven quality assurance and functional verification.
- General-purpose web interfacing for autonomous software entities.
Configuration Snippet
```js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
```
Installation in VS Code Environment
To install the Playwright MCP server in VS Code, register it from the command line:
```bash
# For standard VS Code
code --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/mcp@latest"]}'
```
```bash
# For VS Code Insiders build
code-insiders --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/mcp@latest"]}'
```
Once registered, the Playwright automation service becomes accessible for Copilot agent operations within the IDE.
Server Command-Line Parameters
The Playwright MCP service accepts the following runtime arguments to customize its behavior:
- --browser <engine>: Specifies the browser engine. Accepted values include chrome, firefox, webkit, and msedge. Chrome channels (chrome-beta, chrome-canary, chrome-dev) and Edge channels (msedge-beta, msedge-canary, msedge-dev) are also supported. The default is chrome.
- --caps <features>: A comma-delimited list of capabilities to enable (e.g., tabs, pdf, history, wait, files, install). All are enabled by default.
- --cdp-endpoint <url>: The DevTools Protocol endpoint to attach to when connecting to an already running instance.
- --executable-path <file_path>: Directs the server to use a specific browser executable.
- --headless: Runs the browser without a graphical user interface (GUI). The default is headed mode.
- --port <number>: Defines the TCP port for Server-Sent Events (SSE) communication.
- --user-data-dir <path>: Specifies the location for persistent browser profile data.
- --vision: Activates vision mode, forcing interaction via screenshot analysis (Aria snapshots are the default mechanism).
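The flags above map naturally onto the args array of a client configuration. The following is a minimal sketch, assuming hypothetical helper names (buildPlaywrightMcpArgs, buildClientConfig) and hypothetical option keys; only the flags themselves come from the list above.

```js
// Hypothetical helper: the option keys (browser, caps, ...) are this sketch's
// own naming. Only the emitted flags are taken from the parameter list.
function buildPlaywrightMcpArgs(options = {}) {
  const args = ["@playwright/mcp@latest"];
  // Flags that take a value.
  const valued = {
    browser: "--browser",
    caps: "--caps",
    cdpEndpoint: "--cdp-endpoint",
    executablePath: "--executable-path",
    port: "--port",
    userDataDir: "--user-data-dir",
  };
  for (const [key, flag] of Object.entries(valued)) {
    const value = options[key];
    if (value === undefined) continue;
    // --caps takes a comma-delimited list, so arrays are joined with commas.
    args.push(flag, Array.isArray(value) ? value.join(",") : String(value));
  }
  // Boolean switches are added only when set.
  if (options.headless) args.push("--headless");
  if (options.vision) args.push("--vision");
  return args;
}

// Wrap the args in the same shape as the configuration snippet.
function buildClientConfig(options) {
  return {
    mcpServers: {
      playwright: { command: "npx", args: buildPlaywrightMcpArgs(options) },
    },
  };
}

const config = buildClientConfig({ browser: "firefox", caps: ["tabs", "pdf"], headless: true });
console.log(JSON.stringify(config, null, 2));
```

Generating the array from one table keeps flag spelling in a single place, which may help when several client configurations differ only in a flag or two.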
Persistent User Profile Locations
Playwright MCP initializes a dedicated browser profile directory for state persistence:
- Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
- macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
- Linux: ~/.cache/ms-playwright/mcp-chrome-profile
This directory retains session artifacts like login states. Users may purge this location between runs to enforce a clean state.
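A script that wants to clear the profile first needs to resolve its location. The sketch below assumes a hypothetical function name (mcpProfileDir), assumes that ~ corresponds to the HOME environment variable, and takes the platform and environment as parameters so it can be exercised without touching the real machine.

```js
// Resolve the profile directory from the three documented locations.
// `platform` uses Node's process.platform values ("win32", "darwin", "linux").
// `env` is a plain object such as process.env.
// Assumption: "~" in the documented paths corresponds to env.HOME.
function mcpProfileDir(platform, env) {
  switch (platform) {
    case "win32":
      return `${env.USERPROFILE}\\AppData\\Local\\ms-playwright\\mcp-chrome-profile`;
    case "darwin":
      return `${env.HOME}/Library/Caches/ms-playwright/mcp-chrome-profile`;
    case "linux":
      return `${env.HOME}/.cache/ms-playwright/mcp-chrome-profile`;
    default:
      throw new Error(`No documented profile location for platform: ${platform}`);
  }
}

// Example with a made-up home directory.
console.log(mcpProfileDir("linux", { HOME: "/home/alice" }));
// prints /home/alice/.cache/ms-playwright/mcp-chrome-profile
```

In real use the call would likely be mcpProfileDir(process.platform, process.env), and the returned directory could then be removed with Node's fs.rmSync using the recursive and force options.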
Operating in Headless Mode (GUI Suppressed)
This configuration is ideal for background processing or automated batch tasks:
```js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--headless"
      ]
    }
  }
}
```
Headed Mode on Display-less Linux or Worker Threads
When launching a visible browser instance on environments lacking a DISPLAY variable (or within isolated IDE workers), the server must be initiated externally with a specified port for SSE connectivity:
```bash
npx @playwright/mcp@latest --port 8931
```
Subsequently, the client configuration must explicitly target this SSE stream address:
```js
{
  "mcpServers": {
    "playwright": {
      "url": "http://localhost:8931/sse"
    }
  }
}
```
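The port passed to the server and the port in the client URL have to agree. A small sketch of deriving the client entry from the port follows; the function name sseClientConfig and the host parameter are assumptions made for illustration.

```js
// Build the client entry for a server started with `--port <number>`.
// The function name and the `host` default are assumptions of this sketch.
function sseClientConfig(port, host = "localhost") {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid TCP port: ${port}`);
  }
  return {
    mcpServers: {
      // The /sse path matches the endpoint shown in the configuration above.
      playwright: { url: `http://${host}:${port}/sse` },
    },
  };
}

console.log(JSON.stringify(sseClientConfig(8931)));
// prints {"mcpServers":{"playwright":{"url":"http://localhost:8931/sse"}}}
```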
Interaction Modalities
The service supports two primary operational modes:
- Snapshot Mode (Standard): Leverages deep structural information (accessibility tree) for highly reliable and performant operations.
- Vision Mode (Optional): Relies on visual input (screenshots) for interaction, suitable for models optimized for spatial reasoning.
To engage Vision Mode, append the --vision argument during server initialization:
```js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--vision"
      ]
    }
  }
}
```
Vision Mode is best suited for language models capable of mapping screen coordinates to actionable elements derived from the captured image.
Custom Transport Implementation
For scenarios requiring fine-grained control over data transmission:
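```js
import { createServer } from '@playwright/mcp';
// Assumed import path: SSEServerTransport as shipped in the MCP TypeScript SDK.
import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';

// ... setup logic (for example, an HTTP handler that supplies `res`)

// Create the MCP server with a headless browser.
const server = createServer({ launchOptions: { headless: true } });
// `res` is the HTTP response object of the incoming SSE request.
const transport = new SSEServerTransport('/messages', res);
server.connect(transport);
```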
Snapshot-Driven Operations (Accessibility Focus)
- browser_click
  - Function: Executes a mouse click on a specified webpage element.
  - Parameters:
    - element (string): Descriptive text used to identify the element target.
    - ref (string): The unique, structural identifier obtained from the page snapshot.
- browser_hover
  - Function: Positions the cursor over a target element.
  - Parameters:
    - element (string): Descriptive text for element identification.
    - ref (string): Exact structural reference token.
- browser_drag
  - Function: Performs a drag-and-drop operation between two DOM elements.
  - Parameters:
    - startElement (string): Description of the origin element.
    - startRef (string): Structural reference for the source.
    - endElement (string): Description of the destination element.
    - endRef (string): Structural reference for the target.
- browser_type
  - Function: Inputs specified text into an interactive field.
  - Parameters:
    - element (string): Descriptor for the input field.
    - ref (string): Target element's structural ID.
    - text (string): The character sequence to input.
    - submit (boolean, optional): If true, simulates pressing the 'Enter' key after typing.
    - slowly (boolean, optional): If enabled, types characters one at a time so that event handlers can process input incrementally.
- browser_select_option
  - Function: Selects one or more options within an HTML <select> element.
  - Parameters:
    - element (string): Description of the dropdown control.
    - ref (string): The element's reference identifier.
    - values (array): A list containing the values or labels of the options to be selected.
- browser_snapshot
  - Function: Generates and returns the page's current accessibility tree snapshot (preferred over visual capture).
  - Parameters: None
- browser_take_screenshot
  - Function: Captures a raster image representation of the current viewport.
  - Parameters:
    - raw (boolean, optional): If true, returns raw PNG data; otherwise, returns a compressed JPEG (default).
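An MCP client invokes these tools through JSON-RPC tools/call requests. The sketch below shows how such requests might be assembled for a snapshot followed by a typing action; the helper name toolCall, the request ids, and the ref value "e12" are placeholders, since a real ref has to be read from the latest browser_snapshot result.

```js
// Wrap a tool name and its arguments in an MCP `tools/call` JSON-RPC request.
// `toolCall` is a hypothetical helper name used only in this sketch.
function toolCall(id, name, args = {}) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Placeholder values: "e12" is made up. A real `ref` must be copied from
// the most recent browser_snapshot result before it is used.
const requests = [
  toolCall(1, "browser_snapshot"),
  toolCall(2, "browser_type", {
    element: "Search box",
    ref: "e12",
    text: "playwright",
    submit: true,
  }),
];

for (const request of requests) {
  console.log(JSON.stringify(request));
}
```

Taking the snapshot first reflects the dependency between the two calls: the ref passed to browser_type only has meaning relative to a snapshot the client has already received.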
Vision-Based Coordinate Operations
- browser_screen_move_mouse
  - Function: Moves the cursor to a specific screen location.
  - Parameters:
    - element (string): Descriptive label for context (though interaction is coordinate-based).
    - x (number): Horizontal screen coordinate.
    - y (number): Vertical screen coordinate.
- browser_screen_capture
  - Function: Generates a visual image of the current page state.
  - Parameters: None
- browser_screen_click
  - Function: Simulates a left mouse click at an explicit screen coordinate.
  - Parameters:
    - element (string): Descriptive text for context.
    - x (number): Target X coordinate.
    - y (number): Target Y coordinate.
- browser_screen_drag
  - Function: Simulates a mouse drag action defined by start and end screen points.
  - Parameters:
    - element (string): Contextual description.
    - startX (number): Initial horizontal position.
    - startY (number): Initial vertical position.
    - endX (number): Final horizontal position.
    - endY (number): Final vertical position.
- browser_screen_type
  - Function: Inputs text, targeting the currently focused element based on visual context.
  - Parameters:
    - text (string): The textual content to input.
    - submit (boolean, optional): Triggers an 'Enter' key press after input completion.
- browser_press_key
  - Function: Sends a key press to the page.
  - Parameters:
    - key (string): The name of the key (e.g., Tab, F5) or a single character to generate.
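For the coordinate-based tools, a model typically locates an element in the screenshot and then has to reduce it to a single point. A minimal sketch of that step follows, assuming a hypothetical bounding-box shape ({ x, y, width, height } in screen pixels) that is not part of the server's interface.

```js
// Hypothetical input shape: a bounding box in screen pixels, as a vision
// model might report it. The server itself only receives x and y.
function centerOf(box) {
  return {
    x: Math.round(box.x + box.width / 2),
    y: Math.round(box.y + box.height / 2),
  };
}

// Build the arguments for browser_screen_click from a described element
// and its bounding box.
function screenClickArgs(element, box) {
  const { x, y } = centerOf(box);
  return { element, x, y };
}

console.log(screenClickArgs("Submit button", { x: 100, y: 200, width: 80, height: 30 }));
// x is 140 and y is 215 for this box
```

Clicking the center of the box is one reasonable choice among several; it tolerates small localization errors better than clicking a corner.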
Tab Organization Functions
- browser_tab_list
  - Function: Retrieves a list of all currently open browser tabs.
  - Parameters: None
- browser_tab_new
  - Function: Opens a new browser tab.
  - Parameters:
    - url (string, optional): The initial URL to load. If omitted, an empty tab is opened.
- browser_tab_select
  - Function: Switches focus to a tab identified by its positional index.
  - Parameters:
    - index (number): Zero-based index of the desired tab.
- browser_tab_close
  - Function: Closes a specified browser tab.
  - Parameters:
    - index (number, optional): Index of the tab to close. If omitted, the active tab is closed.
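Because tabs are addressed by zero-based position, a client usually has to translate something meaningful (such as a URL) into an index before selecting or closing a tab. The sketch below assumes a hypothetical parsed tab list, an array of { url, title } objects in the order the server reports them; the actual output format of browser_tab_list is not specified here.

```js
// `tabs` is a hypothetical parsed form of the browser_tab_list result:
// an array in server order, so array positions double as tab indexes.
function tabSelectArgsByUrl(tabs, urlFragment) {
  const index = tabs.findIndex((tab) => tab.url.includes(urlFragment));
  if (index === -1) {
    throw new Error(`No open tab matches: ${urlFragment}`);
  }
  return { index };
}

// Arguments for browser_tab_close: leaving out the index closes the active tab.
function tabCloseArgs(index) {
  return index === undefined ? {} : { index };
}

const tabs = [
  { url: "https://example.com/", title: "Example" },
  { url: "https://example.org/docs", title: "Docs" },
];
console.log(tabSelectArgsByUrl(tabs, "example.org")); // index is 1
console.log(tabCloseArgs()); // empty arguments, closes the active tab
```

Indexes shift when a tab is closed, so it is probably safest to call browser_tab_list again before reusing a previously computed index.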
Viewport Navigation
- browser_navigate
  - Function: Directs the active tab to a new web address.
  - Parameters:
    - url (string): The destination URL.
- browser_navigate_back
  - Function: Returns to the preceding page in the browsing history.
  - Parameters: None
- browser_navigate_forward
  - Function: Moves to the subsequent page in the browsing history, if available.
  - Parameters: None
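The "if available" qualifier on forward navigation follows from how browser history behaves: a fresh navigation discards any forward entries. The toy model below illustrates that behavior on the client side. It is not the server's implementation, and treating an unavailable back or forward step as a no-op is an assumption of this sketch.

```js
// Toy model of a tab's history, useful for predicting which page a
// sequence of navigate/back/forward calls should end on.
class HistoryModel {
  constructor() {
    this.entries = [];
    this.position = -1;
  }
  // Mirrors browser_navigate: a new navigation drops forward entries.
  navigate(url) {
    this.entries = this.entries.slice(0, this.position + 1);
    this.entries.push(url);
    this.position += 1;
    return this.current();
  }
  // Mirrors browser_navigate_back. Assumption: no-op at the first entry.
  back() {
    if (this.position > 0) this.position -= 1;
    return this.current();
  }
  // Mirrors browser_navigate_forward. Assumption: no-op at the last entry.
  forward() {
    if (this.position < this.entries.length - 1) this.position += 1;
    return this.current();
  }
  current() {
    return this.entries[this.position];
  }
}

const history = new HistoryModel();
history.navigate("https://example.com/a");
history.navigate("https://example.com/b");
console.log(history.back());    // https://example.com/a
history.navigate("https://example.com/c");
console.log(history.forward()); // https://example.com/c (the /b entry was discarded)
```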
System Input Emulation
- browser_press_key
  - Function: Generates a low-level keyboard event.
  - Parameters:
    - key (string): The name of the key to press, or the character to input.
Diagnostic Output
- browser_console_messages
  - Function: Fetches all captured messages logged to the browser's JavaScript console.
  - Parameters: None
File Handling Utilities
- browser_file_upload
  - Function: Simulates user selection of one or more files for upload.
  - Parameters:
    - paths (array): An array containing absolute file system paths to the files being uploaded.
- browser_pdf_save
  - Function: Renders the current web document and saves it as a Portable Document Format (PDF) file.
  - Parameters: None
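Since browser_file_upload expects absolute paths, a client may want to reject relative ones before making the call. The following is a simplified sketch with hypothetical helper names; the path check covers POSIX paths and Windows drive paths only, and Node's path.isAbsolute would be the more complete choice in a Node environment.

```js
// Simplified absolute-path check: POSIX ("/home/...") and Windows drive
// paths ("C:\..." or "C:/..."). UNC paths are not handled in this sketch.
function isAbsolutePath(p) {
  return p.startsWith("/") || /^[A-Za-z]:[\\/]/.test(p);
}

// Build the arguments for browser_file_upload, refusing relative paths.
function fileUploadArgs(paths) {
  const relative = paths.filter((p) => !isAbsolutePath(p));
  if (relative.length > 0) {
    throw new Error(`Expected absolute paths, got: ${relative.join(", ")}`);
  }
  return { paths };
}

console.log(fileUploadArgs(["/home/alice/report.pdf", "C:\\Users\\alice\\photo.png"]));
```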
Auxiliary Functions
- browser_wait
  - Function: Pauses execution for a defined duration.
  - Parameters:
    - time (number): Duration to pause, measured in seconds (maximum allowable pause is ten seconds).
- browser_close
  - Function: Shuts down and cleans up the managed browser page/context.
  - Parameters: None
- browser_install
  - Function: Initiates the download and installation of the required browser binary, useful when initial launch fails due to missing components.
  - Parameters: None
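The ten-second ceiling on browser_wait means a longer pause has to be issued as several calls. A minimal sketch of that split follows; the helper name splitWait is hypothetical, and how the server reacts to a value above ten seconds is not stated here, so the sketch simply never sends one.

```js
// Documented ceiling for a single browser_wait call, in seconds.
const MAX_WAIT_SECONDS = 10;

// Split a total pause into argument objects for consecutive browser_wait
// calls, each at or below the ceiling.
function splitWait(totalSeconds) {
  if (!(totalSeconds >= 0)) {
    throw new RangeError(`Wait time must be a non-negative number: ${totalSeconds}`);
  }
  const calls = [];
  let remaining = totalSeconds;
  while (remaining > 0) {
    const time = Math.min(remaining, MAX_WAIT_SECONDS);
    calls.push({ time });
    remaining -= time;
  }
  return calls;
}

console.log(splitWait(25));
// three calls: 10 seconds, 10 seconds, then 5 seconds
```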
