playwright-accessibility-agent
Enables programmatic control of web browsers via Playwright, leveraging structured accessibility tree data for robust, non-visual automation. Supports navigation, data retrieval, form manipulation, and agent-driven testing with high determinism.
Author: ahai72160
Playwright Accessibility Agent (MCP)
This project implements a Model Context Protocol (MCP) server on top of the Playwright library. Instead of analyzing pixels, it drives the browser by default through the page's accessibility tree.
Core Capabilities
- Accessibility-First Interaction: Operations are guided by the ARIA/accessibility tree, eliminating reliance on visual input or computer vision models.
- Reliable Execution: Elements are targeted through structured accessibility references rather than screen coordinates, which avoids much of the variability of screenshot-based automation.
- Efficiency: Core steps need no screenshots or vision-model processing, which keeps overhead low.
Application Scenarios
- Orchestrating complex web workflows for autonomous agents.
- Populating and submitting intricate online forms.
- Systematic extraction of structured content.
- Building end-to-end regression tests resilient to minor visual shifts.
Configuration Snippets
NPM Initialization (Using NPX)
```json
{
  "mcpServers": {
    "playwright-agent": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```
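The configuration is ordinary JSON. Each key under `mcpServers` names a server, and its entry either launches a local command (`command` plus `args`) or, as in the remote setup described later, points at a `url`. If you generate this file from a script, a typed sketch along these lines may help. The names `StdioServerEntry`, `SseServerEntry`, and `buildMcpConfig` are hypothetical; only the field names come from the snippets in this document.

```typescript
// Hypothetical types mirroring the two entry shapes used in this document:
// a locally launched (stdio) server and a remote server reached over SSE.
interface StdioServerEntry {
  command: string;
  args: string[];
}

interface SseServerEntry {
  url: string;
}

type ServerEntry = StdioServerEntry | SseServerEntry;

interface McpConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Wraps named server entries in the top-level "mcpServers" object
// and serializes the result with two-space indentation.
function buildMcpConfig(servers: Record<string, ServerEntry>): string {
  const config: McpConfig = { mcpServers: servers };
  return JSON.stringify(config, null, 2);
}

// Usage: reproduce the npx configuration shown above.
const npxConfig = buildMcpConfig({
  "playwright-agent": { command: "npx", args: ["@playwright/mcp@latest"] },
});
```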
VS Code Integration
To use the server inside VS Code, install it through the installation links provided with this listing, or register it from the command line with the VS Code CLI:
```bash
# Standard VS Code
code --add-mcp '{"name":"playwright-agent","command":"npx","args":["@playwright/mcp@latest"]}'
```
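`--add-mcp` takes the whole server entry as a single JSON argument, so the JSON must survive shell quoting. A minimal sketch for building that command line, assuming a POSIX shell (the single-quote escaping below does not apply to Windows `cmd.exe` or PowerShell). `shellSingleQuote` and `buildAddMcpCommand` are hypothetical helpers, and the optional `cli` parameter exists only so a different VS Code CLI binary name could be substituted.

```typescript
// Hypothetical helper: wraps a string in POSIX single quotes so the shell
// passes it through unchanged. Each embedded ' becomes '\'' (close the
// quote, emit an escaped quote, reopen the quote).
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

interface AddMcpPayload {
  name: string;
  command: string;
  args: string[];
}

// Builds "<cli> --add-mcp '<compact JSON>'" from a payload object.
// JSON.stringify keeps the property order name, command, args.
function buildAddMcpCommand(payload: AddMcpPayload, cli: string = "code"): string {
  return cli + " --add-mcp " + shellSingleQuote(JSON.stringify(payload));
}

// Usage: the same server entry as the command shown above.
const cmd = buildAddMcpCommand({
  name: "playwright-agent",
  command: "npx",
  args: ["@playwright/mcp@latest"],
});
```

With these arguments, `cmd` reproduces the command above character for character.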
Available Server Customizations (CLI Flags)
The agent supports several launch parameters to tailor browser behavior:
- `--browser <engine>`: Specify the browser engine or channel. Options include `chrome`, `firefox`, `webkit`, or specific channel variants (e.g., `chrome-dev`). Default is `chrome`.
- `--headless`: Run the browser without a graphical interface (headed mode is the default).
- `--caps <flags>`: Comma-separated list of capabilities to enable (e.g., `pdf`, `history`, `wait`). Defaults to all.
- `--port <number>` / `--host <address>`: Port and host the server listens on for SSE transport.
- `--vision`: Switch to screenshot-based, visual interaction. Without this flag, the server uses accessibility snapshots for standard, non-visual operation.
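When the launch command is generated programmatically, each flag maps to one or two entries in the `args` array of the client configuration. A sketch, assuming the flag names and value formats listed above (including a comma-separated `--caps` value); the `PlaywrightMcpOptions` type and the `buildServerArgs` function are hypothetical.

```typescript
// Hypothetical options object covering the flags listed above.
// Omitted fields produce no flag, so the server's defaults apply.
interface PlaywrightMcpOptions {
  browser?: string; // e.g. "chrome", "firefox", "webkit", "chrome-dev"
  headless?: boolean;
  caps?: string[]; // joined with commas for --caps
  port?: number;
  host?: string;
  vision?: boolean;
}

// Builds the argument list for `npx @playwright/mcp@latest ...`.
function buildServerArgs(opts: PlaywrightMcpOptions): string[] {
  const args: string[] = ["@playwright/mcp@latest"];
  if (opts.browser !== undefined) args.push("--browser", opts.browser);
  if (opts.headless) args.push("--headless");
  if (opts.caps !== undefined && opts.caps.length > 0) args.push("--caps", opts.caps.join(","));
  if (opts.port !== undefined) args.push("--port", String(opts.port));
  if (opts.host !== undefined) args.push("--host", opts.host);
  if (opts.vision) args.push("--vision");
  return args;
}

// Usage: the arguments for the remote, headless launch described below.
const remoteArgs = buildServerArgs({ headless: true, port: 8931, host: "0.0.0.0" });
// remoteArgs → ["@playwright/mcp@latest", "--headless", "--port", "8931", "--host", "0.0.0.0"]
```

Leaving a field out omits its flag entirely, so the server falls back to the defaults listed above (Chrome, headed mode, all capabilities).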
Persistent State Location
Playwright MCP keeps browser state (session data such as cookies and local storage) in a dedicated profile, located at:
- Windows: `%USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile`
- macOS: `~/Library/Caches/ms-playwright/mcp-chrome-profile`
- Linux: `~/.cache/ms-playwright/mcp-chrome-profile`
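The profile location depends only on the platform and the user's home directory. A small sketch that reproduces the three paths above, for example to locate saved sessions; `mcpChromeProfileDir` is a hypothetical helper, and it covers only the default Chrome profile named in this document. The platform and home directory are passed in explicitly so the function has no runtime dependencies.

```typescript
type Platform = "win32" | "darwin" | "linux";

// Path segments below the home directory for each platform, as listed above.
// On Windows the home directory plays the role of %USERPROFILE%.
const PROFILE_LAYOUT: Record<Platform, { sep: string; parts: string[] }> = {
  win32: { sep: "\\", parts: ["AppData", "Local", "ms-playwright", "mcp-chrome-profile"] },
  darwin: { sep: "/", parts: ["Library", "Caches", "ms-playwright", "mcp-chrome-profile"] },
  linux: { sep: "/", parts: [".cache", "ms-playwright", "mcp-chrome-profile"] },
};

// Joins the home directory and the platform's segments with its separator.
function mcpChromeProfileDir(platform: Platform, home: string): string {
  const { sep, parts } = PROFILE_LAYOUT[platform];
  return [home, ...parts].join(sep);
}

// Usage: hypothetical home directories for each platform.
const linuxProfile = mcpChromeProfileDir("linux", "/home/ada");
const macProfile = mcpChromeProfileDir("darwin", "/Users/ada");
const windowsProfile = mcpChromeProfileDir("win32", "C:\\Users\\ada");
```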
Server Deployment Considerations
Running Headless on Remote Servers (No Display Environment):
When operating on headless systems (like remote VMs without an active X server), you must explicitly define the transport port and bind the host to an accessible interface (e.g., 0.0.0.0).
- Launch the server:

```bash
npx @playwright/mcp@latest --headless --port 8931 --host 0.0.0.0
```

- Configure the client: set the URL to the server's SSE endpoint, replacing the placeholder `{$server-ip}` with the server's actual address:

```json
{
  "mcpServers": {
    "playwright-agent": {
      "url": "http://{$server-ip}:8931/sse"
    }
  }
}
```
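`0.0.0.0` is only the address the server binds to. The client must use an address it can actually reach. A sketch of building the client entry from that address, assuming the `/sse` path and port `8931` shown above; `sseEndpoint` and `remoteClientConfig` are hypothetical names, and `203.0.113.7` (a documentation-range address) is a placeholder.

```typescript
// Hypothetical helper: builds "http://<host>:<port>/sse". An unbracketed
// IPv6 literal contains ':' and is wrapped in [...], as URL syntax requires.
function sseEndpoint(host: string, port: number = 8931): string {
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new RangeError("invalid port: " + port);
  }
  const needsBrackets = host.indexOf(":") !== -1 && host.charAt(0) !== "[";
  const hostPart = needsBrackets ? "[" + host + "]" : host;
  return "http://" + hostPart + ":" + port + "/sse";
}

// Client-side configuration entry pointing at the remote server.
function remoteClientConfig(name: string, host: string, port?: number) {
  return { mcpServers: { [name]: { url: sseEndpoint(host, port) } } };
}

// Usage: the placeholder address stands in for {$server-ip}.
const remoteConfig = remoteClientConfig("playwright-agent", "203.0.113.7");
```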
Docker Image
A prebuilt Docker image is also available; it currently runs the agent only in headless Chromium mode:
```json
{
  "mcpServers": {
    "playwright-agent": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--init", "mcp/playwright"]
    }
  }
}
```
Operational Modes Overview
The agent has two interaction modes:
- Snapshot Mode (Default): Relies on the page's accessibility structure for robust targeting and interaction.
- Vision Mode: Switches to visual processing, which requires screenshots and coordinate-based input. Enable it with the `--vision` flag.
Tool Definitions (Snapshot Mode Focus)
Structural Inspection
- browser_snapshot
  - Action: Captures the current structural representation of the page (the accessibility tree).
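Clients usually call these tools through an MCP SDK, but on the wire each call is a JSON-RPC 2.0 request with method `tools/call` and a `params` object holding the tool `name` and its `arguments`, as defined by the MCP specification. A sketch of that envelope for `browser_snapshot`, which takes no parameters; `ToolCallRequest` and `toolCall` are hypothetical names, and the incrementing counter is one simple way to keep request ids unique.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Builds a tools/call request; each call gets a fresh id.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// browser_snapshot takes no parameters, so its arguments object is empty.
const snapshotRequest = toolCall("browser_snapshot");
```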
Element Manipulation
- browser_click
  - Parameters: `element` (human-readable description), `ref` (the element's unique reference from the snapshot).
  - Action: Simulates a mouse click on the identified element.
- browser_type
  - Parameters: `element`, `ref`, `text`, `submit` (optional boolean; presses Enter after typing), `slowly` (optional boolean; types one character at a time).
  - Action: Types the specified text into an input field.
- browser_select_option
  - Parameters: `element`, `ref`, `values` (array of option values to select).
  - Action: Selects one or more options in a `<select>` element.
- browser_drag
  - Parameters: `startElement`, `startRef`, `endElement`, `endRef`.
  - Action: Drags from a source accessibility node and drops it on a destination node.
- browser_hover
  - Parameters: `element`, `ref`.
  - Action: Moves the pointer over the specified element.
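The element tools share one pattern: a human-readable `element` description plus a `ref` taken from the latest snapshot. A sketch of typed arguments for a form-filling sequence, transcribed from the parameter lists above; the type names, the `refsUsed` helper, and the ref strings (`e5`, `e7`, `e9`) are hypothetical placeholders, since real refs come from a `browser_snapshot` result.

```typescript
// Argument types transcribed from the parameter lists above.
interface ElementTarget { element: string; ref: string }
type ClickArgs = ElementTarget;
interface TypeArgs extends ElementTarget { text: string; submit?: boolean; slowly?: boolean }
interface SelectOptionArgs extends ElementTarget { values: string[] }
interface DragArgs { startElement: string; startRef: string; endElement: string; endRef: string }
type HoverArgs = ElementTarget;

// Discriminated union keyed on the tool name.
type ElementToolCall =
  | { name: "browser_click"; arguments: ClickArgs }
  | { name: "browser_type"; arguments: TypeArgs }
  | { name: "browser_select_option"; arguments: SelectOptionArgs }
  | { name: "browser_drag"; arguments: DragArgs }
  | { name: "browser_hover"; arguments: HoverArgs };

// A form fill: type an email, pick a country, click submit.
// The ref values are placeholders, not real snapshot refs.
const formSteps: ElementToolCall[] = [
  { name: "browser_type", arguments: { element: "Email textbox", ref: "e5", text: "ada@example.com" } },
  { name: "browser_select_option", arguments: { element: "Country combobox", ref: "e7", values: ["NO"] } },
  { name: "browser_click", arguments: { element: "Submit button", ref: "e9" } },
];

// Collects every snapshot ref a sequence depends on; drag uses two refs.
function refsUsed(steps: ElementToolCall[]): string[] {
  const refs: string[] = [];
  for (const step of steps) {
    if (step.name === "browser_drag") {
      refs.push(step.arguments.startRef, step.arguments.endRef);
    } else {
      refs.push(step.arguments.ref);
    }
  }
  return refs;
}
```

Because refs come from a particular snapshot, collecting them up front makes it easier to take a fresh snapshot and check them before replaying a stored sequence on a page that may have changed.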
Capture and Output
- browser_take_screenshot
  - Parameters: `raw` (optional boolean; `true` returns PNG, otherwise JPEG), optional `element`/`ref` to capture a single element instead of the viewport.
  - Action: Captures the current viewport, or the specified element, as an image.
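A sketch of `browser_take_screenshot` arguments for the two cases described above, a viewport capture and a single-element capture. `screenshotArgs` is a hypothetical helper, and treating `element` and `ref` as a pair (both or neither) is an assumption based on how the other tools use them.

```typescript
interface ScreenshotArgs {
  raw?: boolean; // true → PNG; omitted → JPEG (per the description above)
  element?: string;
  ref?: string;
}

// Omit `target` for a viewport screenshot, or pass both the description
// and the ref to capture one element. `png` maps to the `raw` flag.
function screenshotArgs(
  opts: { png?: boolean; target?: { element: string; ref: string } } = {},
): ScreenshotArgs {
  const args: ScreenshotArgs = {};
  if (opts.png) args.raw = true;
  if (opts.target) {
    args.element = opts.target.element;
    args.ref = opts.target.ref;
  }
  return args;
}

// Usage: a JPEG of the viewport, and a PNG of one (placeholder) element.
const viewportShot = screenshotArgs();
const chartShot = screenshotArgs({ png: true, target: { element: "Sales chart", ref: "e12" } });
```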
Tab & Navigation Control
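- browser_navigate
  - Parameters: `url` (target URL).
  - Action: Navigates the current tab to the given URL.
- browser_tab_new
  - Parameters: `url` (optional initial destination).
  - Action: Opens a new tab.
- browser_tab_select
  - Parameters: `index` (integer tab index).
  - Action: Switches to the tab at that index.
- browser_close
  - Action: Closes the current page or tab.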
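The tab and navigation tools compose into simple multi-tab flows. A sketch of one such sequence expressed as tool-call objects; the `NavCall` type and the `tabWorkflow` function are hypothetical, and the `index` value is left to the caller because this document does not say whether tabs are numbered from 0 or 1.

```typescript
// Tool-call shapes for the tab and navigation tools listed above.
type NavCall =
  | { name: "browser_navigate"; arguments: { url: string } }
  | { name: "browser_tab_new"; arguments: { url?: string } }
  | { name: "browser_tab_select"; arguments: { index: number } }
  | { name: "browser_close"; arguments: Record<string, never> };

// Opens the primary page, opens a second tab on another page, switches
// back to the first tab, then closes the current page. `firstTabIndex`
// is caller-supplied; check how your server version numbers tabs.
function tabWorkflow(primaryUrl: string, secondaryUrl: string, firstTabIndex: number): NavCall[] {
  return [
    { name: "browser_navigate", arguments: { url: primaryUrl } },
    { name: "browser_tab_new", arguments: { url: secondaryUrl } },
    { name: "browser_tab_select", arguments: { index: firstTabIndex } },
    { name: "browser_close", arguments: {} },
  ];
}

// Usage with placeholder URLs and an assumed first-tab index of 0.
const navSteps = tabWorkflow("https://example.com/a", "https://example.com/b", 0);
```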
