Playwright Accessibility Agent (MCP)

This project implements a Model Context Protocol (MCP) server built on the Playwright library. It drives the browser through the page's accessibility tree rather than through pixel analysis, so interaction needs no screenshots or vision models.

Core Capabilities

Accessibility-First Interaction: Operations are guided by the ARIA/accessibility tree, eliminating reliance on visual input or computer vision models.
Reliable Execution: Provides deterministic control, significantly reducing the variability associated with screenshot-based automation.
Efficiency: Low overhead, because core functional steps work on the structured accessibility tree rather than on pixel-based input.

Application Scenarios

Orchestrating complex web workflows for autonomous agents.
Populating and submitting intricate online forms.
Systematic extraction of structured content.
Building end-to-end regression tests resilient to minor visual shifts.

Configuration Snippets

NPM Initialization (Using NPX)

{
  "mcpServers": {
    "playwright-agent": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

VS Code Integration

To add the server to VS Code, register it from the command line with the VS Code CLI:

code --add-mcp '{"name":"playwright-agent","command":"npx","args":["@playwright/mcp@latest"]}'
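
Hand-quoting the JSON argument is easy to get wrong, so here is a minimal sketch, assuming Python is available, that builds the same command string programmatically. The helper name build_add_mcp_command is hypothetical; only the code --add-mcp command and the JSON fields come from the example above.

```python
import json
import shlex


def build_add_mcp_command(name: str, command: str, args: list) -> str:
    """Return a POSIX-shell command line for `code --add-mcp` (hypothetical helper)."""
    # Compact separators reproduce the single-line JSON form used in this README.
    payload = json.dumps(
        {"name": name, "command": command, "args": args},
        separators=(",", ":"),
    )
    # shlex.quote wraps the JSON so the shell passes it as a single argument.
    return f"code --add-mcp {shlex.quote(payload)}"


if __name__ == "__main__":
    print(build_add_mcp_command("playwright-agent", "npx", ["@playwright/mcp@latest"]))
```

The quoting here targets POSIX shells; on Windows shells the quoting rules differ and the string would need adjusting.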

Available Server Customizations (CLI Flags)

The agent supports several launch parameters to tailor browser behavior:

--browser <engine>: Specify the rendering engine. Options include chrome, firefox, webkit, or specific channel variants (e.g., chrome-dev). Default is chrome.
--headless: Execute the browser instance without a graphical interface (default is headed mode).
--caps <flags>: Fine-tune enabled features (e.g., pdf, history, wait). Defaults to all.
--port <number> / --host <address>: Configure the server's listening socket for SSE transport.
--vision: Run the server in screenshot-based vision mode. Without this flag, the server uses accessibility snapshots, which is the standard non-visual mode.
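
As a rough illustration of how these flags combine, the sketch below assembles an args list for the npx launcher from a few options. The function build_server_args and its parameter names are hypothetical; it only emits flags listed above, and it leaves out --caps because the exact value syntax is not specified here.

```python
from typing import List, Optional


def build_server_args(
    browser: Optional[str] = None,
    headless: bool = False,
    port: Optional[int] = None,
    host: Optional[str] = None,
    vision: bool = False,
) -> List[str]:
    """Assemble npx arguments for the MCP server (hypothetical helper)."""
    args = ["@playwright/mcp@latest"]
    if browser:
        args += ["--browser", browser]  # e.g. chrome, firefox, webkit
    if headless:
        args.append("--headless")  # headed mode is the default
    if port is not None:
        args += ["--port", str(port)]  # listening port for SSE transport
    if host:
        args += ["--host", host]
    if vision:
        args.append("--vision")  # switch from snapshot mode to vision mode
    return args


if __name__ == "__main__":
    print(build_server_args(browser="firefox", headless=True))
```

The returned list can be dropped into the "args" field of the client configuration shown earlier.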

Persistent State Location

Playwright MCP launches the browser with a dedicated profile that persists state such as session data and local storage. The profile is stored at:

Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
Linux: ~/.cache/ms-playwright/mcp-chrome-profile
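
When scripting cleanup or backups, it can help to resolve the profile location for the current platform. The sketch below mirrors the three paths listed above; the function name mcp_profile_dir is hypothetical, and it assumes the locations have not been changed by other configuration.

```python
import os
import sys
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import Optional


def mcp_profile_dir(platform: str = sys.platform, home: Optional[str] = None) -> PurePath:
    """Return the profile directory listed in this README for a platform (hypothetical helper)."""
    tail = ("ms-playwright", "mcp-chrome-profile")
    if platform.startswith("win"):
        # Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
        base = PureWindowsPath(home or os.environ.get("USERPROFILE", ""))
        return base.joinpath("AppData", "Local", *tail)
    base = PurePosixPath(home or os.path.expanduser("~"))
    if platform == "darwin":
        # macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
        return base.joinpath("Library", "Caches", *tail)
    # Linux: ~/.cache/ms-playwright/mcp-chrome-profile
    return base.joinpath(".cache", *tail)


if __name__ == "__main__":
    print(mcp_profile_dir())
```

Deleting this directory between sessions clears the persisted state.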

Server Deployment Considerations

Running Headless on Remote Servers (No Display Environment): When the server runs on a machine without a display (for example, a remote VM with no active X server), launch it with --headless, set an explicit --port to enable SSE transport, and bind --host to an interface the client can reach (e.g., 0.0.0.0).

Launch Server:

npx @playwright/mcp@latest --headless --port 8931 --host 0.0.0.0

Client Configuration: Set the client URL to the server's location, replacing the $server-ip placeholder with the actual server address.

{
  "mcpServers": {
    "playwright-agent": {
      "url": "http://{$server-ip}:8931/sse"
    }
  }
}
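
A common slip is leaving the placeholder in the URL. The following sketch generates the client entry from a concrete address and rejects unfilled placeholders; remote_client_config is a hypothetical helper, and 192.0.2.10 is a documentation-only example address, not a real server.

```python
import json


def remote_client_config(server_ip: str, port: int = 8931, name: str = "playwright-agent") -> dict:
    """Build the SSE client entry for a remote server (hypothetical helper)."""
    # Guard against an unfilled placeholder such as "$server-ip".
    if not server_ip or "$" in server_ip:
        raise ValueError("server_ip must be a concrete address, not a placeholder")
    return {"mcpServers": {name: {"url": f"http://{server_ip}:{port}/sse"}}}


if __name__ == "__main__":
    print(json.dumps(remote_client_config("192.0.2.10"), indent=2))
```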

Docker Image

A pre-built Docker image is available. It supports headless Chromium only:

{
  "mcpServers": {
    "playwright-agent": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--init", "mcp/playwright"]
    }
  }
}

Operational Modes Overview

The agent supports two interaction modes:

Snapshot Mode (Default): Relies on the page's accessibility structure for robust targeting and interaction.
Vision Mode: Switches to visual processing, requiring screenshot data and coordinate-based inputs. Activate via the --vision flag.

Tool Definitions (Snapshot Mode Focus)

Structural Inspection

browser_snapshot
Goal: Acquire the current structural representation of the webpage (the accessibility tree).

Element Manipulation

browser_click
Parameters: element (human-readable description of the target), ref (the element's exact reference from the page snapshot).
Action: Simulates a mouse click on the identified component.
browser_type
Parameters: element, ref, text, submit (optional boolean), slowly (optional boolean).
Action: Inputs specified text into an input field.
browser_select_option
Parameters: element, ref, values (array of target option values).
Action: Sets the selected state for options within a <select> element.
browser_drag
Parameters: startElement, startRef, endElement, endRef.
Action: Drags from a source accessibility node to a destination node.
browser_hover
Parameters: element, ref.
Action: Moves the cursor over a specified component.
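
MCP clients invoke tools through JSON-RPC 2.0 requests with the method tools/call. As a minimal sketch of what such requests might look like for the tools above, the helper below builds the payloads; tool_call is a hypothetical name, and the element and ref values are made-up examples (real ref values come from a prior browser_snapshot result).

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = itertools.count(1)


def tool_call(name: str, **arguments) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Illustrative values only: "e12" and "e7" are assumed snapshot references.
    click = tool_call("browser_click", element="Submit button", ref="e12")
    typing = tool_call(
        "browser_type", element="Search box", ref="e7", text="playwright", submit=True
    )
    print(json.dumps(click, indent=2))
    print(json.dumps(typing, indent=2))
```

In practice an MCP client library usually builds and sends these requests for you; the sketch only shows how the documented parameters map onto the arguments object.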

Capture and Output

browser_take_screenshot
Parameters: raw (optional boolean; when true, returns an uncompressed PNG instead of the default JPEG), optional element/ref to capture a single element instead of the viewport.
Action: Renders the current viewport or a specified element as an image.

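Navigation and Tabs

browser_navigate
Parameters: url (target URI).
Action: Navigates the current page to the given URL.
browser_tab_new
Parameters: url (optional initial destination).
Action: Opens a new tab, optionally loading the given URL.
browser_tab_select
Parameters: index (integer index).
Action: Switches to the tab at the given index.
browser_close
Action: Closes the current page or tab.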