Playwright Accessibility Agent for MCP

This server implements the Model Context Protocol (MCP) to give Large Language Models (LLMs) programmatic control over web browsers using Playwright. It interacts with pages through structured accessibility snapshots rather than screenshots, so no pixel-based perception or computer vision model is required.

Core Capabilities

Structure-First Interaction: Operates on the page's accessibility tree, so tools are applied deterministically and avoid the ambiguity of screenshot-based approaches.
Visual Independence: Needs no vision model or screenshots to act on a page, which keeps automation fast and lightweight.
Broad Browser Support: Controls Chromium, Firefox, and WebKit environments.

Agent Utility Cases

Executing complex web workflows (e.g., multi-step form submission, account management).
Extracting deeply nested, structured information from dynamic web content.
Building robust, non-visual regression test suites for web applications.
Providing agents with a generalized, reliable web control surface.

Configuration Snippet

To integrate this automation engine into your MCP setup:

```js
{
  "mcpServers": {
    "playwright_web_driver": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
```

Command Line Interface Arguments

The server exposes numerous parameters for fine-grained control over browser instantiation and operation:

--browser <engine>: Selects the rendering engine (chrome, firefox, webkit) or specific channel (e.g., chrome-canary). Default is chrome.
--caps <list>: Comma-separated features to activate (e.g., tabs, pdf, wait). Default enables all.
--headless: Activates non-GUI execution mode (default is headed).
--vision: Switches to pixel-based interaction using screenshots instead of accessibility snapshots (discouraged).
--user-data-dir <path>: Specifies location for persistent browser profile data.
--port <number>: TCP port for the SSE transport layer.
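
As a rough sketch of how these flags combine, the helper below assembles an `mcpServers` entry from a small options object. The `LaunchFlags` type and the `buildServerEntry` function are hypothetical names introduced for illustration. Only the flags listed above and the `npx @playwright/mcp@latest` command come from this document.

```typescript
// Hypothetical helper: turns a few of the documented CLI flags into an
// mcpServers entry. The names here are illustrative, not part of the server's API.
interface LaunchFlags {
  browser?: string;      // value for --browser, e.g. "firefox"
  caps?: string[];       // values for --caps, joined with commas
  headless?: boolean;    // adds --headless when true
  userDataDir?: string;  // value for --user-data-dir
  port?: number;         // value for --port
}

interface ServerEntry {
  command: string;
  args: string[];
}

function buildServerEntry(flags: LaunchFlags = {}): ServerEntry {
  const args: string[] = ["@playwright/mcp@latest"];
  if (flags.browser) args.push("--browser", flags.browser);
  if (flags.caps && flags.caps.length > 0) args.push("--caps", flags.caps.join(","));
  if (flags.headless) args.push("--headless");
  if (flags.userDataDir) args.push("--user-data-dir", flags.userDataDir);
  if (flags.port !== undefined) args.push("--port", String(flags.port));
  return { command: "npx", args };
}

// Example: headless Firefox with only the tabs and pdf capabilities.
const entry = buildServerEntry({ browser: "firefox", caps: ["tabs", "pdf"], headless: true });
console.log(JSON.stringify({ mcpServers: { playwright_web_driver: entry } }, null, 2));
```

Omitting a field leaves the corresponding flag off the command line, so the server falls back to its own default for that setting.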

Profile Persistence

Browser states (like login sessions) are maintained within isolated profiles:

Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-{channel}-profile
macOS: ~/Library/Caches/ms-playwright/mcp-{channel}-profile
Linux: ~/.cache/ms-playwright/mcp-{channel}-profile
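
The mapping above can be sketched as a small pure function. `profileDir` is a hypothetical helper, not something the server exposes. It assumes the platform is given using Node's `process.platform` values and that `home` is the user's home directory (`%USERPROFILE%` on Windows).

```typescript
// Sketch: maps a platform and channel to the profile locations listed above.
// `platform` follows Node's process.platform values ("win32", "darwin", "linux").
function profileDir(platform: string, home: string, channel: string): string {
  const name = `mcp-${channel}-profile`;
  switch (platform) {
    case "win32":
      return `${home}\\AppData\\Local\\ms-playwright\\${name}`;
    case "darwin":
      return `${home}/Library/Caches/ms-playwright/${name}`;
    default:
      // Treated as Linux here; other platforms are not covered by this document.
      return `${home}/.cache/ms-playwright/${name}`;
  }
}

console.log(profileDir("linux", "/home/alice", "chrome"));
// prints "/home/alice/.cache/ms-playwright/mcp-chrome-profile"
```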

Configuration Schema Reference

The full configuration object allows deep customization across browser settings, context options, server binding, and capability enablement:

```typescript
{
  // Browser configuration details
  browser?: {
    browserName?: 'chromium' | 'firefox' | 'webkit';
    userDataDir?: string;
    launchOptions?: { /* Playwright Launch Options */ };
    contextOptions?: { /* Playwright Context Options */ };
    cdpEndpoint?: string;
  },

  // Server networking parameters
  server?: {
    port?: number;
    host?: string;
  },

  // Output control
  vision?: boolean;
  outputDir?: string;

  network?: {
    allowedOrigins?: string[];
    blockedOrigins?: string[];
  };

  noImageResponses?: boolean; // Suppress binary image payload transmission
}
```
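
To make the schema more concrete, here is one possible configuration object, shown as a sketch. The `Config` type restates the fields above with the two Playwright option bags left loosely typed. Every value in `exampleConfig` is an illustrative assumption and not a documented default. How the object is supplied to the server is not covered in this section.

```typescript
// Type mirroring the schema above; the Playwright option bags are loosely typed here.
type Config = {
  browser?: {
    browserName?: "chromium" | "firefox" | "webkit";
    userDataDir?: string;
    launchOptions?: Record<string, unknown>;
    contextOptions?: Record<string, unknown>;
    cdpEndpoint?: string;
  };
  server?: { port?: number; host?: string };
  vision?: boolean;
  outputDir?: string;
  network?: { allowedOrigins?: string[]; blockedOrigins?: string[] };
  noImageResponses?: boolean;
};

// Illustrative values only; none of these are defaults documented here.
const exampleConfig: Config = {
  browser: {
    browserName: "chromium",
    launchOptions: { headless: true },
    contextOptions: { viewport: { width: 1280, height: 720 } },
  },
  server: { port: 8931, host: "localhost" },
  vision: false,
  outputDir: "./mcp-output",
  network: { blockedOrigins: ["https://ads.example.com"] },
  noImageResponses: true,
};

console.log(JSON.stringify(exampleConfig, null, 2));
```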

Environment Setup Notes

Linux Headed Mode: When running a headed browser on a system without a native display server (e.g., certain remote SSH sessions or CI runners), start the MCP server from an environment where the DISPLAY variable is set, and pass --port so that clients connect over the SSE transport.
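
A minimal sketch of the client side of that setup follows. It assumes the server was started separately, for example with `DISPLAY=:0 npx @playwright/mcp@latest --port 8931`, and that the SSE endpoint is served at `/sse` on that port. The display value, the port number, the endpoint path, and the `sseClientConfig` helper are all assumptions made for illustration, so check them against your client and server versions.

```typescript
// Assumption: the server is already running with a display available and
// serves SSE at /sse on the chosen port. Port and path are assumed values.
function sseClientConfig(port: number, host: string = "localhost") {
  return {
    mcpServers: {
      // The client connects by URL instead of spawning the server itself.
      playwright: { url: `http://${host}:${port}/sse` },
    },
  };
}

console.log(JSON.stringify(sseClientConfig(8931), null, 2));
```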

Docker Deployment: The containerized deployment currently targets headless Chromium only. Configure the MCP client to launch the container as follows:

```js
{
  "mcpServers": {
    "playwright": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--init", "mcp/playwright"]
    }
  }
}
```

Toolset Overview (Accessibility Mode Default)

The available atomic actions focus on structural manipulation and data gathering:

Core Element Manipulation

browser_click: Executes a primary click on a specified accessible element.
browser_type: Inputs sequences of text into form controls.
browser_select_option: Manages selection state within <select> elements.
browser_drag: Simulates sequential mouse movements for a drag-and-drop operation between two identified elements.
browser_hover: Triggers mouse-over events on a target element.

Information Gathering

browser_snapshot: Captures an accessibility snapshot of the current page (the primary context source for actions).
browser_take_screenshot: Captures a visual representation (JPEG/PNG). Note: This is supplemental; actions rely on snapshots.
browser_network_requests: Dumps details of all network transactions since page load.

Session Control

browser_navigate: Directs the current viewport to a new URI.
browser_tab_list, browser_tab_new, browser_tab_select, browser_tab_close: List, open, switch between, and close browser tabs.
browser_wait: Inserts temporal delays into the execution sequence for synchronization.
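
To illustrate how an agent might drive these tools, the sketch below builds MCP `tools/call` requests for a navigate, snapshot, click sequence. The `toolCall` helper is hypothetical. The argument names (`url`, `element`, `ref`) and the reference value `"e12"` are assumptions, so consult the tool schemas reported by the server for the authoritative parameter lists.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request.
interface ToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper that wraps a tool name and its arguments in the envelope.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCall {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// A typical sequence: navigate, take a snapshot, then act on an element found in it.
// Argument names and the ref value are assumptions made for illustration.
const sequence: ToolCall[] = [
  toolCall("browser_navigate", { url: "https://example.com/login" }),
  toolCall("browser_snapshot"),
  toolCall("browser_click", { element: "Sign in button", ref: "e12" }),
];

for (const call of sequence) console.log(JSON.stringify(call));
```

In practice the reference passed to an action comes from the most recent snapshot, so a fresh snapshot is usually taken after any step that changes the page.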

Note on Vision Mode: The --vision flag enables coordinate-based interaction through the browser_screen_* tools. The default snapshot mode remains the recommended one, because it targets elements by accessibility reference rather than by screen coordinates, which makes actions less sensitive to layout and rendering differences.