
playwright-accessibility-agent

Facilitates advanced browser orchestration via the Model Context Protocol (MCP), leveraging Playwright's non-visual accessibility tree to drive web interactions, data retrieval, and automated testing for LLM agents.

Author

PhamQuangVinh22022648

Apache License 2.0

Quick Info

GitHub Stars: 0
NPM Weekly Downloads: 88803
Tools: 1
Last Updated: 2026-02-19

Tags

automation, browser, scraping, browser automation, automation web, facilitates web

Playwright Accessibility Agent for MCP

This server implements the Model Context Protocol (MCP) to give Large Language Models (LLMs) programmatic control over web browsers using Playwright. Interaction is based on structured accessibility snapshots of the page rather than on screenshots or computer vision models.

Core Capabilities

  • Structure-First Interaction: Operates on the page's accessibility structure rather than on pixels, so each action targets an explicitly identified element and is applied deterministically.
  • Visual Independence: Eliminates the dependency on visual models (screenshots), leading to faster, more resource-efficient automation.
  • Broad Browser Support: Controls Chromium, Firefox, and WebKit environments.

Agent Utility Cases

  • Executing complex web workflows (e.g., multi-step form submission, account management).
  • Extracting deeply nested, structured information from dynamic web content.
  • Building robust, non-visual regression test suites for web applications.
  • Providing agents with a generalized, reliable web control surface.

Configuration Snippet

To integrate this automation engine into your MCP setup:

```js
{
  "mcpServers": {
    "playwright_web_driver": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
```

Command Line Interface Arguments

The server exposes numerous parameters for fine-grained control over browser instantiation and operation:

  • --browser <engine>: Selects the rendering engine (chrome, firefox, webkit) or specific channel (e.g., chrome-canary). Default is chrome.
  • --caps <list>: Comma-separated features to activate (e.g., tabs, pdf, wait). Default enables all.
  • --headless: Activates non-GUI execution mode (default is headed).
  • --vision: DISCOURAGED: Switches to pixel-based interaction using screenshots instead of accessibility snapshots.
  • --user-data-dir <path>: Specifies location for persistent browser profile data.
  • --port <number>: TCP port for the SSE transport layer.
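
As a rough illustration of how these flags combine with the configuration snippet, the TypeScript sketch below assembles an `mcpServers` entry from a small options object. The `LaunchOptions` type and the `buildServerEntry` helper are hypothetical names introduced here for illustration and are not part of the package; only flags listed above are used.

```typescript
// Hypothetical options object covering some of the flags documented above.
interface LaunchOptions {
  browser?: string;      // e.g. 'chrome', 'firefox', 'webkit'
  caps?: string[];       // e.g. ['tabs', 'pdf', 'wait']
  headless?: boolean;    // omit or false for the headed default
  userDataDir?: string;  // persistent profile location
  port?: number;         // port for the SSE transport
}

interface ServerEntry {
  command: string;
  args: string[];
}

// Hypothetical helper: turns the options into an `npx` command line.
function buildServerEntry(options: LaunchOptions = {}): ServerEntry {
  const args: string[] = ['@playwright/mcp@latest'];
  if (options.browser) args.push('--browser', options.browser);
  if (options.caps && options.caps.length > 0) args.push('--caps', options.caps.join(','));
  if (options.headless) args.push('--headless');
  if (options.userDataDir) args.push('--user-data-dir', options.userDataDir);
  if (options.port !== undefined) args.push('--port', String(options.port));
  return { command: 'npx', args };
}

const config = {
  mcpServers: {
    playwright_web_driver: buildServerEntry({ browser: 'firefox', headless: true }),
  },
};

console.log(JSON.stringify(config, null, 2));
```

Keeping flag assembly in one function makes it easier to generate several client entries (for example, one headed and one headless) without hand-editing JSON.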

Profile Persistence

Browser states (like login sessions) are maintained within isolated profiles:

  • Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-{channel}-profile
  • macOS: ~/Library/Caches/ms-playwright/mcp-{channel}-profile
  • Linux: ~/.cache/ms-playwright/mcp-{channel}-profile
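
To locate or clear a profile from a script, the paths above can be computed from the platform and home directory. The sketch below is a minimal illustration; `profileDir` is a hypothetical helper rather than something the server exports, and it takes the home directory as an argument (`%USERPROFILE%` on Windows) instead of reading it from the environment.

```typescript
type Platform = 'win32' | 'darwin' | 'linux';

// Hypothetical helper mirroring the documented profile locations.
// `home` is the user's home directory; `channel` is the browser channel, e.g. 'chrome'.
function profileDir(platform: Platform, home: string, channel: string): string {
  const leaf = `mcp-${channel}-profile`;
  if (platform === 'win32') {
    return [home, 'AppData', 'Local', 'ms-playwright', leaf].join('\\');
  }
  if (platform === 'darwin') {
    return [home, 'Library', 'Caches', 'ms-playwright', leaf].join('/');
  }
  return [home, '.cache', 'ms-playwright', leaf].join('/');
}

console.log(profileDir('linux', '/home/ada', 'chrome'));
// prints "/home/ada/.cache/ms-playwright/mcp-chrome-profile"
```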

Configuration Schema Reference

The full configuration object allows deep customization across browser settings, context options, server binding, and capability enablement:

```typescript
{
  // Browser configuration details
  browser?: {
    browserName?: 'chromium' | 'firefox' | 'webkit';
    userDataDir?: string;
    launchOptions?: { /* Playwright Launch Options */ };
    contextOptions?: { /* Playwright Context Options */ };
    cdpEndpoint?: string;
  },

  // Server networking parameters
  server?: {
    port?: number;
    host?: string;
  },

  // Feature flags
  capabilities?: Array<'core' | 'tabs' | 'pdf' | 'history' | 'wait' | 'files' | 'install' | 'testing'>;

  // Output control
  vision?: boolean;
  outputDir?: string;

  network?: {
    allowedOrigins?: string[];
    blockedOrigins?: string[];
  };

  noImageResponses?: boolean; // Suppress binary image payload transmission
}
```
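
For orientation, here is one way a concrete configuration object might look when written against the schema above. This is a sketch under stated assumptions: the `McpServerConfig` type name is introduced here, the launch and context options are typed loosely as `Record<string, unknown>` in place of the real Playwright types, and all values (port, host, origins, output directory) are illustrative.

```typescript
// Assumed local type restating the schema above; the name is hypothetical.
type Capability = 'core' | 'tabs' | 'pdf' | 'history' | 'wait' | 'files' | 'install' | 'testing';

interface McpServerConfig {
  browser?: {
    browserName?: 'chromium' | 'firefox' | 'webkit';
    userDataDir?: string;
    launchOptions?: Record<string, unknown>;  // stands in for Playwright launch options
    contextOptions?: Record<string, unknown>; // stands in for Playwright context options
    cdpEndpoint?: string;
  };
  server?: { port?: number; host?: string };
  capabilities?: Capability[];
  vision?: boolean;
  outputDir?: string;
  network?: { allowedOrigins?: string[]; blockedOrigins?: string[] };
  noImageResponses?: boolean;
}

// Illustrative values only: headless Firefox, a reduced capability set,
// accessibility mode, and no image payloads in responses.
const exampleConfig: McpServerConfig = {
  browser: {
    browserName: 'firefox',
    launchOptions: { headless: true },
    contextOptions: { viewport: { width: 1280, height: 720 } },
  },
  server: { port: 8931, host: 'localhost' },
  capabilities: ['core', 'tabs', 'wait'],
  vision: false,
  outputDir: './mcp-output',
  network: { allowedOrigins: ['https://example.com'] },
  noImageResponses: true,
};

console.log(JSON.stringify(exampleConfig, null, 2));
```

Every field is optional in the schema, so a configuration can set only the values that differ from the defaults.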

Environment Setup Notes

Linux Headed Mode: When running a visible browser in an environment without a native display server (e.g., some remote SSH sessions or CI runners), start the server from an environment where the DISPLAY variable is set, and pass --port so that clients connect over the SSE transport.

Docker Deployment: The containerized deployment is currently optimized for headless Chromium execution. The required configuration maps to:

```js
{
  "mcpServers": {
    "playwright": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--init", "mcp/playwright"]
    }
  }
}
```

Toolset Overview (Accessibility Mode Default)

The available atomic actions focus on structural manipulation and data gathering:

Core Element Manipulation

  • browser_click: Executes a primary click on a specified accessible element.
  • browser_type: Inputs sequences of text into form controls.
  • browser_select_option: Manages selection state within <select> elements.
  • browser_drag: Simulates sequential mouse movements for a drag-and-drop operation between two identified elements.
  • browser_hover: Triggers mouse-over events on a target element.

Information Gathering

  • browser_snapshot: Captures an accessibility snapshot of the current page (the primary context source for subsequent actions).
  • browser_take_screenshot: Captures a visual representation (JPEG/PNG). Note: This is supplemental; actions rely on snapshots.
  • browser_network_requests: Dumps details of all network transactions since page load.

Session Control

  • browser_navigate: Directs the current viewport to a new URI.
  • browser_tab_list/new/select/close: Comprehensive management of concurrent browsing tabs.
  • browser_wait: Inserts temporal delays into the execution sequence for synchronization.
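
MCP clients invoke tools such as these through JSON-RPC 2.0 `tools/call` requests. The sketch below builds the request bodies for a typical navigate, snapshot, click sequence. The `toolCall` helper is hypothetical, and the argument names (`url`, `element`, `ref`) and the placeholder reference value are assumptions; the input schemas the server reports via `tools/list` are the authority on the actual arguments.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: wraps a tool name and its arguments in a request envelope.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id: nextId++,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// Argument names below are assumptions; the `ref` value is a placeholder that
// would normally be copied from the preceding snapshot result.
const steps: ToolCallRequest[] = [
  toolCall('browser_navigate', { url: 'https://example.com' }),
  toolCall('browser_snapshot'),
  toolCall('browser_click', { element: 'More information link', ref: 'REF_FROM_SNAPSHOT' }),
];

for (const step of steps) {
  console.log(JSON.stringify(step));
}
```

Taking a snapshot before each interaction matters in this flow, because the click is addressed by a reference from the snapshot rather than by screen coordinates.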

Note on Vision Mode: The --vision flag enables coordinate-based interaction (browser_screen_* tools) driven by screenshots. The default accessibility mode is recommended: element references taken from snapshots are less ambiguous than screen coordinates and do not require a vision model.
