
playwright-accessibility-agent

Enables programmatic control of web browsers via Playwright, leveraging structured accessibility tree data for robust, non-visual automation. Supports navigation, data retrieval, form manipulation, and agent-driven testing with high determinism.

Author

ahai72160

Apache License 2.0

Quick Info

GitHub Stars: 0
NPM Weekly Downloads: 88803
Tools: 1
Last Updated: 2026-02-19

Tags

automation, browser, scraping, browser automation, automation web, facilitates browser

Playwright Accessibility Agent (MCP)

This project implements a Model Context Protocol (MCP) server built on top of the Playwright library. Instead of analyzing pixels, it drives the browser exclusively through the page's accessibility tree.

Core Capabilities

  • Accessibility-First Interaction: Operations are guided by the ARIA/accessibility tree, eliminating reliance on visual input or computer vision models.
  • Reliable Execution: Provides deterministic control, significantly reducing the variability associated with screenshot-based automation.
  • Efficiency: Low overhead, because the core functional steps do not require capturing or analyzing screenshots.

Application Scenarios

  • Orchestrating complex web workflows for autonomous agents.
  • Populating and submitting intricate online forms.
  • Systematic extraction of structured content.
  • Building end-to-end regression tests resilient to minor visual shifts.

Configuration Snippets

NPM Initialization (Using NPX)

    {
      "mcpServers": {
        "playwright-agent": {
          "command": "npx",
          "args": ["@playwright/mcp@latest"]
        }
      }
    }

VS Code Integration

To integrate this automation service directly within your IDE environment, use the provided installation links:

Install in VS Code Install in VS Code Insiders

Alternatively, use the command line extension installation:

Standard VS Code:

    code --add-mcp '{"name":"playwright-agent","command":"npx","args":["@playwright/mcp@latest"]}'

Available Server Customizations (CLI Flags)

The agent supports several launch parameters to tailor browser behavior:

  • --browser <engine>: Selects the browser or channel to launch. Options include chrome, firefox, webkit, or specific channel variants (e.g., chrome-dev). Default is chrome.
  • --headless: Execute the browser instance without a graphical interface (default is headed mode).
  • --caps <flags>: Fine-tune enabled features (e.g., pdf, history, wait). Defaults to all.
  • --port <number> / --host <address>: Configure the server's listening socket for SSE transport.
  • --vision: Switches from the default accessibility snapshot mode to screenshot-based visual interaction. Omit this flag for standard, non-visual operation.
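The flags above can be assembled into a client configuration programmatically. The following is a minimal sketch, not part of the project: `build_server_config` is a hypothetical helper, and it assumes that several `--caps` values are passed as one comma-separated string.

```python
import json


def build_server_config(browser=None, headless=False, caps=None,
                        port=None, host=None, vision=False,
                        name="playwright-agent"):
    """Return an mcpServers config dict that uses only the flags listed above."""
    args = ["@playwright/mcp@latest"]
    if browser is not None:
        args += ["--browser", browser]
    if headless:
        args.append("--headless")
    if caps:
        # Assumption: multiple capabilities are joined with commas.
        args += ["--caps", ",".join(caps)]
    if port is not None:
        args += ["--port", str(port)]
    if host is not None:
        args += ["--host", host]
    if vision:
        args.append("--vision")
    return {"mcpServers": {name: {"command": "npx", "args": args}}}


if __name__ == "__main__":
    # Print a config for headless Firefox, ready to paste into a client.
    print(json.dumps(build_server_config(browser="firefox", headless=True), indent=2))
```

Building the argument list in one place keeps flag spelling consistent across clients; flags left at their defaults are simply omitted.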

Persistent State Location

To persist state (session data, local storage), Playwright MCP launches the browser with a dedicated profile stored at:

  • Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
  • macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
  • Linux: ~/.cache/ms-playwright/mcp-chrome-profile
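To inspect or reset that state from a script, the profile path can be resolved per platform. This is a minimal sketch based only on the three locations listed above; `mcp_profile_dir` is a hypothetical helper, and it assumes the home directory corresponds to %USERPROFILE% on Windows.

```python
import sys
from pathlib import Path


def mcp_profile_dir(platform=None, home=None):
    """Return the profile directory listed above for the given platform."""
    platform = platform or sys.platform
    home = Path(home) if home is not None else Path.home()
    if platform.startswith("win"):
        # %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
        base = home / "AppData" / "Local"
    elif platform == "darwin":
        # ~/Library/Caches/ms-playwright/mcp-chrome-profile
        base = home / "Library" / "Caches"
    else:
        # ~/.cache/ms-playwright/mcp-chrome-profile
        base = home / ".cache"
    return base / "ms-playwright" / "mcp-chrome-profile"


if __name__ == "__main__":
    print(mcp_profile_dir())
```

Deleting the resolved directory while the server is stopped should clear the stored session, assuming nothing else writes to it.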

Server Deployment Considerations

Running Headless on Remote Servers (No Display Environment): On systems without a display (such as remote VMs with no active X server), explicitly set the transport port and bind the host to a reachable interface (e.g., 0.0.0.0).

  1. Launch Server:

         npx @playwright/mcp@latest --headless --port 8931 --host 0.0.0.0

  2. Client Configuration: Point the client URL at the server, replacing the {$server-ip} placeholder with the actual server address:

         {
           "mcpServers": {
             "playwright-agent": {
               "url": "http://{$server-ip}:8931/sse"
             }
           }
         }
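The client entry from step 2 can also be generated for a given host. This is a small sketch under stated assumptions: `build_sse_client_config` is a hypothetical helper, and 192.0.2.10 is a documentation-only address standing in for the real server.

```python
import json


def build_sse_client_config(server_ip, port=8931, name="playwright-agent"):
    """Return a client config pointing at the server's SSE endpoint."""
    url = f"http://{server_ip}:{port}/sse"
    return {"mcpServers": {name: {"url": url}}}


if __name__ == "__main__":
    # Replace the placeholder address with the actual server address.
    print(json.dumps(build_sse_client_config("192.0.2.10"), indent=2))
```

The port passed here has to match the `--port` value used when launching the server.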

Docker Image

A pre-built Docker image is available. It runs the agent in headless Chromium only:

    {
      "mcpServers": {
        "playwright-agent": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "--init", "mcp/playwright"]
        }
      }
    }

Operational Modes Overview

The agent operates in one of two interaction modes:

  1. Snapshot Mode (Default): Relies on the page's accessibility structure for robust targeting and interaction.
  2. Vision Mode: Switches to visual processing, requiring screenshot data and coordinate-based inputs. Activate via the --vision flag.

Tool Definitions (Snapshot Mode Focus)

Structural Inspection

  • browser_snapshot
      • Goal: Acquire the current structural representation of the webpage (the accessibility tree).
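A client typically needs to look up an element's ref in the snapshot before calling a manipulation tool. The sketch below shows one way to do that. The snapshot text format is an assumption made for illustration (one node per line with a role, an optional quoted name, and a `[ref=...]` marker); check the server's actual output before relying on it. `SAMPLE_SNAPSHOT`, `index_refs`, and `find_ref` are hypothetical names.

```python
import re

# Assumption: hypothetical snapshot text, one accessibility node per line.
SAMPLE_SNAPSHOT = """\
- heading "Sign in" [ref=e1]
- textbox "Email" [ref=e2]
- button "Submit" [ref=e3]
"""

# Role, optional quoted accessible name, then a [ref=...] marker.
LINE_RE = re.compile(
    r'-\s*(?P<role>\w+)(?:\s+"(?P<name>[^"]*)")?.*?\[ref=(?P<ref>[^\]]+)\]'
)


def index_refs(snapshot_text):
    """Parse snapshot lines into dicts with role, name, and ref."""
    nodes = []
    for line in snapshot_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            nodes.append({"role": match["role"],
                          "name": match["name"],
                          "ref": match["ref"]})
    return nodes


def find_ref(nodes, role, name):
    """Return the ref of the first node with this role and name, or None."""
    for node in nodes:
        if node["role"] == role and node["name"] == name:
            return node["ref"]
    return None


if __name__ == "__main__":
    print(find_ref(index_refs(SAMPLE_SNAPSHOT), "button", "Submit"))
```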

Element Manipulation

  • browser_click
      • Parameters: element (human-readable description), ref (unique identifier taken from the page snapshot).
      • Action: Simulates a mouse click on the identified component.

  • browser_type
      • Parameters: element, ref, text, submit (optional boolean), slowly (optional boolean).
      • Action: Types the specified text into an input field.

  • browser_select_option
      • Parameters: element, ref, values (array of target option values).
      • Action: Selects the given options within a <select> element.

  • browser_drag
      • Parameters: startElement, startRef, endElement, endRef.
      • Action: Drags from a source accessibility node to a destination node.

  • browser_hover
      • Parameters: element, ref.
      • Action: Moves the cursor over the specified component.
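On the wire, MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such requests for `browser_type` and `browser_click`. It only constructs payloads and leaves out transport and session setup. `tool_call` and `fill_and_submit` are hypothetical helpers, and the element descriptions and refs are placeholders.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name, **arguments):
    """Build a JSON-RPC 2.0 request that invokes one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def fill_and_submit(field_ref, button_ref, text):
    """Type into a field, then click a button, using refs from a snapshot."""
    return [
        tool_call("browser_type", element="Email field",
                  ref=field_ref, text=text, submit=False),
        tool_call("browser_click", element="Submit button", ref=button_ref),
    ]


if __name__ == "__main__":
    for request in fill_and_submit("e2", "e3", "user@example.com"):
        print(json.dumps(request))
```

In practice an MCP client library would normally build and send these requests; the raw payloads are shown only to make the parameter mapping explicit.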

Capture and Output

  • browser_take_screenshot
      • Parameters: raw (optional boolean; when true, returns an uncompressed PNG instead of a JPEG), optional element/ref to clip the capture to one element.
      • Action: Renders the current viewport or a specified element as an image.
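MCP tool results can carry images as base64-encoded content items with a MIME type. The sketch below decodes the first image from such a result and picks a file extension to match. It assumes the screenshot arrives as a standard MCP image content item; `extract_image` and `save_screenshot` are hypothetical helpers.

```python
import base64
from pathlib import Path

# Map the two formats named above to file extensions.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg"}


def extract_image(result):
    """Return (bytes, extension) for the first image item, or None."""
    for item in result.get("content", []):
        if item.get("type") == "image":
            extension = EXTENSIONS.get(item.get("mimeType"), ".bin")
            return base64.b64decode(item["data"]), extension
    return None


def save_screenshot(result, stem):
    """Write the first image in the result to <stem><extension>."""
    image = extract_image(result)
    if image is None:
        return None
    data, extension = image
    path = Path(stem).with_suffix(extension)
    path.write_bytes(data)
    return path
```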

Tab & Navigation Control

  • browser_navigate
      • Parameters: url (target URI).

  • browser_tab_new
      • Parameters: url (optional initial destination).

  • browser_tab_select
      • Parameters: index (integer index).

  • browser_close
      • Action: Closes the current page or tab.
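Because each tool takes a small, fixed set of parameters, a client can check a call before sending it. The sketch below encodes the parameter lists from this section and reports missing or unexpected arguments. Treating every parameter not marked optional as required is an assumption; `TOOL_PARAMS` and `validate_call` are hypothetical names.

```python
# tool name -> (required parameters, optional parameters), from the lists above.
# Assumption: parameters not marked optional are required.
TOOL_PARAMS = {
    "browser_snapshot": (set(), set()),
    "browser_click": ({"element", "ref"}, set()),
    "browser_type": ({"element", "ref", "text"}, {"submit", "slowly"}),
    "browser_select_option": ({"element", "ref", "values"}, set()),
    "browser_drag": ({"startElement", "startRef", "endElement", "endRef"}, set()),
    "browser_hover": ({"element", "ref"}, set()),
    "browser_take_screenshot": (set(), {"raw", "element", "ref"}),
    "browser_navigate": ({"url"}, set()),
    "browser_tab_new": (set(), {"url"}),
    "browser_tab_select": ({"index"}, set()),
    "browser_close": (set(), set()),
}


def validate_call(name, arguments):
    """Return a list of problems with a tool call; empty means it looks valid."""
    if name not in TOOL_PARAMS:
        return [f"unknown tool: {name}"]
    required, optional = TOOL_PARAMS[name]
    given = set(arguments)
    errors = [f"missing parameter: {p}" for p in sorted(required - given)]
    errors += [f"unexpected parameter: {p}"
               for p in sorted(given - required - optional)]
    return errors


if __name__ == "__main__":
    print(validate_call("browser_click", {"element": "Submit button"}))
```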
