Playwright Model Context Protocol (MCP) Server

This is a Model Context Protocol (MCP) server that provides browser automation through Playwright. It lets language models interact with dynamic web pages through structured, machine-readable accessibility snapshots, so no screenshots or vision models are required.

Core Advantages

Performance Optimized: Works from the accessibility tree rather than pixels, which is faster and lighter than screenshot-based input.
Vision-Agnostic: Operates purely on structured, semantic data, so no vision model is needed.
Predictable Execution: Actions target elements identified in the accessibility tree, avoiding the ambiguity common in screenshot-based approaches.

Primary Applications

Executing complex web workflows such as form submission and site traversal.
Extracting deeply nested or contextually important information.
Facilitating end-to-end verification of web application functionality.

Configuration Snippet (Agent Setup)

    {
      "mcpServers": {
        "web_navigator": {
          "command": "npx",
          "args": ["@playwright/mcp@latest"]
        }
      }
    }

Deployment Integration in IDE Environments

To integrate this server directly within your workspace (e.g., VS Code), register it from the command line:

    code --add-mcp '{"name":"web_navigator","command":"npx","args":["@playwright/mcp@latest"]}'
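Hand-writing the JSON argument makes quoting mistakes easy. As a minimal sketch, assuming only the `code --add-mcp` form shown above, the command line could be generated in Python. The helper name `build_add_mcp_command` is hypothetical, and the quoting targets POSIX shells; Windows shells quote differently.

```python
import json
import shlex


def build_add_mcp_command(name, command, args):
    """Return a shell-safe `code --add-mcp ...` command line (hypothetical helper)."""
    # Compact separators reproduce the whitespace-free JSON used in the command above.
    payload = json.dumps(
        {"name": name, "command": command, "args": list(args)},
        separators=(",", ":"),
    )
    # shlex.join quotes the JSON so a POSIX shell passes it as a single argument.
    return shlex.join(["code", "--add-mcp", payload])


if __name__ == "__main__":
    print(build_add_mcp_command("web_navigator", "npx", ["@playwright/mcp@latest"]))
```

Building the payload with `json.dumps` keeps it valid when extra launch arguments are appended to `args`.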

Server Runtime Customization

The Playwright MCP service supports various launch arguments to fine-tune its operation:

--browser <engine>: Selects the browser or Chrome channel to launch. Options include chrome, firefox, webkit, or a specific channel (e.g., chrome-canary). The default is chrome.
--caps <features>: A comma-separated list of capabilities to enable (e.g., tabs,pdf).
--headless: Runs the browser without a visible window (the browser is headed by default).
--port <port>: Sets the port to listen on for the Server-Sent Events (SSE) transport.
--vision: Runs the server in vision mode, which uses screenshots instead of the default accessibility snapshots.
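To show how these flags end up in an agent configuration, here is a minimal Python sketch that assembles an `mcpServers` entry from keyword options. The function name `build_server_config` is hypothetical, only the five flags listed above are recognized, and capabilities are assumed to be comma-separated as in the `tabs,pdf` example.

```python
import json

# Flags from the list above, split by whether they take a value.
VALUE_FLAGS = {"browser", "caps", "port"}
BOOLEAN_FLAGS = {"headless", "vision"}


def build_server_config(name="web_navigator", **options):
    """Build an mcpServers entry whose args carry the requested launch flags."""
    args = ["@playwright/mcp@latest"]
    for key, value in options.items():
        if key in BOOLEAN_FLAGS:
            # Boolean flags are emitted only when switched on.
            if value:
                args.append(f"--{key}")
        elif key in VALUE_FLAGS:
            # Lists (e.g. capabilities) are joined with commas.
            if isinstance(value, (list, tuple)):
                value = ",".join(value)
            args += [f"--{key}", str(value)]
        else:
            raise ValueError(f"unknown option: {key}")
    return {"mcpServers": {name: {"command": "npx", "args": args}}}


if __name__ == "__main__":
    config = build_server_config(browser="firefox", headless=True, caps=["tabs", "pdf"])
    print(json.dumps(config, indent=2))
```

Rejecting unknown keys is a deliberate choice in this sketch: a mistyped option fails when the configuration is generated rather than when the server starts.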

User Profile Persistence

The server launches the browser with a dedicated profile, kept separate from your regular browser profile, at the following location:

Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
Linux: ~/.cache/ms-playwright/mcp-chrome-profile

Session state such as cookies and logins persists in this directory between runs. Delete the directory to start from a clean state.
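If a script needs to locate or reset the profile, the three paths above can be resolved programmatically. This is a sketch only: `default_profile_dir` is a hypothetical helper, it assumes `%USERPROFILE%` matches the user's home directory on Windows, and it treats every platform other than Windows and macOS as Linux.

```python
import sys
from pathlib import Path

PROFILE_NAME = "mcp-chrome-profile"


def default_profile_dir(platform=None, home=None):
    """Return the profile location listed above for the given platform."""
    platform = platform or sys.platform
    home = Path(home) if home is not None else Path.home()
    if platform.startswith("win"):
        base = home / "AppData" / "Local"
    elif platform == "darwin":
        base = home / "Library" / "Caches"
    else:
        # Any other platform is treated as Linux in this sketch.
        base = home / ".cache"
    return base / "ms-playwright" / PROFILE_NAME


if __name__ == "__main__":
    print(default_profile_dir())
```

Passing the result to `shutil.rmtree` while the server is stopped would clear the stored cookies and logins.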

Headless Execution Example

To enforce background operation without a GUI:

    {
      "mcpServers": {
        "web_navigator": {
          "command": "npx",
          "args": ["@playwright/mcp@latest", "--headless"]
        }
      }
    }

API Operations (Snapshot Mode - Default)

These tools act on elements identified in the accessibility snapshot:

browser_click: Clicks the targeted element.
browser_type: Types text into an editable element.
browser_select_option: Selects an option in a <select> element.
browser_snapshot: Captures an accessibility snapshot of the current page (preferred over screenshots as the basis for actions).
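MCP clients invoke tools like these through JSON-RPC 2.0 `tools/call` requests. The Python sketch below builds such requests for illustration. The `tool_call` helper is hypothetical, and the argument names `element` and `ref` (along with the value `e12`) are assumptions that this document does not confirm; check the tool schema the server reports before relying on them.

```python
import itertools

# Monotonic request ids, as JSON-RPC expects a distinct id per request.
_ids = itertools.count(1)


def tool_call(name, **arguments):
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Take a snapshot first, then act on an element reference found in it.
snapshot_request = tool_call("browser_snapshot")
# `element` and `ref` are assumed argument names; "e12" is a made-up reference.
click_request = tool_call("browser_click", element="Submit button", ref="e12")
```

In practice an MCP client library sends these messages for you; the sketch only shows the shape of the payload.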

API Operations (Vision Mode - Screenshot Dependent)

Enabled with the --vision flag, these tools operate on screen coordinates taken from a screenshot:

browser_screen_move_mouse: Moves the cursor to the given (X, Y) screen coordinates.
browser_screen_click: Clicks at the given screen coordinates.
browser_screen_type: Types text by sending keyboard events to the element that currently has focus.
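In vision mode the caller supplies coordinates instead of element references. As a rough Python sketch, a coordinate click could be expressed as an MCP `tools/call` request like this. The helper `screen_click_request` is hypothetical, and the argument names `x` and `y` are assumptions not confirmed by this document.

```python
def screen_click_request(x, y, request_id=1):
    """Build a `tools/call` request for browser_screen_click at (x, y)."""
    if x < 0 or y < 0:
        raise ValueError("screen coordinates must be non-negative")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "browser_screen_click",
            # `x` and `y` are assumed argument names.
            "arguments": {"x": x, "y": y},
        },
    }


if __name__ == "__main__":
    print(screen_click_request(320, 240))
```

The coordinates would normally come from a model's reading of a prior screenshot, so they are only valid until the page layout changes.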