Playwright Record MCP

The Playwright Record MCP is a Model Context Protocol (MCP) server built on Playwright that can also record all browser activity as video. By default, it lets Large Language Models (LLMs) interact with web pages through structured accessibility snapshots rather than screenshots, so no pixel-based vision model is required.

Core Capabilities

Fast and lightweight: Uses Playwright's accessibility tree instead of pixel-based input, avoiding expensive image processing.
LLM-friendly: No vision models are needed; the model operates purely on structured data.
Deterministic tool application: Avoids the ambiguity common to screenshot-based approaches, so tool calls behave reproducibly.
Session recording: Can capture the full sequence of browser actions as a video file.

Operational Scenarios

Web navigation and form filling
Extraction of structured data
Automated and regression testing driven by LLMs
General-purpose browser interaction for agents
Recording interaction flows on video for documentation and later review

Deployment Instructions

Installation via npm

```bash
npm install @playwright/record-mcp
```

Or run it directly with npx:

```bash
npx @playwright/record-mcp
```

Example Configuration

NPX Configuration

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/record-mcp@latest"
      ]
    }
  }
}
```

VS Code Integration

You can register the Playwright Record MCP server with the VS Code command-line interface:

```bash
# For VS Code
code --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/record-mcp@latest"]}'

# For VS Code Insiders
code-insiders --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/record-mcp@latest"]}'
```

After registration, the Playwright Record MCP server is available to your agent in VS Code.
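The JSON argument to `--add-mcp` is easy to mistype by hand. The following is a minimal sketch, assuming a POSIX shell such as bash or zsh (PowerShell and cmd.exe quote differently), that builds the entry with `JSON.stringify` and wraps it in single quotes. `buildAddMcpCommand` and `shellQuote` are hypothetical helpers, not part of the package.

```javascript
// Hypothetical helpers, assuming a POSIX shell (bash, zsh).

// Wraps a value in single quotes; embedded single quotes become '\''
// (close the quote, emit an escaped quote, reopen the quote).
function shellQuote(value) {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Builds the `--add-mcp` command for an editor binary ("code" or
// "code-insiders"), optionally appending extra server flags.
function buildAddMcpCommand(editor, extraArgs = []) {
  const entry = {
    name: 'playwright',
    command: 'npx',
    args: ['@playwright/record-mcp@latest', ...extraArgs],
  };
  return `${editor} --add-mcp ${shellQuote(JSON.stringify(entry))}`;
}

console.log(buildAddMcpCommand('code'));
console.log(buildAddMcpCommand('code-insiders', ['--record']));
```

With no extra arguments, the first call reproduces the VS Code command shown above; passing `['--record']` or `['--headless']` produces the corresponding variant.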

Server Command-Line Arguments

The Playwright Record MCP server supports the following command-line options:

--browser <browser>: Browser or channel to use. Possible values: chrome, firefox, webkit, msedge; Chrome channels chrome-beta, chrome-canary, chrome-dev; Edge channels msedge-beta, msedge-canary, msedge-dev. Default is chrome.
--caps <caps>: Comma-separated list of capabilities to enable. Possible values: tabs, pdf, history, wait, files, install. All are enabled by default.
--cdp-endpoint <endpoint>: Chrome DevTools Protocol (CDP) endpoint to connect to.
--executable-path <path>: Path to the browser executable.
--headless: Run the browser in headless mode (headed by default).
--port <port>: Port to listen on for the Server-Sent Events (SSE) transport.
--user-data-dir <path>: Directory where browser profile data (sessions, cookies) is stored.
--vision: Run in Vision Mode, using screenshots instead of accessibility snapshots (Aria snapshots are used by default).
--record: Enable video recording of browser sessions.
--record-path <path>: Directory where recordings are saved (default: ./recordings).
--record-format <format>: Video container format, mp4 or webm (default: mp4).
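To make the defaults above concrete, here is a hypothetical parser that maps an argument list to an options object. It is a sketch that mirrors only the documented flags and defaults, and it assumes `--caps` takes a comma-separated list; the real server's parser, option names, and error handling are not described here and may differ.

```javascript
// Hypothetical sketch, NOT the server's actual parser: it mirrors the
// documented flags and their defaults so you can see how they combine.
const ALL_CAPS = ['tabs', 'pdf', 'history', 'wait', 'files', 'install'];

// Flags that take a value, mapped to the option they set.
const VALUE_FLAGS = {
  '--browser': 'browser',
  '--caps': 'caps',
  '--cdp-endpoint': 'cdpEndpoint',
  '--executable-path': 'executablePath',
  '--port': 'port',
  '--user-data-dir': 'userDataDir',
  '--record-path': 'recordPath',
  '--record-format': 'recordFormat',
};

// Switches that are off unless present.
const BOOLEAN_FLAGS = { '--headless': 'headless', '--vision': 'vision', '--record': 'record' };

function parseServerArgs(argv) {
  const opts = {
    browser: 'chrome',
    caps: ALL_CAPS.join(','),
    headless: false,
    vision: false,
    record: false,
    recordPath: './recordings',
    recordFormat: 'mp4',
  };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (Object.hasOwn(BOOLEAN_FLAGS, flag)) {
      opts[BOOLEAN_FLAGS[flag]] = true;
    } else if (Object.hasOwn(VALUE_FLAGS, flag)) {
      const value = argv[++i];
      if (value === undefined) throw new Error(`${flag} requires a value`);
      opts[VALUE_FLAGS[flag]] = value;
    } else {
      throw new Error(`Unknown flag: ${flag}`);
    }
  }
  // Normalize: the capability list becomes an array, the port a number.
  opts.caps = opts.caps.split(',').map((c) => c.trim()).filter(Boolean);
  if (opts.port !== undefined) opts.port = Number(opts.port);
  if (!['mp4', 'webm'].includes(opts.recordFormat)) {
    throw new Error(`--record-format must be mp4 or webm, got ${opts.recordFormat}`);
  }
  return opts;
}

console.log(parseServerArgs(['--headless', '--record', '--record-format', 'webm']));
```

Rejecting an unknown `--record-format` early mirrors the two documented container types; any other value is treated as an error rather than silently falling back to mp4.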

User Profile Persistence Details

By default, the server launches the browser with a dedicated profile stored at:

Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
Linux: ~/.cache/ms-playwright/mcp-chrome-profile

Session data, such as logins, is stored in this profile. Delete the directory between sessions if you want to start from a clean state.
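A small sketch can resolve the default profile location for a given platform, which is handy before clearing it. `defaultProfileDir` is a hypothetical helper that only encodes the three paths listed above; the server's own resolution logic is not documented here and may differ.

```javascript
// Hypothetical helper: returns the default profile location documented above
// for a Node.js platform id ('win32', 'darwin', 'linux') and a home directory.
// It only encodes the three documented paths.
function defaultProfileDir(platform, homeDir) {
  switch (platform) {
    case 'win32':
      // %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
      return `${homeDir}\\AppData\\Local\\ms-playwright\\mcp-chrome-profile`;
    case 'darwin':
      return `${homeDir}/Library/Caches/ms-playwright/mcp-chrome-profile`;
    case 'linux':
      return `${homeDir}/.cache/ms-playwright/mcp-chrome-profile`;
    default:
      throw new Error(`No documented profile location for platform "${platform}"`);
  }
}

console.log(defaultProfileDir('linux', '/home/alice'));
// /home/alice/.cache/ms-playwright/mcp-chrome-profile
```

On a real machine you would pass `process.platform` and `os.homedir()`. To reset state, the returned directory can be removed with Node's `fs.rmSync(dir, { recursive: true, force: true })`, preferably while no browser is using the profile.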

Executing in Headless Mode (No Graphical Interface)

Headless mode is well suited to background or batch jobs.

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/record-mcp@latest",
        "--headless"
      ]
    }
  }
}
```

Activating Video Capture Functionality

To record browser sessions to video, pass the --record flag:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/record-mcp@latest",
        "--record"
      ]
    }
  }
}
```

To change the directory where recordings are saved, add --record-path:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/record-mcp@latest",
        "--record",
        "--record-path",
        "./my-recordings"
      ]
    }
  }
}
```

To save recordings as webm instead of the default mp4, add --record-format:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/record-mcp@latest",
        "--record",
        "--record-format",
        "webm"
      ]
    }
  }
}
```
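If you generate client configurations programmatically, a hypothetical helper like the one below can assemble the recording variants shown above and reject unsupported formats up front. The function name and option names are illustrative only; the flags themselves are the documented ones.

```javascript
// Hypothetical helper: builds an "mcpServers" entry like the examples above,
// validating the documented recording options (mp4 or webm only).
function buildRecordConfig({ headless = false, recordPath, recordFormat } = {}) {
  const args = ['@playwright/record-mcp@latest', '--record'];
  if (headless) args.push('--headless');
  if (recordPath !== undefined) args.push('--record-path', recordPath);
  if (recordFormat !== undefined) {
    if (recordFormat !== 'mp4' && recordFormat !== 'webm') {
      throw new Error(`Unsupported --record-format: ${recordFormat}`);
    }
    args.push('--record-format', recordFormat);
  }
  return { mcpServers: { playwright: { command: 'npx', args } } };
}

// Reproduces the custom-directory example.
console.log(JSON.stringify(buildRecordConfig({ recordPath: './my-recordings' }), null, 2));
```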

Launching Headed Browsers on Display-less Linux Systems

When you need a headed browser on a system without a display (no $DISPLAY variable), or the server would otherwise run from an IDE worker process, start the MCP server from an environment that does have a display and pass the --port flag to enable the SSE transport:

```bash
npx @playwright/record-mcp@latest --port 8931
```

Then point the MCP client configuration at that SSE endpoint:

```json
{
  "mcpServers": {
    "playwright": {
      "url": "http://localhost:8931/sse"
    }
  }
}
```
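The server's --port value and the client's SSE URL must agree. A minimal sketch, using a hypothetical `sseSetup` helper, derives both from one port number so they cannot drift apart; the `/sse` path is the one shown above.

```javascript
// Hypothetical helper: derives the server command and the client config
// from a single port so the two halves of the SSE setup stay in sync.
function sseSetup(port, host = 'localhost') {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid TCP port: ${port}`);
  }
  const url = new URL('/sse', `http://${host}:${port}`);
  return {
    // Run this in the environment that has a display.
    serverCommand: ['npx', '@playwright/record-mcp@latest', '--port', String(port)],
    // Client-side MCP configuration pointing at the SSE endpoint.
    clientConfig: { mcpServers: { playwright: { url: url.href } } },
  };
}

const setup = sseSetup(8931);
console.log(setup.serverCommand.join(' '));
console.log(JSON.stringify(setup.clientConfig));
```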

Docker Containerization

Note: The Docker implementation currently supports headless Chromium only.

```json
{
  "mcpServers": {
    "playwright": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--init", "mcp/playwright-record"]
    }
  }
}
```

To build the Docker image locally:

```bash
docker build -t mcp/playwright-record .
```

Operational Modes

The server supports two modes:

Snapshot Mode (default): Uses the accessibility tree, which is faster and more reliable.
Vision Mode: Uses screenshots to guide interactions.

To enable Vision Mode, add the --vision flag when starting the server:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/record-mcp@latest",
        "--vision"
      ]
    }
  }
}
```

Vision Mode works best with computer-use models that can act on elements through X/Y pixel coordinates taken from the provided screenshot.

Custom Transport via Programmatic Invocation

```javascript
import http from 'http';

import { createServer } from '@playwright/record-mcp';
import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';

http.createServer(async (req, res) => {
  // ...

  // Creates a headless Playwright Record MCP server with recording enabled, using the SSE transport
  const mcpServer = await createServer({ headless: true, record: true });
  const transport = new SSEServerTransport('/messages', res);
  await mcpServer.connect(transport);

  // ...
});
```

Snapshot-Based Interaction Tools

browser_snapshot
Intent: Captures an accessibility snapshot of the current page; better suited than a screenshot for structured interaction.
Arguments: None
browser_click
Intent: Executes a mouse click event on a specified web element.
Arguments:
- element (string): Human-readable element description, used to obtain permission to interact with the element.
- ref (string): The unique identifier for the target element extracted directly from the page snapshot.
browser_drag
Intent: Simulates a drag-and-drop action between two distinct elements.
Arguments:
- startElement (string): Natural language description for the source object.
- startRef (string): Unique locator for the source object.
- endElement (string): Natural language description for the destination object.
- endRef (string): Unique locator for the destination object.
browser_hover
Intent: Positions the cursor over a designated element.
Arguments:
- element (string): Human-readable specification for the element.
- ref (string): Precise reference token from the snapshot data.
browser_type
Intent: Types text into an editable element.
Arguments:
- element (string): Human-readable description of the target input field.
- ref (string): Snapshot reference for the input field.
- text (string): The text to type.
- submit (boolean, optional): If true, presses Enter after typing.
- slowly (boolean, optional): If true, types one character at a time, which helps trigger key handlers in the page. By default, the entire text is filled in at once.
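The argument lists above can be checked before a call is sent. This hypothetical sketch transcribes them into schemas and wraps validated arguments in an MCP `tools/call` JSON-RPC request (method `tools/call` with `name` and `arguments` params). In practice an MCP client library builds this envelope for you; `TOOL_SCHEMAS` and `buildToolCall` are illustrative names only.

```javascript
// Hypothetical sketch: argument schemas transcribed from the tool list above,
// checked before wrapping the call in an MCP "tools/call" JSON-RPC request.
const TOOL_SCHEMAS = {
  browser_snapshot: { required: {}, optional: {} },
  browser_click: { required: { element: 'string', ref: 'string' }, optional: {} },
  browser_hover: { required: { element: 'string', ref: 'string' }, optional: {} },
  browser_drag: {
    required: { startElement: 'string', startRef: 'string', endElement: 'string', endRef: 'string' },
    optional: {},
  },
  browser_type: {
    required: { element: 'string', ref: 'string', text: 'string' },
    optional: { submit: 'boolean', slowly: 'boolean' },
  },
};

// Own-property lookup, so keys like "constructor" are not read from Object.prototype.
const own = (obj, key) => (Object.hasOwn(obj, key) ? obj[key] : undefined);

function buildToolCall(id, name, args = {}) {
  const schema = own(TOOL_SCHEMAS, name);
  if (!schema) throw new Error(`Unknown tool: ${name}`);
  // Every required argument must be present with the right type.
  for (const [key, type] of Object.entries(schema.required)) {
    if (typeof args[key] !== type) throw new Error(`${name}: "${key}" must be a ${type}`);
  }
  // No unknown arguments; optional ones must have the right type.
  for (const [key, value] of Object.entries(args)) {
    const type = own(schema.required, key) ?? own(schema.optional, key);
    if (type === undefined) throw new Error(`${name}: unexpected argument "${key}"`);
    if (typeof value !== type) throw new Error(`${name}: "${key}" must be a ${type}`);
  }
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// "e12" is a made-up ref; real refs come from a browser_snapshot result.
const request = buildToolCall(1, 'browser_type', {
  element: 'Search box',
  ref: 'e12',
  text: 'playwright',
  submit: true,
});
console.log(JSON.stringify(request));
```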

Video Recording Tools

browser_record_start
Intent: Starts recording browser activity to a video file.
Arguments:
- path (string, optional): The file path where the resulting recording media will be archived.
- format (string, optional): The desired encoding standard for the video stream (mp4 or webm).
browser_record_stop
Intent: Halts the ongoing recording session and finalizes the media file.
Arguments: None
browser_record_pause
Intent: Temporarily suspends the video capture without terminating the session.
Arguments: None
browser_record_resume
Intent: Resumes a paused recording.
Arguments: None
browser_record_list
Intent: Lists the available recordings.
Arguments: None
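A client may want to avoid sending recording tools in a meaningless order, such as resuming a recording that was never paused. The hypothetical guard below models the lifecycle as a small state machine. Its transitions are assumptions inferred from the tool descriptions above (including that stopping is allowed while paused), not documented server behavior.

```javascript
// Hypothetical client-side guard. The transitions are inferred from the tool
// descriptions above; they are assumptions, not the server's documented rules.
const TRANSITIONS = {
  idle: { browser_record_start: 'recording' },
  recording: { browser_record_pause: 'paused', browser_record_stop: 'idle' },
  paused: { browser_record_resume: 'recording', browser_record_stop: 'idle' },
};

class RecordingSession {
  constructor() {
    this.state = 'idle';
  }

  // Applies a recording tool and returns the new state; throws if the tool
  // makes no sense in the current state.
  apply(tool) {
    // Listing recordings never changes the state.
    if (tool === 'browser_record_list') return this.state;
    const allowed = TRANSITIONS[this.state];
    if (!Object.hasOwn(allowed, tool)) {
      throw new Error(`${tool} is not valid while ${this.state}`);
    }
    this.state = allowed[tool];
    return this.state;
  }
}

const session = new RecordingSession();
for (const tool of ['browser_record_start', 'browser_record_pause', 'browser_record_resume', 'browser_record_stop']) {
  console.log(`${tool} -> ${session.apply(tool)}`);
}
```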

Usage Example: Recording a Session

```javascript
// Assumes `mcpServer` was created with createServer() as in the programmatic example above

// Start recording
await mcpServer.invoke('browser_record_start', {
  path: './my-recordings/test-recording.mp4',
  format: 'mp4',
});

// Navigate to a page
await mcpServer.invoke('browser_navigate', { url: 'https://example.com' });

// Capture the accessibility snapshot of the page
const snapshot = await mcpServer.invoke('browser_snapshot');
// ...locate element refs in `snapshot` and interact with them (browser_click, browser_type, ...)

// Stop recording and finalize the video file
await mcpServer.invoke('browser_record_stop');
```

Supported Browser Engines

Google Chrome
Mozilla Firefox
Apple WebKit
Microsoft Edge

Prerequisites

Node.js 18 or newer.
The target browser must be installed on the system (or installed with the browser_install tool, if available).
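A launcher script can fail fast on an unsupported runtime. This minimal sketch compares the major version in `process.versions.node` (a string such as "20.11.1") against the documented minimum of 18; `meetsNodeRequirement` is a hypothetical helper.

```javascript
// Minimal sketch: checks the running Node.js major version against the
// documented minimum (18). `version` defaults to the current runtime.
function meetsNodeRequirement(version = process.versions.node, minimumMajor = 18) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return Number.isInteger(major) && major >= minimumMajor;
}

if (!meetsNodeRequirement()) {
  console.error(`Node.js ${process.versions.node} is too old; version 18 or newer is required.`);
  process.exitCode = 1;
}
```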

Licensing Details

This component is distributed under the Apache-2.0 License.

WIKIPEDIA: A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using network communication. They are particularly useful for testing web pages as they are able to render and understand HTML the same way a browser would, including styling elements such as page layout, color, font selection and execution of JavaScript and Ajax which are usually not available when using other testing methods. Since version 59 of Google Chrome and version 56 of Firefox, there is native support for remote control of the browser. This made earlier efforts obsolete, notably PhantomJS.

Use cases

The main use cases for headless browsers are:

Test automation in modern web applications (web testing)
Taking screenshots of web pages
Running automated tests for JavaScript libraries
Automating interaction of web pages

Other uses

Headless browsers are also useful for web scraping. Google stated in 2009 that using a headless browser could help their search engine index content from websites that use Ajax. Headless browsers have also been misused in various ways:

Perform DDoS attacks on web sites.
Increase advertisement impressions.
Automate web sites in unintended ways, e.g. for credential stuffing.

However, a study of browser traffic in 2018 found no preference by malicious actors for headless browsers. There is no indication that headless browsers are used more frequently than non-headless browsers for malicious purposes, like DDoS attacks, SQL injections or cross-site scripting attacks.

Usage

As several major browsers natively support headless mode through APIs, some software exists to perform browser automation through a unified interface. These include:

Selenium WebDriver: a W3C compliant implementation of WebDriver
Playwright: a Node.js library to automate Chromium, Firefox and WebKit
Puppeteer: a Node.js library to automate Chrome or Firefox

Test automation

Some test automation software and frameworks include headless browsers as part of their testing toolkits.

Capybara uses headless browsing, either via WebKit or Headless Chrome, to mimic user behavior in its testing protocols.
Jasmine uses Selenium by default, but can use WebKit or Headless Chrome to run browser tests.
Cypress, a frontend testing framework.
QF-Test, a software tool for automated testing of programs via the graphical user interface, where a headless browser can also be used for testing.

Alternatives

Another approach is to use software that provides browser APIs. For example, Deno provides browser APIs as part of its design. For Node.js, jsdom is the most complete provider. While most are able to support common browser features (HTML parsing, cookies, XHR, some JavaScript, etc.), they do not render the DOM and have limited support for DOM events. They usually perform faster than