WebNavigator-MCP
An MCP server that gives automated agents programmatic control of a web browser through structured accessibility data. It supports web interaction, data extraction, and functional testing without requiring screenshots or vision models.
Author

markbustamante77
Playwright Model Context Protocol (MCP) Server
This is an MCP server built on Playwright. It lets language models interact with dynamic web pages deterministically by working from machine-readable accessibility tree snapshots rather than screenshots or vision models.
Core Advantages
- Performance Optimized: Relies on the accessibility tree structure, leading to faster and more efficient operations than pixel-based methods.
- Vision-Agnostic: Operates purely on semantic, structured data, making it ideal for non-visual AI agents.
- Predictable Execution: Targets elements through the accessibility tree rather than by screen position, which avoids the ambiguity of coordinate-based input.
Primary Applications
- Executing complex web workflows such as form submission and site traversal.
- Extracting deeply nested or contextually important information.
- Facilitating end-to-end verification of web application functionality.
Configuration Snippet (Agent Setup)
```js
{
  "mcpServers": {
    "web_navigator": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
```
Deployment Integration in IDE Environments
To integrate this automation service into an IDE such as VS Code, register it from the command line:
```bash
code --add-mcp '{"name":"web_navigator","command":"npx","args":["@playwright/mcp@latest"]}'
```
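When the registration is scripted, hand-quoting the JSON payload is error-prone. The following is a minimal sketch, assuming a POSIX shell, that builds the same command with the standard library. The helper name `add_mcp_command` is hypothetical and not part of any tool.

```python
import json
import shlex


def add_mcp_command(name: str, command: str, args: list[str]) -> list[str]:
    """Return the argv for registering an MCP server via the VS Code CLI.

    Hypothetical helper: it only assembles the command shown in the
    shell example and does not run anything.
    """
    # Compact separators reproduce the single-line JSON of the shell example.
    payload = json.dumps(
        {"name": name, "command": command, "args": args},
        separators=(",", ":"),
    )
    return ["code", "--add-mcp", payload]


argv = add_mcp_command("web_navigator", "npx", ["@playwright/mcp@latest"])

# shlex.join quotes the JSON payload so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` avoids shell quoting altogether. The joined string is only needed when the command has to be displayed or pasted.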
Server Runtime Customization
The Playwright MCP service supports various launch arguments to fine-tune its operation:
- --browser <engine>: Specifies the rendering engine. Options include chrome, firefox, webkit, or specific channels (e.g., chrome-canary). Default is Chromium.
- --caps <features>: A delimited list specifying enabled capabilities (e.g., tabs,pdf).
- --headless: Executes the browser instance without a visible graphical interface.
- --port <socket>: Defines the network port used for Server-Sent Events (SSE) communication.
- --vision: Activates the secondary mode reliant on visual perception (screenshots) instead of accessibility structure.
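The launch arguments above can be assembled programmatically when several agent configurations are generated from one template. This is a sketch under stated assumptions: the helper `build_server_entry` is hypothetical, capabilities are joined with commas as in the `tabs,pdf` example, and each flag value is passed as a separate argument.

```python
import json


def build_server_entry(browser=None, caps=None, headless=False,
                       port=None, vision=False):
    """Build one "mcpServers" entry from the documented launch arguments.

    Hypothetical helper; only flags named in this document are emitted.
    """
    args = ["@playwright/mcp@latest"]
    if browser:
        args += ["--browser", browser]
    if caps:
        # Assumption: capabilities are comma-separated, as in "tabs,pdf".
        args += ["--caps", ",".join(caps)]
    if headless:
        args.append("--headless")
    if port is not None:
        args += ["--port", str(port)]
    if vision:
        args.append("--vision")
    return {"command": "npx", "args": args}


config = {
    "mcpServers": {
        "web_navigator": build_server_entry(browser="firefox", headless=True)
    }
}
print(json.dumps(config, indent=2))
```

Building the list in code keeps each flag and its value adjacent, which is easy to get wrong when editing the JSON array by hand.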
User Profile Persistence
The automation environment utilizes a dedicated, isolated browser profile location:
- Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
- macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
- Linux: ~/.cache/ms-playwright/mcp-chrome-profile
Stateful session data (like cookies/logins) is retained here unless the directory is manually cleared.
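To start from a clean session, the profile directory can be located and removed from a script. The sketch below resolves the three locations listed above. The function names `mcp_profile_dir` and `reset_profile` are hypothetical, and the sketch assumes the browser is not running while the directory is deleted.

```python
import os
import shutil
import sys
from pathlib import PurePosixPath, PureWindowsPath


def mcp_profile_dir(platform: str, home: str) -> str:
    """Return the profile location documented for the given platform."""
    if platform.startswith("win"):
        # %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
        return str(PureWindowsPath(home, "AppData", "Local",
                                   "ms-playwright", "mcp-chrome-profile"))
    if platform == "darwin":
        # ~/Library/Caches/ms-playwright/mcp-chrome-profile
        return str(PurePosixPath(home, "Library", "Caches",
                                 "ms-playwright", "mcp-chrome-profile"))
    # Any other platform is treated as Linux:
    # ~/.cache/ms-playwright/mcp-chrome-profile
    return str(PurePosixPath(home, ".cache",
                             "ms-playwright", "mcp-chrome-profile"))


def reset_profile(path: str) -> None:
    """Delete the profile directory, discarding cookies and logins."""
    shutil.rmtree(path, ignore_errors=True)


# Resolve the location for the current machine without deleting anything.
print(mcp_profile_dir(sys.platform, os.path.expanduser("~")))
```

Calling `reset_profile` is destructive: every stored login is lost, so it is kept as a separate, explicit step.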
Headless Execution Example
To enforce background operation without a GUI:
```js
{
  "mcpServers": {
    "web_navigator": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--headless"
      ]
    }
  }
}
```
API Operations (Snapshot Mode - Default)
These functions target elements identified via the accessibility hierarchy:
- browser_click: Executes a primary interaction on a targeted component.
- browser_type: Inputs sequential string data into an input field.
- browser_select_option: Manipulates the selection state of a <select> element.
- browser_snapshot: Generates the current DOM structure augmented with accessibility properties (preferred over screenshots for actions).
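An MCP client invokes these tools by sending `tools/call` requests over JSON-RPC 2.0. The sketch below shows what such requests might look like for a snapshot followed by a click. The argument names `element` and `ref`, and the value `e12`, are assumptions made for illustration. The real argument schema should be read from the server's tool listing.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in an MCP "tools/call" JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Take a snapshot first so element references are available to later calls.
snapshot_req = tool_call("browser_snapshot", {})

# Assumed arguments: a human-readable description plus a reference
# taken from the snapshot. Both names are illustrative.
click_req = tool_call("browser_click", {"element": "Submit button", "ref": "e12"})

print(json.dumps(click_req, indent=2))
```

In practice an MCP client library builds this envelope. The sketch only makes the shape of the exchange visible.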
API Operations (Vision Mode - Screenshot Dependent)
Activated via the --vision flag, these operations rely on screen coordinates:
- browser_screen_move_mouse: Translates the cursor to specific (X, Y) screen coordinates.
- browser_screen_click: Triggers a mouse click at a defined screen location.
- browser_screen_type: Enters text by sending keyboard events to whichever element currently has focus.
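Because the tool family depends on whether the server was started with --vision, an agent wrapper may want to reject calls that do not match the active mode. The following sketch covers only the tools named in this document and assumes the two families are mutually exclusive. The names `available_tools` and `require_tool` are hypothetical.

```python
# Tool names as listed in this document; a real server may expose more.
SNAPSHOT_TOOLS = frozenset({
    "browser_click",
    "browser_type",
    "browser_select_option",
    "browser_snapshot",
})
VISION_TOOLS = frozenset({
    "browser_screen_move_mouse",
    "browser_screen_click",
    "browser_screen_type",
})


def available_tools(server_args: list[str]) -> frozenset[str]:
    """Pick the tool family implied by the server's launch arguments."""
    return VISION_TOOLS if "--vision" in server_args else SNAPSHOT_TOOLS


def require_tool(name: str, server_args: list[str]) -> str:
    """Return the tool name, or raise if it does not fit the active mode."""
    if name not in available_tools(server_args):
        raise ValueError(f"{name} is not available with args {server_args}")
    return name


default_args = ["@playwright/mcp@latest"]
vision_args = ["@playwright/mcp@latest", "--vision"]

print(require_tool("browser_click", default_args))
print(require_tool("browser_screen_click", vision_args))
```

Failing early in the wrapper gives a clearer error than a rejected request from the server.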
