mcp-webdriver-browser-control
Automates browser interactions through the Selenium WebDriver protocol, letting clients drive a web browser's user interface programmatically.
Author: angiejones
MCP Selenium Server
An implementation of the Model Context Protocol (MCP) server that interfaces with Selenium WebDriver, facilitating programmatic control over web browsers via standardized MCP requests.
Capabilities
- Start browser sessions with configurable options
- Navigate to URLs
- Find elements using several locator strategies
- Click elements and type text into them
- Perform mouse actions (hover, drag and drop)
- Simulate keyboard input
- Take screenshots
- Upload files through web forms
- Run in headless mode (without a visible browser window)
Compatible Browsers
- Google Chrome
- Mozilla Firefox
- Microsoft Edge
Integration with Goose
Method 1: Single-click Setup
Paste the following link into a browser address bar to add this extension to the Goose desktop app:
goose://extension?cmd=npx&arg=-y&arg=%40angiejones%2Fmcp-selenium&id=selenium-mcp&name=Selenium%20MCP&description=automates%20browser%20interactions
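The link above is an ordinary URI whose query string carries the command, its arguments (as repeated `arg` parameters), and the extension metadata, all percent-encoded. As an illustration only, the hypothetical helper `buildGooseLink` below rebuilds the same link from its parts, which may help when adapting it for another package. It uses `encodeURIComponent`, which encodes spaces as `%20`; `URLSearchParams` would emit `+` instead.

```typescript
// Hypothetical helper (not part of Goose or this package): builds a
// goose://extension link from its parts.
interface GooseExtension {
  cmd: string;
  args: string[];
  id: string;
  name: string;
  description: string;
}

function buildGooseLink(ext: GooseExtension): string {
  const params: string[] = [`cmd=${encodeURIComponent(ext.cmd)}`];
  // Each command-line argument becomes its own repeated "arg" parameter.
  for (const arg of ext.args) {
    params.push(`arg=${encodeURIComponent(arg)}`);
  }
  params.push(`id=${encodeURIComponent(ext.id)}`);
  params.push(`name=${encodeURIComponent(ext.name)}`);
  params.push(`description=${encodeURIComponent(ext.description)}`);
  return `goose://extension?${params.join("&")}`;
}

// encodeURIComponent turns "@" into %40, "/" into %2F and spaces into %20.
const link = buildGooseLink({
  cmd: "npx",
  args: ["-y", "@angiejones/mcp-selenium"],
  id: "selenium-mcp",
  name: "Selenium MCP",
  description: "automates browser interactions",
});
console.log(link);
```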
Method 2: Manual Desktop or Command Line Addition
- Name: Selenium MCP
- Description: automates browser interactions
- Command: npx -y @angiejones/mcp-selenium
Integration with Other MCP Clients (e.g., Claude Desktop)
{
  "mcpServers": {
    "selenium": {
      "command": "npx",
      "args": ["-y", "@angiejones/mcp-selenium"]
    }
  }
}
Development
To work on this project:
- Clone the repository
- Install dependencies: npm install
- Start the server: npm start
Installation Methods
Installation via Smithery
To install MCP Selenium for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @angiejones/mcp-selenium --client claude
Local Package Installation
npm install -g @angiejones/mcp-selenium
Execution
Start the server by running:
mcp-selenium
Available Operations (Tools)
start_browser
Initializes a new browser session.
Arguments:
- browser (mandatory): The specific browser application to launch
- Type: string
- Allowed Values: ["chrome", "firefox"]
- options: Configuration settings for the browser instance
- Type: object
- Fields:
- headless: Boolean flag to run without a graphical interface
- Type: boolean
- arguments: Supplementary command-line switches for the browser
- Type: array of strings
Example Payload:
{ "tool": "start_browser", "parameters": { "browser": "chrome", "options": { "headless": true, "arguments": ["--no-sandbox"] } } }
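A minimal sketch, assuming the `start_browser` arguments are passed to the browser driver as standard W3C capabilities. The function name `toCapabilities` and the exact headless switches are illustrative and not taken from this server's source. `goog:chromeOptions` and `moz:firefoxOptions` are the vendor-specific capability keys from which ChromeDriver and GeckoDriver read their `args`.

```typescript
// Hypothetical mapping from start_browser arguments to W3C capabilities.
type BrowserName = "chrome" | "firefox";

interface StartBrowserOptions {
  headless?: boolean;
  arguments?: string[];
}

interface Capabilities {
  browserName: BrowserName;
  "goog:chromeOptions"?: { args: string[] };
  "moz:firefoxOptions"?: { args: string[] };
}

const ALLOWED_BROWSERS: readonly string[] = ["chrome", "firefox"];

function toCapabilities(browser: string, options: StartBrowserOptions = {}): Capabilities {
  if (!ALLOWED_BROWSERS.includes(browser)) {
    throw new Error(`Unsupported browser: ${browser}`);
  }
  // Copy so the caller's array is not mutated.
  const args = [...(options.arguments ?? [])];
  if (browser === "chrome") {
    // Recent Chrome versions accept --headless=new for their headless mode.
    if (options.headless) args.push("--headless=new");
    return { browserName: "chrome", "goog:chromeOptions": { args } };
  }
  // Firefox uses the -headless command-line switch.
  if (options.headless) args.push("-headless");
  return { browserName: "firefox", "moz:firefoxOptions": { args } };
}

console.log(JSON.stringify(toCapabilities("chrome", { headless: true, arguments: ["--no-sandbox"] })));
```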
navigate
Instructs the active browser to load a specified web address.
Arguments:
- url (mandatory): The destination address (URI)
- Type: string
Example Payload:
{ "tool": "navigate", "parameters": { "url": "https://www.example.com" } }
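The payloads in this section use a simplified `{ tool, parameters }` shape. On the wire, an MCP client sends a JSON-RPC 2.0 request whose method is `tools/call`, with the tool name in `params.name` and its arguments in `params.arguments`. The sketch below shows that translation; `toToolsCall` and the counter-based `id` are illustrative choices, and in practice an MCP client SDK usually builds these messages for you.

```typescript
// Sketch: wrap a simplified { tool, parameters } payload in an MCP
// JSON-RPC 2.0 "tools/call" request.
interface ToolPayload {
  tool: string;
  parameters: Record<string, unknown>;
}

interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Each request needs an id so the client can match the server's response.
let nextId = 1;

function toToolsCall(payload: ToolPayload): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: payload.tool, arguments: payload.parameters },
  };
}

const request = toToolsCall({ tool: "navigate", parameters: { url: "https://www.example.com" } });
console.log(JSON.stringify(request));
```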
find_element
Locates a single element within the Document Object Model (DOM).
Arguments:
- by (mandatory): The methodology used for identification
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The specific identifier corresponding to the chosen strategy
- Type: string
- timeout: Maximum time to wait for the element, in milliseconds
- Type: number
- Default: 10000
Example Payload:
{ "tool": "find_element", "parameters": { "by": "id", "value": "search-input", "timeout": 5000 } }
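The W3C WebDriver protocol itself defines only five locator strategies (`css selector`, `link text`, `partial link text`, `tag name`, and `xpath`), so client libraries typically rewrite `id`, `name`, and `class` lookups as CSS selectors. The sketch below shows one such mapping for the six `by` values accepted here. `toW3CLocator` and `quoteCss` are hypothetical helpers, and the escaping handles only backslashes and double quotes.

```typescript
// Sketch: translate this server's `by` values into W3C locator strategies.
type LocatorBy = "id" | "css" | "xpath" | "name" | "tag" | "class";

interface W3CLocator {
  using: "css selector" | "xpath" | "tag name";
  value: string;
}

// Escape backslashes and double quotes so the value can sit inside a
// double-quoted CSS attribute selector string.
function quoteCss(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function toW3CLocator(by: LocatorBy, value: string): W3CLocator {
  switch (by) {
    case "css":
      return { using: "css selector", value };
    case "xpath":
      return { using: "xpath", value };
    case "tag":
      return { using: "tag name", value };
    case "id":
      return { using: "css selector", value: `*[id=${quoteCss(value)}]` };
    case "name":
      return { using: "css selector", value: `*[name=${quoteCss(value)}]` };
    case "class":
      // [class~="x"] matches elements whose class list contains the word x.
      return { using: "css selector", value: `[class~=${quoteCss(value)}]` };
    default:
      throw new Error(`Unknown locator strategy: ${String(by)}`);
  }
}

console.log(toW3CLocator("id", "search-input"));
```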
click_element
Simulates a primary mouse click interaction on a targeted element.
Arguments:
- by (mandatory): The method for element location
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The locator criterion
- Type: string
- timeout: Maximum waiting period for the element (ms)
- Type: number
- Default: 10000
Example Payload:
{ "tool": "click_element", "parameters": { "by": "css", "value": ".submit-button" } }
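The `timeout` argument shared by the element tools describes an explicit wait: the lookup is retried until it succeeds or the time limit passes. selenium-webdriver exposes this as `driver.wait(condition, timeout)`. The generic `waitFor` helper below is a hypothetical, library-free sketch of the same polling idea, using the documented 10000 ms default and an assumed 100 ms polling interval.

```typescript
// Poll `check` until it returns a non-null value or `timeoutMs` elapses.
async function waitFor<T>(
  check: () => T | null | Promise<T | null>,
  timeoutMs = 10000,
  intervalMs = 100,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== null) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep before the next attempt.
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
}
```

In a real client, `check` would attempt the element lookup and return `null` when the element is not found yet.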
send_keys
Injects textual input into an identified input field.
Arguments:
- by (mandatory): Element identification technique
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): Criterion for locating the target element
- Type: string
- text (mandatory): The sequence of characters to input
- Type: string
- timeout: Maximum time to wait for the element (ms)
- Type: number
- Default: 10000
Example Payload:
{ "tool": "send_keys", "parameters": { "by": "name", "value": "username", "text": "testuser" } }
get_element_text
Retrieves the visible textual content associated with a DOM element.
Arguments:
- by (mandatory): Strategy for element selection
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The specific locator value
- Type: string
- timeout: Wait duration limit for the element (milliseconds)
- Type: number
- Default: 10000
Example Payload:
{ "tool": "get_element_text", "parameters": { "by": "css", "value": ".message" } }
hover
Moves the virtual mouse cursor to rest over a specified element.
Arguments:
- by (mandatory): Locator selection method
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The value matching the locator type
- Type: string
- timeout: Time limit for element detection (ms)
- Type: number
- Default: 10000
Example Payload:
{ "tool": "hover", "parameters": { "by": "css", "value": ".dropdown-menu" } }
drag_and_drop
Executes a sequence of actions to pick up a source element and deposit it onto a target element.
Arguments:
- by (mandatory): Locator strategy for the element to be moved (source)
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): Identifier string for the source element
- Type: string
- targetBy (mandatory): Locator strategy for the destination element (target)
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- targetValue (mandatory): Identifier string for the target element
- Type: string
- timeout: Maximum time allowed for finding both elements (ms)
- Type: number
- Default: 10000
Example Payload:
{ "tool": "drag_and_drop", "parameters": { "by": "id", "value": "draggable", "targetBy": "id", "targetValue": "droppable" } }
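WebDriver clients usually implement hover and drag and drop with the W3C Perform Actions command, which takes a list of input sources, each carrying a sequence of low-level actions. The sketch below builds request bodies for a hover (a single pointer move) and a drag and drop (move to the source, press, move to the target, release). The element IDs are placeholders for references returned by a find-element call, the helper names are hypothetical, and the 250 ms move duration is an arbitrary choice.

```typescript
// W3C WebDriver identifies element references with this fixed key.
const ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf";

type PointerAction =
  | { type: "pointerMove"; duration: number; origin: Record<string, string>; x: number; y: number }
  | { type: "pointerDown"; button: number }
  | { type: "pointerUp"; button: number };

interface PointerSource {
  type: "pointer";
  id: string;
  parameters: { pointerType: "mouse" };
  actions: PointerAction[];
}

// Move to an element; with an element origin, x/y are offsets from its in-view centre.
function moveTo(elementId: string, duration = 0): PointerAction {
  return { type: "pointerMove", duration, origin: { [ELEMENT_KEY]: elementId }, x: 0, y: 0 };
}

// Wrap action steps in a single mouse input source.
function mouse(actions: PointerAction[]): { actions: PointerSource[] } {
  return { actions: [{ type: "pointer", id: "mouse", parameters: { pointerType: "mouse" }, actions }] };
}

// Hover: one move onto the element.
function hoverBody(elementId: string): { actions: PointerSource[] } {
  return mouse([moveTo(elementId)]);
}

// Drag and drop: press on the source, move to the target, release (button 0 = primary).
function dragAndDropBody(sourceId: string, targetId: string): { actions: PointerSource[] } {
  return mouse([
    moveTo(sourceId),
    { type: "pointerDown", button: 0 },
    moveTo(targetId, 250),
    { type: "pointerUp", button: 0 },
  ]);
}

console.log(JSON.stringify(dragAndDropBody("source-element-id", "target-element-id")));
```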
double_click
Performs a double-click on an element.
Arguments:
- by (mandatory): Element identification technique
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The corresponding locator value
- Type: string
- timeout: Element discovery time limit (ms)
- Type: number
- Default: 10000
Example Payload:
{ "tool": "double_click", "parameters": { "by": "css", "value": ".editable-text" } }
right_click
Simulates a secondary mouse click (context menu activation) on an element.
Arguments:
- by (mandatory): Selector mechanism
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The locator string
- Type: string
- timeout: Maximum time to wait for the element (ms)
- Type: number
- Default: 10000
Example Payload:
{ "tool": "right_click", "parameters": { "by": "css", "value": ".context-menu-trigger" } }
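Double and right clicks can be expressed with the same W3C pointer actions: a double click is two press/release pairs with the primary button, and a right click is one press/release with button 2, the secondary button in W3C numbering. The hypothetical helpers below return the action steps for a single mouse input source; element IDs are placeholders.

```typescript
const ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf";
// W3C button numbers: 0 = primary (left), 1 = auxiliary (middle), 2 = secondary (right).
const PRIMARY = 0;
const SECONDARY = 2;

type Step = Record<string, unknown>;

// Move onto the element, then press and release `button` `clicks` times.
function clickSequence(elementId: string, button: number, clicks: number): Step[] {
  const steps: Step[] = [
    { type: "pointerMove", duration: 0, origin: { [ELEMENT_KEY]: elementId }, x: 0, y: 0 },
  ];
  for (let i = 0; i < clicks; i++) {
    steps.push({ type: "pointerDown", button }, { type: "pointerUp", button });
  }
  return steps;
}

const doubleClick = (elementId: string): Step[] => clickSequence(elementId, PRIMARY, 2);
const rightClick = (elementId: string): Step[] => clickSequence(elementId, SECONDARY, 1);

console.log(JSON.stringify(rightClick("context-menu-trigger-id")));
```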
press_key
Simulates pressing a single key on the keyboard.
Arguments:
- key (mandatory): The character or special key code to simulate (e.g., 'Enter', 'Control', 'b')
- Type: string
Example Payload:
{ "tool": "press_key", "parameters": { "key": "Enter" } }
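WebDriver represents non-printable keys as code points in the Unicode Private Use Area (for example, Enter is U+E007), and a key press is a `keyDown` followed by a `keyUp` in a `key` input source. The sketch below maps a few key names of the kind `press_key` accepts to those code points and builds the Perform Actions body. The key table is partial, and passing single characters such as `'b'` through unchanged is an assumption about how the tool treats them.

```typescript
// A subset of the W3C WebDriver key codes (Selenium's Key constants use the same values).
const SPECIAL_KEYS = new Map<string, string>([
  ["Backspace", "\uE003"],
  ["Tab", "\uE004"],
  ["Enter", "\uE007"],
  ["Shift", "\uE008"],
  ["Control", "\uE009"],
  ["Alt", "\uE00A"],
  ["Escape", "\uE00C"],
  ["ArrowLeft", "\uE012"],
  ["ArrowUp", "\uE013"],
  ["ArrowRight", "\uE014"],
  ["ArrowDown", "\uE015"],
  ["Delete", "\uE017"],
]);

function keyValue(key: string): string {
  const special = SPECIAL_KEYS.get(key);
  if (special !== undefined) return special;
  // Array.from counts code points, so a single emoji also counts as one character.
  if (Array.from(key).length === 1) return key;
  throw new Error(`Unknown key: ${key}`);
}

// Hypothetical helper: Perform Actions body for one key press (down, then up).
function pressKeyBody(key: string) {
  const value = keyValue(key);
  return {
    actions: [
      {
        type: "key",
        id: "keyboard",
        actions: [
          { type: "keyDown", value },
          { type: "keyUp", value },
        ],
      },
    ],
  };
}

console.log(JSON.stringify(pressKeyBody("Enter")));
```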
upload_file
Uploads a local file through an HTML file input element.
Arguments:
- by (mandatory): Means of locating the file input field
- Type: string
- Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The locator string for the input element
- Type: string
- filePath (mandatory): The full, absolute file system path to the source file
- Type: string
- timeout: Maximum time to wait for the element (ms)
- Type: number
- Default: 10000
Example Payload:
{ "tool": "upload_file", "parameters": { "by": "id", "value": "file-input", "filePath": "/path/to/file.pdf" } }
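In WebDriver, a file upload is normally done by sending the file's path as keystrokes to an `<input type="file">` element, and the browser needs an absolute path on the machine where it runs. The hypothetical helper below shows how a client could turn a relative path into the absolute `filePath` this tool expects, using Node's `path` module.

```typescript
import * as path from "node:path";

// Hypothetical helper: return an absolute, normalized path for upload_file,
// resolving relative paths against `baseDir` (the current directory by default).
function toUploadPath(filePath: string, baseDir: string = process.cwd()): string {
  return path.isAbsolute(filePath) ? path.normalize(filePath) : path.resolve(baseDir, filePath);
}

console.log(toUploadPath("reports/file.pdf"));
```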
take_screenshot
Captures a screenshot of the current page.
Arguments:
- outputPath (optional): File path where the screenshot is saved. If omitted, the image data is returned as a base64-encoded string.
- Type: string
Example Payload:
{ "tool": "take_screenshot", "parameters": { "outputPath": "/path/to/screenshot.png" } }
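The WebDriver Take Screenshot command returns the image as a base64-encoded PNG. The sketch below, with hypothetical helper names, mirrors the documented `outputPath` behaviour: decode the data, check the PNG signature, and write it to disk when a path is given, or return the base64 string otherwise.

```typescript
import { writeFileSync } from "node:fs";

// Every PNG file starts with these eight bytes.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Decode base64 screenshot data and verify it looks like a PNG.
function decodePng(base64: string): Buffer {
  const bytes = Buffer.from(base64, "base64");
  const isPng = PNG_SIGNATURE.every((b, i) => bytes[i] === b);
  if (!isPng) throw new Error("Screenshot data is not a PNG image");
  return bytes;
}

// Save to outputPath when given; otherwise hand the base64 string back unchanged.
function handleScreenshot(base64: string, outputPath?: string): string {
  if (outputPath === undefined) return base64;
  writeFileSync(outputPath, decodePng(base64));
  return outputPath;
}
```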
close_session
Terminates the active browser instance and releases associated system resources.
Arguments: No arguments required
Example Payload:
{ "tool": "close_session", "parameters": {} }
Licensing Information
Licensed under the MIT License.
WIKIPEDIA CONTEXT: A browser operating without a graphical user interface is termed a headless browser. These tools enable automated manipulation of web pages, mimicking standard browser function (rendering, JavaScript execution) but through a terminal or network interface. They are crucial for rigorous quality assurance (QA) testing, capturing visual representations, and automating front-end logic verification. Modern browser engines (Chrome >= 59, Firefox >= 56) natively support remote control, superseding older solutions like PhantomJS.
Primary Applications
The chief uses for headless operation are:
- Automated validation of modern web application functionality (Web Testing).
- Generating static visual captures of dynamic pages.
- Executing unit tests for JavaScript frameworks.
- Systematic, automated interaction with web interfaces.
Secondary Applications
Headless browsers are also used for data acquisition (web scraping); in 2009, Google stated that a headless browser could help its search engine index sites that rely heavily on Ajax. Conversely, they can be misused to facilitate distributed denial-of-service (DDoS) attacks, artificially inflate ad metrics, or automate unauthorized operations such as credential stuffing. However, a 2018 analysis found that malicious actors show no preference for headless browsers over standard browser installations for attacks such as SQL injection or XSS.
Implementation Landscape
Because several major browser engines now natively expose APIs for headless execution, a number of software packages offer a unified abstraction layer over them:
- Selenium WebDriver – Adheres to W3C WebDriver specifications.
- Playwright – A Node.js library supporting Chromium, Firefox, and WebKit.
- Puppeteer – A Node.js utility specifically for Chrome/Firefox automation.
Quality Assurance Integration
Numerous testing suites integrate headless capabilities:
- Capybara incorporates Headless Chrome or WebKit to simulate user flows.
- Jasmine defaults to Selenium but supports WebKit or Headless Chrome for execution.
- Cypress, a dedicated front-end testing framework.
- QF-Test, a tool for GUI-based software verification.
Non-Rendering Alternatives
Another approach is to use software that exposes browser-like APIs without rendering pages. Deno provides these APIs directly. For Node.js environments, jsdom offers the most comprehensive support, covering DOM parsing, XHR handling, and limited JavaScript execution. These alternatives typically run faster but lack the full rendering and event support of genuine browser engines.


