MCP Selenium Server

A Model Context Protocol (MCP) server for Selenium WebDriver that lets MCP clients control web browsers programmatically through standardized MCP requests.

Capabilities

Start browser sessions with configurable options
Navigate to URLs
Find elements using various locator strategies
Click elements and type text into them
Perform mouse actions (hover, drag and drop)
Handle keyboard input
Take screenshots
Upload files to web forms
Run in headless mode

Compatible Browsers

Google Chrome
Mozilla Firefox
Microsoft Edge

Integration with Goose

Method 1: Single-click Setup

Copy and paste the following URI into a browser address bar to add this extension to the Goose desktop application:

goose://extension?cmd=npx&arg=-y&arg=%40angiejones%2Fmcp-selenium&id=selenium-mcp&name=Selenium%20MCP&description=automates%20browser%20interactions
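
The deep link above is an ordinary URI whose query string carries the launch command, its arguments, and the extension metadata. As a minimal sketch, the same string can be assembled from its parts. The helper name `buildGooseExtensionLink` is hypothetical and is not part of Goose or this package.

```javascript
// Hypothetical helper: assembles a goose://extension deep link from its parts.
function buildGooseExtensionLink({ cmd, args, id, name, description }) {
  const pairs = [
    ["cmd", cmd],
    // One "arg" entry per command-line argument, in order.
    ...args.map((arg) => ["arg", arg]),
    ["id", id],
    ["name", name],
    ["description", description],
  ];
  const query = pairs
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `goose://extension?${query}`;
}

const link = buildGooseExtensionLink({
  cmd: "npx",
  args: ["-y", "@angiejones/mcp-selenium"],
  id: "selenium-mcp",
  name: "Selenium MCP",
  description: "automates browser interactions",
});
console.log(link);
```

`encodeURIComponent` is used here instead of `URLSearchParams` because the latter serializes spaces as `+`, whereas the link above uses `%20`.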

Method 2: Manual Desktop or Command Line Addition

Name: Selenium MCP
Description: automates browser interactions
Command: npx -y @angiejones/mcp-selenium

Integration with Other MCP Clients (e.g., Claude Desktop)

{ "mcpServers": { "selenium": { "command": "npx", "args": ["-y", "@angiejones/mcp-selenium"] } } }
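
If the client's configuration already lists other servers, the `selenium` entry has to be added alongside them rather than replacing the whole `mcpServers` object. The following is a minimal sketch of that merge on an already-parsed configuration object. The function name `addSeleniumServer` and the `filesystem` entry in the usage lines are hypothetical.

```javascript
// Hypothetical helper: returns a copy of the config with the selenium server added.
// Existing servers are preserved and the input object is not mutated.
function addSeleniumServer(config) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      selenium: { command: "npx", args: ["-y", "@angiejones/mcp-selenium"] },
    },
  };
}

// Usage with a made-up existing entry.
const existing = {
  mcpServers: {
    filesystem: { command: "npx", args: ["-y", "some-other-server"] },
  },
};
const merged = addSeleniumServer(existing);
console.log(JSON.stringify(merged, null, 2));
```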

Project Maintenance

To contribute to this repository:

Clone the repository.
Install dependencies: npm install
Start the server: npm start

Installation Methods

Installation via Smithery

To install MCP Selenium for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @angiejones/mcp-selenium --client claude

Manual Installation

npm install -g @angiejones/mcp-selenium

Execution

Start the server by running:

mcp-selenium

Alternatively, integrate with your MCP configuration via NPX:

{ "mcpServers": { "selenium": { "command": "npx", "args": [ "-y", "@angiejones/mcp-selenium" ] } } }

Available Operations (Tools)

start_browser

Initializes a new browser session.

Arguments:
- browser (mandatory): The specific browser application to launch
  - Type: string
  - Allowed Values: ["chrome", "firefox"]
- options: Configuration settings for the browser instance
  - Type: object
  - Fields:
    - headless: Boolean flag to run without a graphical interface
      - Type: boolean
    - arguments: Supplementary command-line switches for the browser
      - Type: array of strings

Example Payload:

{ "tool": "start_browser", "parameters": { "browser": "chrome", "options": { "headless": true, "arguments": ["--no-sandbox"] } } }
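
A client can catch an invalid `browser` value before the request is sent. The following is a minimal sketch that builds a `start_browser` payload in the shape shown above and rejects browsers outside the allowed values listed in the arguments. The helper name `startBrowserPayload` and its `args` option name are hypothetical.

```javascript
// Allowed values as listed in the start_browser argument description.
const SUPPORTED_BROWSERS = ["chrome", "firefox"];

// Hypothetical helper: builds a start_browser payload after basic validation.
function startBrowserPayload(browser, { headless = false, args = [] } = {}) {
  if (!SUPPORTED_BROWSERS.includes(browser)) {
    throw new Error(
      `Unsupported browser "${browser}"; expected one of ${SUPPORTED_BROWSERS.join(", ")}`
    );
  }
  if (!Array.isArray(args) || !args.every((arg) => typeof arg === "string")) {
    throw new Error("Browser arguments must be an array of strings");
  }
  return {
    tool: "start_browser",
    parameters: { browser, options: { headless, arguments: args } },
  };
}

const payload = startBrowserPayload("chrome", {
  headless: true,
  args: ["--no-sandbox"],
});
console.log(JSON.stringify(payload));
```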

navigate

Instructs the active browser to load a specified web address.

Arguments:
- url (mandatory): The destination address (URI)
  - Type: string

Example Payload:

{ "tool": "navigate", "parameters": { "url": "https://www.example.com" } }
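
The example payloads in this document use a simplified `tool`/`parameters` shape. When an MCP client actually invokes a tool, it sends a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool name in `params.name` and its inputs in `params.arguments`. The following is a minimal sketch of that translation. The helper name `toToolsCallRequest` is hypothetical, and the request `id` is whatever the client chooses.

```javascript
// Hypothetical helper: wraps a { tool, parameters } payload from this document
// in a JSON-RPC 2.0 "tools/call" request as defined by the MCP specification.
function toToolsCallRequest(payload, id) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: payload.tool,
      arguments: payload.parameters ?? {},
    },
  };
}

const request = toToolsCallRequest(
  { tool: "navigate", parameters: { url: "https://www.example.com" } },
  1
);
console.log(JSON.stringify(request, null, 2));
```

Most MCP clients build this envelope themselves, so the sketch is mainly useful when testing the server by hand.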

find_element

Locates a single element within the Document Object Model (DOM).

Arguments:
- by (mandatory): The methodology used for identification
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The specific identifier corresponding to the chosen strategy
  - Type: string
- timeout: Maximum duration (in milliseconds) to await element visibility
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "find_element", "parameters": { "by": "id", "value": "search-input", "timeout": 5000 } }
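
Every element-targeting tool in this document takes the same `by`, `value`, and optional `timeout` arguments. The following is a minimal client-side sketch that validates a locator before sending it and fills in the documented default timeout. `locatorParams` is a hypothetical helper, not something the server exposes.

```javascript
// Locator strategies and default timeout as documented for the element tools.
const LOCATOR_STRATEGIES = ["id", "css", "xpath", "name", "tag", "class"];
const DEFAULT_TIMEOUT_MS = 10000;

// Hypothetical helper: validates locator arguments and applies the default timeout.
function locatorParams(by, value, timeout = DEFAULT_TIMEOUT_MS) {
  if (!LOCATOR_STRATEGIES.includes(by)) {
    throw new Error(`Unknown locator strategy "${by}"`);
  }
  if (typeof value !== "string" || value.length === 0) {
    throw new Error("Locator value must be a non-empty string");
  }
  if (!Number.isFinite(timeout) || timeout < 0) {
    throw new Error("Timeout must be a non-negative number of milliseconds");
  }
  return { by, value, timeout };
}

const findSearchBox = {
  tool: "find_element",
  parameters: locatorParams("id", "search-input", 5000),
};
console.log(JSON.stringify(findSearchBox));
```

The same helper can feed `click_element`, `hover`, and the other tools that share these arguments.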

click_element

Clicks the targeted element.

Arguments:
- by (mandatory): The method for element location
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The locator criterion
  - Type: string
- timeout: Maximum waiting period for the element (ms)
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "click_element", "parameters": { "by": "css", "value": ".submit-button" } }

send_keys

Types text into the targeted input element.

Arguments:
- by (mandatory): Element identification technique
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): Criterion for locating the target element
  - Type: string
- text (mandatory): The sequence of characters to input
  - Type: string
- timeout: Maximum time to wait for the element (ms)
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "send_keys", "parameters": { "by": "name", "value": "username", "text": "testuser" } }

get_element_text

Retrieves the visible textual content associated with a DOM element.

Arguments:
- by (mandatory): Strategy for element selection
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The specific locator value
  - Type: string
- timeout: Wait duration limit for the element (milliseconds)
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "get_element_text", "parameters": { "by": "css", "value": ".message" } }

hover

Moves the mouse pointer over the specified element.

Arguments:
- by (mandatory): Locator selection method
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The value matching the locator type
  - Type: string
- timeout: Time limit for element detection (ms)
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "hover", "parameters": { "by": "css", "value": ".dropdown-menu" } }

drag_and_drop

Executes a sequence of actions to pick up a source element and deposit it onto a target element.

Arguments:
- by (mandatory): Locator strategy for the element to be moved (source)
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): Identifier string for the source element
  - Type: string
- targetBy (mandatory): Locator strategy for the destination element (target)
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- targetValue (mandatory): Identifier string for the target element
  - Type: string
- timeout: Maximum time allowed for finding both elements (ms)
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "drag_and_drop", "parameters": { "by": "id", "value": "draggable", "targetBy": "id", "targetValue": "droppable" } }

double_click

Performs a double click on an element.

Arguments:
- by (mandatory): Element identification technique
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The corresponding locator value
  - Type: string
- timeout: Element discovery time limit (ms)
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "double_click", "parameters": { "by": "css", "value": ".editable-text" } }

right_click

Simulates a secondary mouse click (context menu activation) on an element.

Arguments:
- by (mandatory): Selector mechanism
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The locator string
  - Type: string
- timeout: Maximum wait duration for element visibility (ms)
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "right_click", "parameters": { "by": "css", "value": ".context-menu-trigger" } }

press_key

Simulates pressing a single keyboard key.

Arguments:
- key (mandatory): The character or special key to press (e.g., 'Enter', 'Control', 'b')
  - Type: string

Example Payload:

{ "tool": "press_key", "parameters": { "key": "Enter" } }

upload_file

Uploads a local file through an HTML file input element.

Arguments:
- by (mandatory): Means of locating the file input field
  - Type: string
  - Allowed Values: ["id", "css", "xpath", "name", "tag", "class"]
- value (mandatory): The locator string for the input element
  - Type: string
- filePath (mandatory): The absolute file system path to the file to upload
  - Type: string
- timeout: Time limit for element readiness (ms)
  - Type: number
  - Default: 10000

Example Payload:

{ "tool": "upload_file", "parameters": { "by": "id", "value": "file-input", "filePath": "/path/to/file.pdf" } }

take_screenshot

Captures a screenshot of the current page.

Arguments:
- outputPath (optional): File path at which to save the screenshot. If omitted, the image is returned as base64-encoded data.
  - Type: string

Example Payload:

{ "tool": "take_screenshot", "parameters": { "outputPath": "/path/to/screenshot.png" } }
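
When `outputPath` is omitted, the caller receives base64 text and has to decode it. Selenium screenshots are PNG images, so a quick sanity check is to decode the data and look for the 8-byte PNG signature. The following is a minimal sketch using the global `Buffer` in Node.js. How the base64 string is extracted from the tool response depends on the client and is not shown, and `decodeScreenshot` is a hypothetical helper.

```javascript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper: decodes base64 screenshot data and checks it looks like a PNG.
function decodeScreenshot(base64Data) {
  const bytes = Buffer.from(base64Data, "base64");
  const isPng = PNG_SIGNATURE.every((byte, i) => bytes[i] === byte);
  if (!isPng) {
    throw new Error("Decoded data does not start with the PNG signature");
  }
  return bytes;
}

// Stand-in for real screenshot data: the signature plus one padding byte.
const sample = Buffer.from([...PNG_SIGNATURE, 0x00]).toString("base64");
const image = decodeScreenshot(sample);
console.log(`Decoded ${image.length} bytes`);
```

The returned buffer can then be written to disk, for example with `fs.writeFileSync`.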

close_session

Terminates the active browser instance and releases associated system resources.

Arguments: No arguments required

Example Payload:

{ "tool": "close_session", "parameters": {} }
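
Taken together, the tools form a session lifecycle: start a browser, navigate, interact, and close the session so the browser process is released. The following is a minimal sketch of such a sequence, written as an ordered list of payloads in the same shape as the examples above. The URL, locators, typed text, and the helper `checkSessionOrder` are illustrative assumptions, and the check assumes one browser session per sequence.

```javascript
// An illustrative end-to-end sequence using the tools documented above.
const searchFlow = [
  { tool: "start_browser", parameters: { browser: "chrome", options: { headless: true } } },
  { tool: "navigate", parameters: { url: "https://www.example.com" } },
  { tool: "send_keys", parameters: { by: "id", value: "search-input", text: "selenium" } },
  { tool: "press_key", parameters: { key: "Enter" } },
  { tool: "get_element_text", parameters: { by: "css", value: ".message" } },
  { tool: "close_session", parameters: {} },
];

// Hypothetical client-side check: the sequence must open exactly one session
// at the start and close it at the end. Returns the tool names in order.
function checkSessionOrder(steps) {
  const names = steps.map((step) => step.tool);
  if (names[0] !== "start_browser") {
    throw new Error("Sequence must begin with start_browser");
  }
  if (names[names.length - 1] !== "close_session") {
    throw new Error("Sequence must end with close_session");
  }
  const inner = names.slice(1, -1);
  if (inner.includes("start_browser") || inner.includes("close_session")) {
    throw new Error("Session must not be started or closed mid-sequence");
  }
  return names;
}

console.log(checkSessionOrder(searchFlow).join(" -> "));
```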

Licensing Information

Licensed under the MIT License.

WIKIPEDIA CONTEXT: A browser operating without a graphical user interface is termed a headless browser. These tools enable automated manipulation of web pages, mimicking standard browser function (rendering, JavaScript execution) but through a terminal or network interface. They are crucial for rigorous quality assurance (QA) testing, capturing visual representations, and automating front-end logic verification. Modern browser engines (Chrome >= 59, Firefox >= 56) natively support remote control, superseding older solutions like PhantomJS.

== Primary Applications ==

The chief uses for headless operation involve:

Automated validation of modern web application functionality (Web Testing).
Generating static visual captures of dynamic pages.
Executing unit tests for JavaScript frameworks.
Systematic, automated interaction with web interfaces.

=== Secondary Applications ===

Headless agents are also employed for data acquisition (web scraping); in 2009 Google confirmed their utility for indexing Ajax-heavy sites. Potential misuse includes facilitating distributed denial-of-service (DDoS) attacks, artificially inflating ad metrics, and performing unauthorized automated operations such as credential stuffing. However, external analysis from 2018 suggests that malicious actors show no preference for headless tools over standard browser installations for attacks such as SQL injection or XSS.

== Implementation Landscape ==

Since several major web engines now natively expose APIs for headless execution, several software packages offer a unified abstraction layer:

Selenium WebDriver – Adheres to W3C WebDriver specifications.
Playwright – A Node.js library supporting Chromium, Firefox, and WebKit.
Puppeteer – A Node.js utility specifically for Chrome/Firefox automation.

=== Quality Assurance Integration ===

Numerous testing suites integrate headless capabilities:

Capybara incorporates Headless Chrome or WebKit to simulate user flows.
Jasmine defaults to Selenium but supports WebKit or Headless Chrome for execution.
Cypress, a dedicated front-end testing framework.
QF-Test, a tool for GUI-based software verification.

=== Non-Rendering Alternatives ===

Alternative methods involve leveraging software that exposes browser-like APIs without full visual rendering. Deno incorporates these APIs directly. For Node.js environments, jsdom offers the most comprehensive support for features like DOM parsing, XHR handling, and limited JavaScript execution. These alternatives typically execute faster but lack full DOM rendering and event support compared to genuine browser engines.