Remote Browser Interaction Orchestrator (Cloud Implementation)

This Model Context Protocol (MCP) server provides remote browser automation through BrowserCat's cloud browser service. It lets large language models (LLMs) interact with live web pages, take screenshots, and execute JavaScript in a real browser, with no local browser installation required.

Integrated Capabilities

Action Bindings

browsercat_navigate
- Navigate to a URL
- Parameter: url (string)
browsercat_screenshot
- Capture a screenshot of the whole viewport or of a specific element
- Parameters:
  - name (string, required): Name under which the screenshot is stored
  - selector (string, optional): CSS selector of the element to capture
  - width (number, optional, default: 800): Capture width
  - height (number, optional, default: 600): Capture height
browsercat_click
- Click an element on the page
- Parameter: selector (string): CSS selector of the element to click
browsercat_hover
- Hover over an element without clicking it
- Parameter: selector (string): CSS selector of the element to hover over
browsercat_fill
- Fill in a form input field
- Parameters:
  - selector (string): CSS selector of the input field
  - value (string): Text to enter into the field
browsercat_select
- Choose an option in a select element (e.g., a dropdown)
- Parameters:
  - selector (string): CSS selector of the select element
  - value (string): Value of the option to select
browsercat_evaluate
- Execute custom JavaScript in the browser's page context
- Parameter: script (string): JavaScript code to execute
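
MCP clients invoke tools by sending JSON-RPC 2.0 requests with the method tools/call. The sketch below shows what such requests could look like for the tools listed above. The helper names (buildToolCall, screenshotCall), the example URL, and the selectors are illustrative assumptions and are not part of this server; only the tool names, parameter names, and the 800 x 600 defaults come from the list above.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Argument shape for browsercat_screenshot, taken from the parameter list above.
type ScreenshotArgs = {
  name: string;
  selector?: string;
  width?: number;
  height?: number;
};

// Hypothetical helper: wraps a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical helper: makes the documented defaults (800 x 600) explicit.
// Sending them is optional, since the server applies the same defaults.
function screenshotCall(id: number, args: ScreenshotArgs): ToolCallRequest {
  return buildToolCall(id, "browsercat_screenshot", {
    ...args,
    width: args.width ?? 800,
    height: args.height ?? 600,
  });
}

// Example sequence: open a page, fill a field, submit, capture the result.
// The URL and selectors are placeholders.
const requests: ToolCallRequest[] = [
  buildToolCall(1, "browsercat_navigate", { url: "https://example.com" }),
  buildToolCall(2, "browsercat_fill", { selector: "#search", value: "headless browser" }),
  buildToolCall(3, "browsercat_click", { selector: "button[type=submit]" }),
  screenshotCall(4, { name: "results" }),
];

for (const request of requests) {
  console.log(JSON.stringify(request));
}
```

In practice an MCP client library or host application builds these envelopes for you; writing them out by hand is mainly useful for debugging or for understanding what the model is asking the server to do.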

Information Streams

The server exposes two types of resources:

Console Transcripts (console://logs)
- Browser console output as plain text.
- Includes every console message emitted by the page.
Visual Artifacts (screenshot://<name>)
- PNG images of captured screenshots.
- Retrieved using the name given when the screenshot was taken.
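
Both resources can be fetched with the standard MCP resources/read request. The sketch below builds those requests; the helper names (screenshotUri, buildResourceRead) and the screenshot name "results" are assumptions made for illustration, and the sketch assumes the screenshot name is used in the URI exactly as given at capture time.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for reading a resource.
type ResourceReadRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "resources/read";
  params: { uri: string };
};

// URI of the console log resource, as documented above.
const CONSOLE_LOGS_URI = "console://logs";

// Hypothetical helper: builds the URI for a screenshot stored under `name`.
// Assumes the name is inserted verbatim, with no extra encoding.
function screenshotUri(name: string): string {
  return `screenshot://${name}`;
}

// Hypothetical helper: wraps a resource URI in a resources/read request.
function buildResourceRead(id: number, uri: string): ResourceReadRequest {
  return { jsonrpc: "2.0", id, method: "resources/read", params: { uri } };
}

// Example: read the console log, then the screenshot captured as "results".
const logRequest = buildResourceRead(1, CONSOLE_LOGS_URI);
const imageRequest = buildResourceRead(2, screenshotUri("results"));

console.log(JSON.stringify(logRequest));
console.log(JSON.stringify(imageRequest));
```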

Core Attributes

- Cloud-hosted browser automation
- No local browser installation required
- Console log monitoring
- Screenshot capture
- JavaScript execution
- Basic web interaction (navigation, clicking, form filling)

Deployment Prerequisites for Cloud-Browser-Interface-Controller

Runtime Environmental Variables

The server requires the following environment variable:

BROWSERCAT_API_KEY: Your BrowserCat API key (required). Obtain one at https://browsercat.xyz/mcp.

Initialization Configuration (NPX Example)

{
  "mcpServers": {
    "browsercat": {
      "command": "npx",
      "args": ["-y", "@browsercatco/mcp-server"],
      "env": {
        "BROWSERCAT_API_KEY": "your-secret-access-key-here"
      }
    }
  }
}
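
If the configuration is generated by a script rather than written by hand, a check for the API key can catch a missing value before the server is launched. The following is a minimal sketch under that assumption; the function name buildBrowsercatConfig and the type names are hypothetical, and the returned object simply mirrors the JSON shown above.

```typescript
// Shape of one server entry, mirroring the JSON configuration above.
type McpServerEntry = {
  command: string;
  args: string[];
  env: Record<string, string>;
};

type McpConfig = { mcpServers: Record<string, McpServerEntry> };

// Hypothetical helper: builds the configuration and fails early
// when the required BROWSERCAT_API_KEY value is missing or blank.
function buildBrowsercatConfig(apiKey: string | undefined): McpConfig {
  if (!apiKey || apiKey.trim() === "") {
    throw new Error("BROWSERCAT_API_KEY is required");
  }
  return {
    mcpServers: {
      browsercat: {
        command: "npx",
        args: ["-y", "@browsercatco/mcp-server"],
        env: { BROWSERCAT_API_KEY: apiKey },
      },
    },
  };
}

// Placeholder key for illustration; substitute a real key at run time.
const config = buildBrowsercatConfig("your-secret-access-key-here");
console.log(JSON.stringify(config, null, 2));
```

In a Node.js script the key would typically be passed in as process.env.BROWSERCAT_API_KEY, which keeps the secret out of source control.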

Regulatory Status

This MCP server is released under the MIT License. You are free to use, modify, and distribute the software, subject to the terms and conditions of that license. See the LICENSE file in the project repository for details.

WIKIPEDIA: A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using network communication. They are particularly useful for testing web pages as they are able to render and understand HTML the same way a browser would, including styling elements such as page layout, color, font selection and execution of JavaScript and Ajax which are usually not available when using other testing methods. Since version 59 of Google Chrome and version 56 of Firefox, there is native support for remote control of the browser. This made earlier efforts obsolete, notably PhantomJS.

Use cases

The main use cases for headless browsers are:

- Test automation in modern web applications (web testing)
- Taking screenshots of web pages
- Running automated tests for JavaScript libraries
- Automating interaction of web pages

Other uses

Headless browsers are also useful for web scraping. Google stated in 2009 that using a headless browser could help their search engine index content from websites that use Ajax. Headless browsers have also been misused in various ways:

- Perform DDoS attacks on web sites
- Increase advertisement impressions
- Automate web sites in unintended ways, e.g. for credential stuffing

However, a study of browser traffic in 2018 found no preference by malicious actors for headless browsers. There is no indication that headless browsers are used more frequently than non-headless browsers for malicious purposes, like DDoS attacks, SQL injections or cross-site scripting attacks.

Usage

As several major browsers natively support headless mode through APIs, some software exists to perform browser automation through a unified interface. These include:

- Selenium WebDriver: a W3C compliant implementation of WebDriver
- Playwright: a Node.js library to automate Chromium, Firefox and WebKit
- Puppeteer: a Node.js library to automate Chrome or Firefox

Test automation

Some test automation software and frameworks include headless browsers as part of their testing apparatus.

- Capybara uses headless browsing, either via WebKit or Headless Chrome, to mimic user behavior in its testing protocols.
- Jasmine uses Selenium by default, but can use WebKit or Headless Chrome to run browser tests.
- Cypress, a frontend testing framework.
- QF-Test, a software tool for automated testing of programs via the graphical user interface, where a headless browser can also be used for testing.

Alternatives

Another approach is to use software that provides browser APIs. For example, Deno provides browser APIs as part of its design. For Node.js, jsdom is the most complete provider. While most are able to support common browser features (HTML parsing, cookies, XHR, some JavaScript, etc.), they do not render the DOM and have limited support for DOM events. They usually perform faster than