cloud-browser-interface-controller
Automate a remote browser: navigate pages, capture screenshots, and run JavaScript in BrowserCat's cloud-hosted browsers, with no local browser installation required.
Author: browsercat
Remote Browser Interaction Orchestrator (Cloud Implementation)
This Model Context Protocol (MCP) server provides browser automation through BrowserCat's cloud browser service. It lets large language models (LLMs) interact with live web pages, take screenshots, and execute JavaScript in a real browser environment, without requiring a browser to be installed locally.
Integrated Capabilities
Action Bindings
- browsercat_navigate
  - Navigate to any URL
  - Parameter:
    - url (string)
- browsercat_screenshot
  - Capture an image of the full viewport or of a specific element
  - Parameters:
    - name (string, required): Unique name for the resulting image
    - selector (string, optional): CSS selector of the element to capture
    - width (number, optional, default: 800): Capture width
    - height (number, optional, default: 600): Capture height
- browsercat_click
  - Click an element on the page
  - Parameter:
    - selector (string): CSS selector of the element to click
- browsercat_hover
  - Hover the pointer over an element without clicking it
  - Parameter:
    - selector (string): CSS selector of the element to hover over
- browsercat_fill
  - Fill a form input field with a value
  - Parameters:
    - selector (string): CSS selector of the input field
    - value (string): Text to enter in the field
- browsercat_select
  - Choose an option in a select element (e.g., a dropdown)
  - Parameters:
    - selector (string): CSS selector of the select element
    - value (string): Value of the option to select
- browsercat_evaluate
  - Execute custom JavaScript in the browser's execution context
  - Parameter:
    - script (string): JavaScript code to execute
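As a rough illustration of how a client might invoke these tools, the sketch below builds MCP tools/call JSON-RPC requests for a short navigate, fill, click, screenshot sequence. The tool and parameter names come from the list above. The helper tool_call, the example URL, and the selectors are hypothetical, and treating every parameter other than the screenshot options as required is an assumption. The transport that actually delivers these messages to the server is left out.

```python
import itertools
import json

# Parameters each tool needs. Only browsercat_screenshot marks parameters as
# required or optional in the list above; the rest are ASSUMED to be required.
REQUIRED_PARAMS = {
    "browsercat_navigate": {"url"},
    "browsercat_screenshot": {"name"},
    "browsercat_click": {"selector"},
    "browsercat_hover": {"selector"},
    "browsercat_fill": {"selector", "value"},
    "browsercat_select": {"selector", "value"},
    "browsercat_evaluate": {"script"},
}

_request_ids = itertools.count(1)


def tool_call(tool, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method (hypothetical helper)."""
    if tool not in REQUIRED_PARAMS:
        raise ValueError(f"unknown tool: {tool}")
    missing = REQUIRED_PARAMS[tool] - set(arguments)
    if missing:
        raise ValueError(f"{tool} is missing parameters: {sorted(missing)}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": dict(arguments)},
    }


# Example sequence; the URL and selectors are placeholders.
requests = [
    tool_call("browsercat_navigate", {"url": "https://example.com/login"}),
    tool_call("browsercat_fill", {"selector": "#email", "value": "user@example.com"}),
    tool_call("browsercat_click", {"selector": "button[type=submit]"}),
    tool_call("browsercat_screenshot", {"name": "after-login", "width": 1280, "height": 720}),
]

for request in requests:
    print(json.dumps(request))
```

Validating arguments before sending keeps malformed calls from reaching the remote browser; a real client would more likely rely on the input schemas the server reports when its tools are listed.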
Information Streams
The server exposes two resources:
- Console Transcripts (console://logs)
  - Text of all output written to the browser console
  - Includes every message the browser emits
- Visual Artifacts (screenshot://<name>)
  - PNG images of captured screenshots
  - Retrieved by the name given when the screenshot was taken
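To make the two URI schemes concrete, here is a minimal sketch that forms a screenshot URI from a capture name and wraps a URI in an MCP resources/read request. The helper names are hypothetical, and the source does not say how names containing special characters are handled, so the name is used as-is.

```python
CONSOLE_LOGS_URI = "console://logs"
SCREENSHOT_PREFIX = "screenshot://"


def screenshot_uri(name):
    """Return the resource URI for a screenshot captured under `name`."""
    if not name:
        raise ValueError("screenshot name must be non-empty")
    return SCREENSHOT_PREFIX + name


def describe_resource(uri):
    """Say which of the two documented resources a URI refers to."""
    if uri == CONSOLE_LOGS_URI:
        return "console logs (text)"
    if uri.startswith(SCREENSHOT_PREFIX) and len(uri) > len(SCREENSHOT_PREFIX):
        return "screenshot (PNG): " + uri[len(SCREENSHOT_PREFIX):]
    raise ValueError(f"not a documented resource URI: {uri}")


def read_resource_request(uri, request_id):
    """Build a JSON-RPC 2.0 request for the MCP resources/read method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


# "after-login" is a placeholder capture name.
for uri in (CONSOLE_LOGS_URI, screenshot_uri("after-login")):
    print(describe_resource(uri), read_resource_request(uri, request_id=1))
```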
Core Attributes
- Browser automation that runs entirely in the cloud
- No local browser installation required
- Console log monitoring
- Screenshot capture
- JavaScript execution
- Basic page interaction (navigation, clicking, form filling)
Deployment Prerequisites for Cloud-Browser-Interface-Controller
Runtime Environmental Variables
The server requires one environment variable:
BROWSERCAT_API_KEY: Your BrowserCat API key (required). Get one at https://browsercat.xyz/mcp.
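A launcher script could check for the key before starting the server so that a missing value fails early with a clear message. The sketch below is one possible pre-flight check; the function name and the error text are hypothetical and not part of the BrowserCat package.

```python
import os


def require_api_key(env=None):
    """Return BROWSERCAT_API_KEY from the environment, or fail with a clear error."""
    env = os.environ if env is None else env
    key = env.get("BROWSERCAT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BROWSERCAT_API_KEY is not set; obtain a key at https://browsercat.xyz/mcp"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(require_api_key({"BROWSERCAT_API_KEY": "example-key"}))
```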
Initialization Configuration (NPX Example)
{
  "mcpServers": {
    "browsercat": {
      "command": "npx",
      "args": ["-y", "@browsercatco/mcp-server"],
      "env": {
        "BROWSERCAT_API_KEY": "your-secret-access-key-here"
      }
    }
  }
}
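If the configuration is generated by a script rather than edited by hand, something like the following could work. It reproduces the structure shown above and substitutes the key passed in. The function name is hypothetical, and where the resulting JSON should be saved depends on the MCP client, which the source does not specify.

```python
import json


def build_config(api_key):
    """Return the MCP client configuration shown above with the given API key."""
    return {
        "mcpServers": {
            "browsercat": {
                "command": "npx",
                "args": ["-y", "@browsercatco/mcp-server"],
                "env": {"BROWSERCAT_API_KEY": api_key},
            }
        }
    }


# Print the configuration as indented JSON, using the placeholder key from the example.
config_text = json.dumps(build_config("your-secret-access-key-here"), indent=2)
print(config_text)
```

Building the structure in code and serializing it avoids hand-editing mistakes such as a missing brace or a stray trailing comma.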
License
This MCP server is released under the MIT License. You may use, modify, and distribute the software, subject to the terms and conditions of the MIT License. See the LICENSE file in the project repository for details.
