web-agent-controller MCP Gateway

Project Note: This MCP server implementation builds upon the browser-use/web-ui foundation. Core browser automation logic and configuration patterns are adapted from the original project.

AI-driven browser automation server implementing the Model Context Protocol (MCP) for natural language browser control.

Capabilities

🧠 MCP Compliance - Implements the Model Context Protocol for communication with AI agents
🌐 Browser Control - Page navigation, form filling, and DOM element interaction
👁️ Visual Understanding - Screenshot analysis for vision-based decisions
🔄 Session Persistence - Keeps browser state across separate requests
🔌 Multiple LLM Providers - Supports OpenAI, Anthropic, Azure OpenAI, DeepSeek, Gemini, Mistral, Ollama, and OpenRouter

Getting Started

Prerequisites

Python 3.11 or newer
uv (Python package manager)
Google Chrome or Chromium
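You can confirm these prerequisites with a small Python check. The `missing_prerequisites` helper below is hypothetical and does not ship with the server. The Chrome executable names it probes are assumptions that vary by platform. On macOS, Chrome usually lives outside `PATH`, which is presumably what the `CHROME_PATH` setting is for.

```python
from __future__ import annotations

import shutil
import sys


def missing_prerequisites() -> list[str]:
    """Return descriptions of any prerequisites that could not be found."""
    missing = []
    if sys.version_info < (3, 11):
        missing.append(f"Python 3.11+ (found {sys.version.split()[0]})")
    if shutil.which("uv") is None:
        missing.append("uv")
    # Assumed executable names; they differ by OS and install method.
    chrome_names = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser", "chrome")
    if not any(shutil.which(name) for name in chrome_names):
        missing.append("Google Chrome or Chromium (not on PATH; CHROME_PATH may still point to it)")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found" if not problems else "Missing: " + ", ".join(problems))
```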

Deployment Instructions

For Claude Desktop Users

Configuration file location:
macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
Windows: `%APPDATA%/Claude/claude_desktop_config.json`

Add the server under `mcpServers`:

```json
{
  "mcpServers": {
    "browser-use": {
      "command": "uvx",
      "args": [
        "mcp-server-browser-use"
      ],
      "env": {
        "OPENROUTER_API_KEY": "",
        "OPENROUTER_ENDPOINT": "https://openrouter.ai/api/v1",
        "OPENAI_ENDPOINT": "https://api.openai.com/v1",
        "OPENAI_API_KEY": "",
        "ANTHROPIC_ENDPOINT": "https://api.anthropic.com",
        "ANTHROPIC_API_KEY": "",
        "GOOGLE_API_KEY": "",
        "AZURE_OPENAI_ENDPOINT": "",
        "AZURE_OPENAI_API_KEY": "",
        "DEEPSEEK_ENDPOINT": "https://api.deepseek.com",
        "DEEPSEEK_API_KEY": "",
        "MISTRAL_API_KEY": "",
        "MISTRAL_ENDPOINT": "https://api.mistral.ai/v1",
        "OLLAMA_ENDPOINT": "http://localhost:11434",
        "ANONYMIZED_TELEMETRY": "true",
        "BROWSER_USE_LOGGING_LEVEL": "info",
        "CHROME_PATH": "",
        "CHROME_USER_DATA": "",
        "CHROME_DEBUGGING_PORT": "9222",
        "CHROME_DEBUGGING_HOST": "localhost",
        "CHROME_PERSISTENT_SESSION": "false",
        "BROWSER_HEADLESS": "false",
        "BROWSER_DISABLE_SECURITY": "false",
        "BROWSER_WINDOW_WIDTH": "1280",
        "BROWSER_WINDOW_HEIGHT": "720",
        "BROWSER_TRACE_PATH": "trace.json",
        "BROWSER_RECORDING_PATH": "recording.mp4",
        "RESOLUTION": "1920x1080x24",
        "RESOLUTION_WIDTH": "1920",
        "RESOLUTION_HEIGHT": "1080",
        "VNC_PASSWORD": "yourvncpassword",
        "MCP_MODEL_PROVIDER": "anthropic",
        "MCP_MODEL_NAME": "claude-3-5-sonnet-20241022",
        "MCP_TEMPERATURE": "0.3",
        "MCP_MAX_STEPS": "30",
        "MCP_USE_VISION": "true",
        "MCP_MAX_ACTIONS_PER_STEP": "5",
        "MCP_TOOL_CALL_IN_CONTENT": "true"
      }
    }
  }
}
```
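You can also merge the entry into the file with a short script instead of editing it by hand. The sketch below makes several assumptions. `claude_config_path`, `merge_server`, `install`, and `BROWSER_USE_ENTRY` are hypothetical names. Only a few variables from the example above are set, so add the others you need. Any existing file must already contain valid JSON. Other servers already listed under `mcpServers` are preserved.

```python
import json
import os
import sys
from pathlib import Path


def claude_config_path() -> Path:
    """Config location listed above; only macOS and Windows are documented."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("Claude Desktop config path is only documented for macOS and Windows")


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Add or replace one entry under "mcpServers" without mutating the input."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def install(entry: dict, name: str = "browser-use") -> Path:
    """Read the existing config (if any), merge the entry, and write it back."""
    path = claude_config_path()
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_server(existing, name, entry), indent=2))
    return path


# Hypothetical minimal entry: fill in the API key for your MCP_MODEL_PROVIDER.
BROWSER_USE_ENTRY = {
    "command": "uvx",
    "args": ["mcp-server-browser-use"],
    "env": {
        "MCP_MODEL_PROVIDER": "anthropic",
        "MCP_MODEL_NAME": "claude-3-5-sonnet-20241022",
        "ANTHROPIC_API_KEY": "",
    },
}
```

Call `install(BROWSER_USE_ENTRY)` once, then restart Claude Desktop so it rereads the file.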

Local Execution

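To run from a local clone instead of the published package, point `uv` at the project directory:

```json
"browser-use": {
  "command": "uv",
  "args": [
    "--directory",
    "/path/to/mcp-browser-use",
    "run",
    "mcp-server-browser-use"
  ],
  "env": {
    ...
  }
}
```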

Development Flow

```bash
# Install development dependencies
uv sync

# Run the server under the MCP Inspector for debugging
npx @modelcontextprotocol/inspector uv --directory . run mcp-server-browser-use
```

Error Resolution

Browser conflicts: Close all running Chrome instances before starting the server.
Authentication failures: Check that the API key and endpoint variables for your chosen `MCP_MODEL_PROVIDER` are set correctly.
Vision not working: Set `MCP_USE_VISION` to `true` to enable screenshot analysis.
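For browser conflicts, one quick check is whether anything is already listening on the Chrome debugging port from the configuration (`CHROME_DEBUGGING_HOST` and `CHROME_DEBUGGING_PORT`, which are `localhost:9222` in the sample). The `debugging_port_in_use` helper below is hypothetical and is not part of the server. It tests only TCP connectivity and assumes that a listener on that port is a conflicting Chrome instance.

```python
import os
import socket


def debugging_port_in_use(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if something already accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = os.environ.get("CHROME_DEBUGGING_HOST", "localhost")
    port = int(os.environ.get("CHROME_DEBUGGING_PORT", "9222"))
    if debugging_port_in_use(host, port):
        print(f"{host}:{port} is already in use; close the Chrome instance holding it first.")
    else:
        print(f"{host}:{port} is free.")
```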

Provider Configuration Matrix

The server supports several LLM providers, each configured through environment variables. Select a provider with `MCP_MODEL_PROVIDER`:

| Provider | `MCP_MODEL_PROVIDER` value | Environment variables |
|---|---|---|
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY`, `ANTHROPIC_ENDPOINT` (optional endpoint override) |
| OpenAI | `openai` | `OPENAI_API_KEY`, `OPENAI_ENDPOINT` (optional endpoint override) |
| Azure OpenAI | `azure_openai` | `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT` |
| DeepSeek | `deepseek` | `DEEPSEEK_API_KEY`, `DEEPSEEK_ENDPOINT` (optional endpoint override) |
| Gemini | `gemini` | `GOOGLE_API_KEY` |
| Mistral | `mistral` | `MISTRAL_API_KEY`, `MISTRAL_ENDPOINT` (optional endpoint override) |
| Ollama | `ollama` | `OLLAMA_ENDPOINT` (defaults to `http://localhost:11434` if unset) |
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY`, `OPENROUTER_ENDPOINT` (optional endpoint override) |
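A pre-flight check can catch the authentication failures described under Error Resolution before the server starts. The sketch below is hypothetical and is not the server's own validation logic. `REQUIRED_ENV` and `missing_provider_vars` transcribe the table into a dict. Endpoints marked optional are treated as not required, and an empty string counts as unset, matching the blank placeholders in the sample config.

```python
from __future__ import annotations

from collections.abc import Mapping

# Required variables per MCP_MODEL_PROVIDER value, transcribed from the table.
# Endpoint variables the table marks as optional are deliberately omitted.
REQUIRED_ENV = {
    "anthropic": ["ANTHROPIC_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "deepseek": ["DEEPSEEK_API_KEY"],
    "gemini": ["GOOGLE_API_KEY"],
    "mistral": ["MISTRAL_API_KEY"],
    "ollama": [],  # OLLAMA_ENDPOINT falls back to http://localhost:11434
    "openrouter": ["OPENROUTER_API_KEY"],
}


def missing_provider_vars(env: Mapping[str, str]) -> list[str]:
    """List required variables that are absent or empty for the selected provider."""
    provider = env.get("MCP_MODEL_PROVIDER", "")
    if provider not in REQUIRED_ENV:
        raise ValueError(f"Unknown MCP_MODEL_PROVIDER: {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]
```

Passing `os.environ` checks the current process environment. Taking a mapping instead of reading `os.environ` directly keeps the function easy to test.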

Additional notes:

`MCP_TEMPERATURE` sets the sampling temperature (default: 0.3).
`MCP_MODEL_NAME` selects the model to use.
For Ollama, options such as `num_ctx` (context window size) and `num_predict` (maximum number of tokens to generate) can be tuned.
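This README does not describe how the server passes these options through. To show what they mean, the hypothetical `build_ollama_request` helper below calls Ollama's `/api/generate` endpoint directly. It reads `OLLAMA_ENDPOINT`, `MCP_MODEL_NAME`, and `MCP_TEMPERATURE` as the notes above describe. The `num_ctx` and `num_predict` defaults are arbitrary example values, not recommendations.

```python
from __future__ import annotations

import json
import urllib.request
from collections.abc import Mapping


def build_ollama_request(
    prompt: str, env: Mapping[str, str], num_ctx: int = 8192, num_predict: int = 512
) -> urllib.request.Request:
    """Build a non-streaming /api/generate request using the MCP_* settings."""
    endpoint = env.get("OLLAMA_ENDPOINT", "http://localhost:11434").rstrip("/")
    payload = {
        "model": env["MCP_MODEL_NAME"],
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": float(env.get("MCP_TEMPERATURE", "0.3")),
            "num_ctx": num_ctx,          # context window size in tokens
            "num_predict": num_predict,  # upper bound on generated tokens
        },
    }
    return urllib.request.Request(
        f"{endpoint}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With Ollama running, send the request with `urllib.request.urlopen(req)` and read the `response` field of the returned JSON.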

Acknowledgements

This software extends the browser-use/web-ui project under the terms of the MIT License. Thanks to the original developers for building the core browser automation architecture.

License

MIT License - See LICENSE for details.

From Wikipedia:

A headless browser is a web browser without a graphical user interface. Headless browsers allow automated control of web pages in an environment similar to a regular browser, but they are driven from the command line or over a network protocol. They are especially useful for testing web applications because they render pages the way a browser does. This includes layout, colors, fonts, and JavaScript/Ajax execution, which other testing methods often cannot handle. Since Chrome 59 and Firefox 56, both browsers include native remote control interfaces, which made earlier tools, notably PhantomJS, largely obsolete.
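Chrome's native remote control interface is the DevTools Protocol. A browser launched with `--headless` and `--remote-debugging-port=9222` serves a small HTTP endpoint, and `GET /json/version` returns JSON that includes the browser-level `webSocketDebuggerUrl`. Recent Chrome versions may also require a non-default `--user-data-dir` before they enable remote debugging. The helpers below are a hypothetical sketch. Port 9222 mirrors `CHROME_DEBUGGING_PORT` in the configuration above, and the parsing step is separate so it can run without a live browser.

```python
from __future__ import annotations

import json
import urllib.request


def debugger_ws_url(version_json: str) -> str:
    """Extract the browser-level WebSocket endpoint from a /json/version body."""
    return json.loads(version_json)["webSocketDebuggerUrl"]


def fetch_debugger_ws_url(host: str = "localhost", port: int = 9222, timeout: float = 2.0) -> str:
    """Ask a Chrome started with --remote-debugging-port for its DevTools endpoint."""
    with urllib.request.urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return debugger_ws_url(resp.read().decode("utf-8"))
```

A DevTools client (Playwright's `connect_over_cdp`, for example) can then attach to the returned `ws://` URL.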

== Primary Applications ==

Headless browsers are mainly used for:

Test automation for modern web applications (web testing)
Taking screenshots of web pages
Running automated tests for JavaScript libraries
Automating interaction with web pages

=== Other Uses ===

Headless browsers are also useful for web scraping. In 2009, Google stated that using a headless browser could help its search engine index content from websites that rely on Ajax. Headless browsers have also been misused to:

Perform distributed denial-of-service (DDoS) attacks on websites
Inflate advertisement impressions
Automate websites in unintended or prohibited ways, such as credential stuffing

However, a 2018 traffic analysis found no preference among malicious actors for headless browsers. There was also no evidence that they are used more often than regular browsers for attacks such as DDoS, SQL injection, or cross-site scripting.

== Automation Frameworks ==

Because several major browsers support headless operation through dedicated APIs, several software packages have emerged to standardize browser automation:

Selenium WebDriver - An implementation of the W3C WebDriver standard
Playwright - A library for automating Chromium, Firefox, and WebKit, originally for Node.js and now also available for Python, Java, and .NET
Puppeteer - A Node.js library for controlling Chrome or Firefox

=== Test Harness Integration ===

Many testing tools use headless browsers as part of their test setup:

Capybara - Uses either WebKit or Headless Chrome to simulate user interaction in its tests
Jasmine - Uses Selenium by default, but can run browser tests with WebKit or Headless Chrome
Cypress - A frontend testing framework
QF-Test - A tool for automated testing of GUI applications that can also run headless

=== Alternatives ===

Another approach is to use software that provides browser APIs without a full browser. Deno, for example, includes browser APIs as part of its design. For Node.js, jsdom is the most complete provider. Most of these tools support basic browser features such as HTML parsing, cookies, XHR, and limited JavaScript execution. However, they do not render the DOM, so their support for DOM events is limited. They are usually faster than full browsers.
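Python's standard-library `html.parser` sits at the far lightweight end of this spectrum, even more limited than jsdom. It tokenizes HTML but builds no DOM and runs no scripts. The hypothetical `form_fields` helper below collects form field names from raw markup. An `<input>` that would appear only after JavaScript runs is never seen, which illustrates the speed-versus-fidelity trade-off described above.

```python
from __future__ import annotations

from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Record the name attribute of form controls; scripts are treated as text."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:
                self.fields.append(name)


def form_fields(html: str) -> list[str]:
    """Parse the markup once and return field names in document order."""
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For example, `form_fields('<input name="q"><script>document.write("<input name=x>")</script>')` returns only `["q"]`, because the script body is never executed.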