Language Model-Orchestrated Web Interface Control

A FastMCP server that lets large language models (LLMs) carry out multi-step web browsing tasks from natural-language instructions. The server exposes an interface through which an LLM can drive a browser instance: navigating pages, filling in input fields, clicking buttons, and extracting structured information.

Rapid Initialization Guide

1. Installation Procedure

Install the necessary client package, specifying your preferred backend provider (e.g., OpenAI):

    pip install -e "git+https://github.com/yourusername/browser-use-mcp.git#egg=browser-use-mcp[openai]"

To include support for all available integrations:

    pip install -e "git+https://github.com/yourusername/browser-use-mcp.git#egg=browser-use-mcp[all-providers]"

Ensure the underlying browser automation tools are available:

    playwright install chromium

2. MCP Client Configuration Setup

Integrate the llm-driven-web-agent service endpoint into your primary MCP client configuration file:

    {
      "mcpServers": {
        "llm-web-agent": {
          "command": "browser-use-mcp",
          "args": ["--model", "gpt-4o"],
          "env": {
            "OPENAI_API_KEY": "your-openai-api-key", // Substitute with your actual key or environment variable path
            "DISPLAY": ":0" // Necessary for environments supporting a graphical display server
          }
        }
      }
    }

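Replace the placeholder with your actual API key. If your client reads this file as plain JSON, note that JSON does not permit // comments and cannot evaluate expressions such as process.env.OPENAI_API_KEY; in that case remove the comments and either paste the key directly or produce the file from a script that reads the key from the environment.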
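One way to keep the key out of the configuration file itself is to generate the file from the environment. The following is a minimal sketch, not part of the project: the helper names `build_server_config` and `render_config` are hypothetical, and the server name, command, and arguments are simply copied from the snippet above.

```python
import json
import os
from typing import Mapping, Optional


def build_server_config(
    model: str = "gpt-4o",
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build the MCP client configuration as a plain dictionary.

    `env` defaults to the real process environment; passing a mapping
    makes the function easy to exercise without touching os.environ.
    """
    source = os.environ if env is None else env
    api_key = source.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")

    server_env = {"OPENAI_API_KEY": api_key}
    # DISPLAY is only forwarded when a graphical display server is configured.
    if source.get("DISPLAY"):
        server_env["DISPLAY"] = source["DISPLAY"]

    return {
        "mcpServers": {
            "llm-web-agent": {
                "command": "browser-use-mcp",
                "args": ["--model", model],
                "env": server_env,
            }
        }
    }


def render_config(
    model: str = "gpt-4o",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Serialize the configuration as strict JSON (no comments)."""
    return json.dumps(build_server_config(model, env), indent=2)
```

Calling `print(render_config())` and redirecting the output to your client's configuration file yields valid JSON with the key filled in from the current environment.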

3. Utilizing the Service in an MCP Client

Python Example utilizing `mcp-use`

    import asyncio
    import os

    from dotenv import load_dotenv
    from langchain_openai import ChatOpenAI
    from mcp_use import MCPAgent, MCPClient


    async def process_web_interaction():
        # Load secrets from .env file if present
        load_dotenv()

        # Initialize the client instance based on configuration
        client = MCPClient(
            config={
                "mcpServers": {
                    "llm-web-agent": {
                        "command": "browser-use-mcp",
                        "args": ["--model", "gpt-4o"],
                        "env": {
                            "OPENAI_API_KEY": os.getenv("OPENAI_API_KEY"),
                            "DISPLAY": ":0",
                        },
                    }
                }
            }
        )

        # Select the generative model interface
        llm = ChatOpenAI(model="gpt-4o")

        # Establish the autonomous agent
        agent = MCPAgent(llm=llm, client=client, max_steps=30)

        # Execute the complex instruction set
        query = """
            Initiate navigation to https://github.com, execute a search query for 'browser-use-mcp', and subsequently generate a high-level summary of the project's purpose.
            """
        result = await agent.run(
            query,
            max_steps=30,
        )
        print(f"\nFinal Output: {result}")


    # Run the coroutine only when the file is executed as a script
    if __name__ == "__main__":
        asyncio.run(process_web_interaction())

Configuration for Claude Desktop Environments

1. Launch the Claude Desktop application.
2. Navigate to the settings panel, typically under 'Settings → Experimental features'.
3. Activate the Claude API Beta feature and enable OpenAPI schema exposure.
4. Place the following configuration snippet into the Claude Desktop configuration file for your platform:
   macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
   Windows: %AppData%\Claude\claude_desktop_config.json

       {
         "mcpServers": {
           "browser-use": {
             "command": "browser-use-mcp",
             "args": ["--model", "claude-3-opus-20240229"]
           }
         }
       }

5. Start a new conversation in Claude and issue instructions for web operations.
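If the configuration file already lists other MCP servers, the snippet has to be merged into it rather than pasted over it. The sketch below shows one way to do that with the standard library. The function names `claude_config_path` and `add_server` are hypothetical helpers, not part of Claude Desktop or this project, and only the two platform paths listed above are handled.

```python
import json
import os
import sys
from pathlib import Path


def claude_config_path(platform: str = sys.platform) -> Path:
    """Return the configuration path listed above for macOS or Windows."""
    if platform == "darwin":
        return (
            Path.home() / "Library" / "Application Support" / "Claude"
            / "claude_desktop_config.json"
        )
    if platform.startswith("win"):
        return Path(os.environ.get("APPDATA", "")) / "Claude" / "claude_desktop_config.json"
    raise NotImplementedError(f"No documented config path for platform {platform!r}")


def add_server(path: Path, model: str = "claude-3-opus-20240229") -> dict:
    """Merge the browser-use entry into an existing config, keeping other servers."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    servers["browser-use"] = {
        "command": "browser-use-mcp",
        "args": ["--model", model],
    }

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A typical call would be `add_server(claude_config_path())`; back up the existing file first, since the function rewrites it in place.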

Supported Model Integrations

The following Language Model endpoints are compatible with this browser automation service:

Provider	Required API Key Environment Variable(s)
OpenAI	`OPENAI_API_KEY`
Anthropic	`ANTHROPIC_API_KEY`
Google	`GOOGLE_API_KEY`
Cohere	`COHERE_API_KEY`
Mistral AI	`MISTRAL_API_KEY`
Groq	`GROQ_API_KEY`
Together AI	`TOGETHER_API_KEY`
AWS Bedrock	`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
Fireworks	`FIREWORKS_API_KEY`
Azure OpenAI	`AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT`
Vertex AI (Google)	`GOOGLE_APPLICATION_CREDENTIALS`
NVIDIA	`NVIDIA_API_KEY`
AI21 Labs	`AI21_API_KEY`
Databricks	`DATABRICKS_HOST` and `DATABRICKS_TOKEN`
IBM watsonx.ai	`WATSONX_API_KEY`
xAI	`XAI_API_KEY`
Upstage	`UPSTAGE_API_KEY`
Hugging Face	`HUGGINGFACE_API_KEY`
Ollama (Local)	`OLLAMA_BASE_URL`
Llama.cpp (Local)	`LLAMA_CPP_SERVER_URL`

Consult the official LangChain documentation for further integration details: https://python.langchain.com/docs/integrations/chat/

Configuration of credentials can be centralized by creating a .env file in your project root:

    OPENAI_API_KEY=your_openai_key_here
    # Or include the key for any other supported backend
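Before starting the server it can help to confirm that every variable a provider needs is actually present. The following is a rough sketch under stated assumptions: the provider labels used as dictionary keys (`"openai"`, `"bedrock"`, and so on) are hypothetical and may not match the names the server itself uses, only a few rows of the table above are reproduced, and `parse_env_text` is a deliberately simplified stand-in for a real .env loader such as python-dotenv.

```python
import os
from typing import Dict, List, Mapping, Optional

# A subset of the table above; the keys are illustrative labels only.
REQUIRED_ENV_VARS: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "azure-openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "ollama": ["OLLAMA_BASE_URL"],
}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_credentials(
    provider: str,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variables that are unset or empty for a provider."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS[provider] if not source.get(name)]
```

An empty list from `missing_credentials("openai")` means the required variable is set in the current environment; it says nothing about whether the key itself is valid.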

Diagnostic and Resolution Guide

Authentication Failures: Verify that the API key for the selected provider is set correctly, either as an environment variable or in the .env file.
Provider Unavailability: Confirm that the package for your chosen model provider is installed.
Browser Automation Failures: Run `playwright install chromium` to ensure the required browser binaries are present.
Model Specification Errors: If the server rejects the model name, pass a recognized model identifier explicitly with the `--model` flag when starting the server.
Debugging Verbosity: Enable detailed logging by adding the `--debug` flag when launching the server.
Client Setup Mismatch: Check that the command and the environment variables in your client configuration match what the server expects.
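Several of the checks above can be automated with a short script. This is only a sketch: the function names are hypothetical, and the checks are limited to what the standard library can verify, namely that the `browser-use-mcp` command is on the PATH, that the `playwright` Python package can be imported, and that an API key variable is set. It does not confirm that the Chromium binaries themselves are installed.

```python
import importlib.util
import os
import shutil
from typing import Dict


def run_checks(api_key_var: str = "OPENAI_API_KEY") -> Dict[str, bool]:
    """Run basic environment checks and return a name -> passed mapping."""
    return {
        "server command on PATH": shutil.which("browser-use-mcp") is not None,
        "playwright package importable": importlib.util.find_spec("playwright") is not None,
        "API key set": bool(os.environ.get(api_key_var)),
    }


def format_report(results: Dict[str, bool]) -> str:
    """Render the check results as one status line per check."""
    return "\n".join(
        f"[{'ok' if passed else 'FAIL'}] {name}" for name, passed in results.items()
    )
```

Running `print(format_report(run_checks()))` gives a quick overview; pass a different variable name, such as `ANTHROPIC_API_KEY`, to check another provider.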

Licensing

MIT

Background: Headless Browsers (Wikipedia)

A headless browser is a web browser without a graphical user interface. It allows web pages to be controlled programmatically through a command-line interface or network protocol while rendering content the way a standard browser does, including CSS styling, JavaScript execution, and AJAX handling, which simpler parsing tools often lack. Modern browser engines (Chrome 59+, Firefox 56+) natively support this remote control functionality, superseding older solutions like PhantomJS.
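The gap between a simple parsing tool and a real browser engine can be seen with a small example. The sketch below uses Python's standard `html.parser` module on a made-up page (the markup and the `LinkCollector` class are illustrative only): the parser finds the link present in the static HTML but not the one that the page's script would add at runtime, because the script is never executed.

```python
from html.parser import HTMLParser
from typing import List

# Illustrative page: one static link, one link created by JavaScript.
PAGE = """
<html><body>
<a href="/static">Static link</a>
<script>
  var a = document.createElement("a");
  a.href = "/dynamic";
  document.body.appendChild(a);
</script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags found in the static markup."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> List[str]:
    """Return the links visible to a parser that does not run scripts."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


print(extract_links(PAGE))  # → ['/static']
```

A headless browser loading the same page would execute the script and expose both links in the resulting DOM.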

Primary Applications

The core use cases for operating browsers in a headless configuration include:

Automated quality assurance workflows for contemporary web applications (Web Testing).
Generating static snapshots (screenshots) of rendered web pages.
Executing automated test suites targeting client-side JavaScript functionality.
Orchestrating complex interactions across web interfaces.

Secondary Applications

Headless environments are also valuable for advanced web data harvesting (scraping). Furthermore, they were identified as a method to help search engines index content reliant on Ajax rendering. Conversely, misuse cases include launching distributed denial-of-service (DDoS) attacks, artificially inflating advertisement visibility metrics, or performing unauthorized automated site manipulation (e.g., credential stuffing). However, empirical traffic analysis suggests that malicious actors do not disproportionately favor headless browsers over standard ones for common attacks.

Control Frameworks

Given that major browser vendors now offer native headless APIs, several unified software interfaces exist to manage this automation layer:

Selenium WebDriver: Adheres to the W3C WebDriver specification for cross-browser automation.
Playwright: A robust Node.js library supporting Chromium, Firefox, and WebKit.
Puppeteer: Primarily focused on automating Chrome or Firefox instances via Node.js.

Testing Integration

Several established testing frameworks incorporate headless browser capabilities:

Capybara: Leverages Headless Chrome or WebKit to simulate end-user actions during testing.
Jasmine: Defaults to Selenium but can be configured to utilize WebKit or Headless Chrome for environment execution.
Cypress: A dedicated framework for front-end testing that supports headless operation.
QF-Test: A tool for GUI-based automated testing that supports headless browser execution.

Alternative Approaches

An alternative strategy involves employing libraries that simulate browser APIs without launching a full rendering engine. For instance, Deno natively integrates certain browser APIs. In the Node.js ecosystem, jsdom provides the most comprehensive simulation of HTML parsing, cookie management, XHR requests, and partial JavaScript execution. While these alternatives are often faster, they typically lack full DOM rendering capabilities and exhibit limited support for complex DOM events compared to genuine headless instances.

llm-driven-web-agent

Author

pietrozullo
