llm-driven-web-agent

Facilitate the programmatic control of web environments using natural language prompts, enabling tasks such as site navigation, data form submission, and element interaction via an integrated language model interface.

Author

pietrozullo

No License

Quick Info

GitHub Stars: 3
NPM Weekly Downloads: 620
Tools: 1
Last Updated: 2026-02-19

Tags

automation, scraping, browser, browser automation, automation web, automate web




Language Model-Orchestrated Web Interface Control

A FastMCP service designed to permit large language models (LLMs) to execute complex web browsing procedures through text-based instructions. This server exposes an interface that allows an LLM to programmatically steer a browser instance to interact with web pages, populate input fields, trigger button clicks, and retrieve structured information.

Rapid Initialization Guide

1. Installation Procedure

Install the necessary client package, specifying your preferred backend provider (e.g., OpenAI):

    pip install -e "git+https://github.com/yourusername/browser-use-mcp.git#egg=browser-use-mcp[openai]"

To include support for all available integrations:

    pip install -e "git+https://github.com/yourusername/browser-use-mcp.git#egg=browser-use-mcp[all-providers]"

Ensure the underlying browser automation tools are available:

    playwright install chromium

2. MCP Client Configuration Setup

Integrate the llm-driven-web-agent service endpoint into your primary MCP client configuration file:

    {
      "mcpServers": {
        "llm-web-agent": {
          "command": "browser-use-mcp",
          "args": ["--model", "gpt-4o"],
          "env": {
            "OPENAI_API_KEY": "your-openai-api-key",
            "DISPLAY": ":0"
          }
        }
      }
    }

Replace your-openai-api-key with your actual key. The DISPLAY entry is only needed in environments that run a graphical display server. Standard JSON allows neither comments nor expressions such as process.env.OPENAI_API_KEY, so if you do not want the key stored in the file, generate the configuration from code that reads the environment variable.
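One way to keep the key out of the configuration file is to build the same structure in code and fill in the key at run time. The following is a minimal sketch, not part of the project: the helper name build_server_config is hypothetical, and it assumes the server entry shown in this section.

```python
import json
import os


def build_server_config(model="gpt-4o", env=None):
    """Return an mcpServers mapping with the API key read from the environment.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")

    server_env = {"OPENAI_API_KEY": api_key}
    # Forward DISPLAY only when the host defines it.
    if env.get("DISPLAY"):
        server_env["DISPLAY"] = env["DISPLAY"]

    return {
        "mcpServers": {
            "llm-web-agent": {
                "command": "browser-use-mcp",
                "args": ["--model", model],
                "env": server_env,
            }
        }
    }


# Demonstration with an explicit mapping instead of the real environment.
example = build_server_config(env={"OPENAI_API_KEY": "sk-example", "DISPLAY": ":0"})
print(json.dumps(example, indent=2))
```

The returned dictionary has the same shape as the JSON above, so it can be written to a file with json.dump or passed directly to a client that accepts a configuration dictionary.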

3. Utilizing the Service in an MCP Client

Python Example utilizing mcp-use

    import asyncio
    import os

    from dotenv import load_dotenv
    from langchain_openai import ChatOpenAI
    from mcp_use import MCPAgent, MCPClient


    async def process_web_interaction():
        # Load secrets from .env file if present
        load_dotenv()

        # Initialize the client instance based on configuration
        client = MCPClient(
            config={
                "mcpServers": {
                    "llm-web-agent": {
                        "command": "browser-use-mcp",
                        "args": ["--model", "gpt-4o"],
                        "env": {
                            "OPENAI_API_KEY": os.getenv("OPENAI_API_KEY"),
                            "DISPLAY": ":0",
                        },
                    }
                }
            }
        )

        # Select the generative model interface
        llm = ChatOpenAI(model="gpt-4o")

        # Establish the autonomous agent
        agent = MCPAgent(llm=llm, client=client, max_steps=30)

        # Execute the complex instruction set
        query = """
            Initiate navigation to https://github.com, execute a search query for 'browser-use-mcp', and subsequently generate a high-level summary of the project's purpose.
            """
        result = await agent.run(
            query,
            max_steps=30,
        )
        print(f"\nFinal Output: {result}")


    if __name__ == "__main__":
        asyncio.run(process_web_interaction())

Configuration for Claude Desktop Environments

  1. Launch the Claude Desktop application.
  2. Navigate to the settings panel, typically under 'Settings → Experimental features'.
  3. Activate the Claude API Beta feature and enable OpenAPI schema exposure.
  4. Place the following configuration snippet into the configuration file for your platform:
     • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
     • Windows: %AppData%\Claude\claude_desktop_config.json

         {
           "mcpServers": {
             "browser-use": {
               "command": "browser-use-mcp",
               "args": ["--model", "claude-3-opus-20240229"]
             }
           }
         }

  5. Initiate a new dialogue session within Claude and issue directives for web operations.
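If the configuration file already lists other servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that with the standard library. The helper names claude_config_path and add_server are hypothetical, only the two paths documented above are handled, and the demonstration writes to a temporary file rather than the real configuration.

```python
import json
import os
import sys
import tempfile
from pathlib import Path


def claude_config_path():
    """Return the documented config path for the current platform."""
    if sys.platform == "darwin":
        return (Path.home() / "Library" / "Application Support"
                / "Claude" / "claude_desktop_config.json")
    if sys.platform.startswith("win"):
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("Only the macOS and Windows paths are documented")


def add_server(path, name, command, args):
    """Merge one server entry into the file, keeping existing entries."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args),
    }
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Demonstration against a throwaway file that already contains one server.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "claude_desktop_config.json"
    demo.write_text(
        '{"mcpServers": {"other": {"command": "x", "args": []}}}',
        encoding="utf-8",
    )
    merged = add_server(
        demo, "browser-use", "browser-use-mcp",
        ["--model", "claude-3-opus-20240229"],
    )
    print(sorted(merged["mcpServers"]))  # ['browser-use', 'other']
```

To apply this to the real file, pass claude_config_path() as the first argument, ideally after making a backup copy.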

Supported Model Integrations

The following Language Model endpoints are compatible with this browser automation service:

| Provider | Required API Key Environment Variable(s) |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Google | GOOGLE_API_KEY |
| Cohere | COHERE_API_KEY |
| Mistral AI | MISTRAL_API_KEY |
| Groq | GROQ_API_KEY |
| Together AI | TOGETHER_API_KEY |
| AWS Bedrock | AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY |
| Fireworks | FIREWORKS_API_KEY |
| Azure OpenAI | AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT |
| Vertex AI (Google) | GOOGLE_APPLICATION_CREDENTIALS |
| NVIDIA | NVIDIA_API_KEY |
| AI21 Labs | AI21_API_KEY |
| Databricks | DATABRICKS_HOST and DATABRICKS_TOKEN |
| IBM watsonx.ai | WATSONX_API_KEY |
| xAI | XAI_API_KEY |
| Upstage | UPSTAGE_API_KEY |
| Hugging Face | HUGGINGFACE_API_KEY |
| Ollama (Local) | OLLAMA_BASE_URL |
| Llama.cpp (Local) | LLAMA_CPP_SERVER_URL |

Consult the official LangChain documentation for further integration details: https://python.langchain.com/docs/integrations/chat/
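Before starting the server, it can help to confirm that every variable a provider needs is actually set. The sketch below copies the provider-to-variable mapping from the table in this section. The names REQUIRED_ENV and missing_variables are hypothetical and not part of the package.

```python
import os

# Environment variables per provider, copied from the table in this section.
REQUIRED_ENV = {
    "OpenAI": ["OPENAI_API_KEY"],
    "Anthropic": ["ANTHROPIC_API_KEY"],
    "Google": ["GOOGLE_API_KEY"],
    "Cohere": ["COHERE_API_KEY"],
    "Mistral AI": ["MISTRAL_API_KEY"],
    "Groq": ["GROQ_API_KEY"],
    "Together AI": ["TOGETHER_API_KEY"],
    "AWS Bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "Fireworks": ["FIREWORKS_API_KEY"],
    "Azure OpenAI": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "Vertex AI (Google)": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "NVIDIA": ["NVIDIA_API_KEY"],
    "AI21 Labs": ["AI21_API_KEY"],
    "Databricks": ["DATABRICKS_HOST", "DATABRICKS_TOKEN"],
    "IBM watsonx.ai": ["WATSONX_API_KEY"],
    "xAI": ["XAI_API_KEY"],
    "Upstage": ["UPSTAGE_API_KEY"],
    "Hugging Face": ["HUGGINGFACE_API_KEY"],
    "Ollama (Local)": ["OLLAMA_BASE_URL"],
    "Llama.cpp (Local)": ["LLAMA_CPP_SERVER_URL"],
}


def missing_variables(provider, env=None):
    """Return the variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


# Example with an explicit mapping: only one of the two AWS variables is set.
print(missing_variables("AWS Bedrock", env={"AWS_ACCESS_KEY_ID": "example"}))
# ['AWS_SECRET_ACCESS_KEY']
```

Calling missing_variables("OpenAI") with no second argument checks the real process environment.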

Configuration of credentials can be centralized by creating a .env file in your project root:

    OPENAI_API_KEY=your_openai_key_here
    # Or include the key for any other supported backend

Diagnostic and Resolution Guide

  • Authentication Failures: Verify that the necessary secret key for the selected provider is correctly established within your operating environment variables or the .env file.
  • Provider Unavailability: Confirm that the specific package corresponding to your chosen model provider has been installed.
  • Browser Automation Failures: Execute playwright install chromium to ensure the required browser binaries are present.
  • Model Specification Errors: If the service rejects the model name, explicitly assign a recognized model identifier using the --model flag during server invocation.
  • Debugging Verbosity: Activate detailed logging output by including the --debug flag when launching the server process.
  • Client Setup Mismatch: Double-check that the command string and environment variable mapping in your client configuration precisely match the server's requirements.
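Several of the checks above can be automated before launching the server. The following is a rough preflight sketch using only the standard library. The function name preflight is hypothetical, and it verifies only that the API key variable is set, that the browser-use-mcp command is on PATH, and that the playwright package is importable. It does not confirm that the Chromium binaries were downloaded.

```python
import importlib.util
import os
import shutil


def preflight(command="browser-use-mcp", key_var="OPENAI_API_KEY", env=None):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get(key_var):
        problems.append(f"{key_var} is not set")
    if shutil.which(command) is None:
        problems.append(f"'{command}' was not found on PATH")
    if importlib.util.find_spec("playwright") is None:
        problems.append("the playwright package is not installed")
    return problems


# Report anything that needs attention in the current environment.
for problem in preflight():
    print("problem:", problem)
```

For a provider other than OpenAI, pass the matching variable name as key_var, for example preflight(key_var="ANTHROPIC_API_KEY").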

Licensing

MIT

Background: Headless Browsers (Wikipedia)

A headless browser operates as a web browser but lacks a graphical presentation layer. This mode allows web page content to be controlled programmatically through command-line interfaces or network protocols. It closely mimics the rendering capabilities of standard browsers, including CSS styling, JavaScript execution, and AJAX handling, which are often absent in simpler parsing tools. Modern browser engines (Chrome 59+, Firefox 56+) natively support this remote control functionality, superseding older solutions like PhantomJS.

Primary Applications

The core use cases for operating browsers in a headless configuration include:

  • Automated quality assurance workflows for contemporary web applications (Web Testing).
  • Generating static snapshots (screenshots) of rendered web pages.
  • Executing automated test suites targeting client-side JavaScript functionality.
  • Orchestrating complex interactions across web interfaces.

Secondary Applications

Headless environments are also valuable for advanced web data harvesting (scraping). Furthermore, they were identified as a method to help search engines index content reliant on Ajax rendering. Conversely, misuse cases include launching distributed denial-of-service (DDoS) attacks, artificially inflating advertisement visibility metrics, or performing unauthorized automated site manipulation (e.g., credential stuffing). However, empirical traffic analysis suggests that malicious actors do not disproportionately favor headless browsers over standard ones for common attacks.

Control Frameworks

Given that major browser vendors now offer native headless APIs, several unified software interfaces exist to manage this automation layer:

  • Selenium WebDriver: Adheres to the W3C WebDriver specification for cross-browser automation.
  • Playwright: A library supporting Chromium, Firefox, and WebKit, available for Node.js with official bindings for other languages, including Python.
  • Puppeteer: Primarily focused on automating Chrome or Firefox instances via Node.js.

Testing Integration

Several established testing frameworks incorporate headless browser capabilities:

  • Capybara: Leverages Headless Chrome or WebKit to simulate end-user actions during testing.
  • Jasmine: Defaults to Selenium but can be configured to utilize WebKit or Headless Chrome for environment execution.
  • Cypress: A dedicated framework for front-end testing that supports headless operation.
  • QF-Test: A tool for GUI-based automated testing that supports headless browser execution.

Alternative Approaches

An alternative strategy involves employing libraries that simulate browser APIs without launching a full rendering engine. For instance, Deno natively integrates certain browser APIs. In the Node.js ecosystem, jsdom provides the most comprehensive simulation of HTML parsing, cookie management, XHR requests, and partial JavaScript execution. While these alternatives are often faster, they typically lack full DOM rendering capabilities and exhibit limited support for complex DOM events compared to genuine headless instances.
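The gap between a parsing tool and a real browser engine can be seen with a small experiment. The sketch below uses Python's built-in html.parser as a stand-in for a simple parsing tool (it is not one of the libraries named above). The sample page is invented for illustration and contains one static link plus a script that would add a second link if it were executed.

```python
from html.parser import HTMLParser

# Invented sample page: one link in the markup, one created by JavaScript.
PAGE = """
<html><body>
  <a href="/static">Static link</a>
  <script>
    var a = document.createElement("a");
    a.href = "/dynamic";
    document.body.appendChild(a);
  </script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['/static']
```

The parser never runs the script, so the "/dynamic" link does not appear. A headless browser would execute the script and expose both links in the resulting DOM.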
