BrowserAgent-Interface
A graphical front end for working with AI browser agents. It supports a range of Large Language Models and can keep the browser context alive between tasks. It also supports high-definition screen recording and lets you use your own customized browser, so you do not have to sign in repeatedly.
Author: Muzzera
This utility builds upon the core framework of browser-use, which is engineered to grant AI agents access to interactive web environments.
We extend our sincere gratitude to WarmShao for their critical contributions to this endeavor.
The WebUI: Implemented using Gradio, this interface incorporates the vast majority of the operational features provided by browser-use. It is crafted for ease of use, simplifying the process of directing the browser agent.
Broadened LLM Compatibility: We have introduced native integration for numerous large-scale language models, including: Gemini, OpenAI offerings, Azure OpenAI endpoints, Anthropic models, DeepSeek, Ollama, and others. Future iterations are slated to incorporate even more model backends.
Personalized Browser Configuration: You can use your own pre-configured browser with the tool, which avoids repeated logins and manual handling of session tokens. This feature also supports high-definition screen recording.
Sustained Browser State: You can choose to keep the browser open after an AI task completes, so the full history and current state of the agent's interactions stay visible.
Deployment Methods
Method 1: On-Premise Setup
Consult the concise setup manual, or follow the steps below to get started.
Requires Python version 3.11 or later.
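To confirm that the interpreter on your path meets this requirement before creating the environment, a small check like the following can help. This is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of the project.

```python
import sys

# Minimum interpreter version stated in the setup notes above.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("Python 3.11 or later is required; found", found)
```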
First, we recommend using uv to set up the Python virtual environment:

    uv venv --python 3.11

Activate the environment:

    source .venv/bin/activate

Install the package dependencies:

    uv pip install -r requirements.txt

Then install the Playwright browsers:

    playwright install
Method 2: Containerized Deployment (Docker)
- Prerequisites:
  - Docker Engine and Docker Compose installed.
  - Git, for cloning the repository.
- Preparation steps:

      # Clone the repository
      git clone https://github.com/browser-use/web-ui.git
      cd web-ui

      # Copy the example environment file, then edit .env to add your API keys
      cp .env.example .env

- Run with Docker:

      # Build the image and start the container (browser closes after each task)
      docker compose up --build

      # Or start in persistent browser mode (browser stays open between tasks)
      CHROME_PERSISTENT_SESSION=true docker compose up --build

- Access endpoints:
  - Web user interface: http://localhost:7788
  - Remote desktop viewer (for watching browser operations): http://localhost:6080/vnc.html

The default VNC passcode is "vncpassword". This can be customized by setting the VNC_PASSWORD variable within your .env file.
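
Once the container is running, it can be useful to confirm that both endpoints are actually listening before opening them in a browser. The sketch below only tests whether a TCP connection succeeds on the two ports listed above; the function name `port_is_open` is hypothetical, and the check says nothing about whether the services behind the ports are healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the access endpoints listed above.
    for name, port in (("WebUI", 7788), ("noVNC viewer", 6080)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```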
Operational Guide
Local Environment Configuration
- Copy .env.example to .env and fill in the required variables, especially the LLM API keys (cp .env.example .env).
- Start the user interface:

      python webui.py --ip 127.0.0.1 --port 7788

- WebUI initialization parameters:
  - --ip: The network address the WebUI binds to. Default: 127.0.0.1.
  - --port: The network port the WebUI listens on. Default: 7788.
  - --theme: The visual theme of the interface. Default: Ocean.
    - Default: The standard theme with balanced visual elements.
    - Soft: A subdued, gentle color palette for low-strain viewing.
    - Monochrome: A pure grayscale mode prioritizing task focus.
    - Glass: A contemporary, translucent aesthetic.
    - Origin: A throwback design evoking vintage computing styles.
    - Citrus: An energetic scheme featuring bright, refreshing hues.
    - Ocean (default): A calming, blue-centric design.
  - --dark-mode: Enables dark mode for the interface.
- Access point: Open http://127.0.0.1:7788 in your web browser.
- Using your own browser (optional):
  - Set CHROME_PATH to the browser executable and CHROME_USER_DATA to the user profile directory. If CHROME_USER_DATA is omitted, the tool defaults to local profile data.
  - Windows example:

        CHROME_PATH="C:\Program Files\Google\Chrome\Application\chrome.exe"
        CHROME_USER_DATA="C:\Users\YourUsername\AppData\Local\Google\Chrome\User Data"

    Note: Replace YourUsername with your actual Windows account name.
  - Mac example:

        CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
        CHROME_USER_DATA="/Users/YourUsername/Library/Application Support/Google/Chrome"

  - Close all running instances of Chrome.
  - Open the WebUI in a browser other than Chrome (e.g., Firefox or Edge). This separation matters because the persistent session uses the Chrome data path you specified.
  - Enable the "Use Own Browser" checkbox in the Browser configuration panel.
- Keeping the browser open (optional):
  - Set the environment variable CHROME_PERSISTENT_SESSION=true in the .env configuration file.
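
The initialization parameters listed above map naturally onto a command-line parser. The following is an illustrative sketch of how such flags could be declared with `argparse`, using the defaults and theme names from this guide. It is not the project's actual implementation, and `build_parser` is a hypothetical name.

```python
import argparse

# Theme names as listed in this guide.
THEMES = ["Default", "Soft", "Monochrome", "Glass", "Origin", "Citrus", "Ocean"]


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Launch the WebUI (sketch)")
    parser.add_argument("--ip", default="127.0.0.1", help="address to bind to")
    parser.add_argument("--port", type=int, default=7788, help="port to listen on")
    parser.add_argument("--theme", default="Ocean", choices=THEMES, help="UI theme")
    parser.add_argument("--dark-mode", action="store_true", help="enable dark mode")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would serve on http://{args.ip}:{args.port} "
          f"(theme={args.theme}, dark_mode={args.dark_mode})")
```

Restricting `--theme` with `choices` makes a mistyped theme name fail at startup with a clear message instead of being silently ignored.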
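
Most problems with the custom browser setup come from a mistyped CHROME_PATH or CHROME_USER_DATA value. As a rough pre-flight check, the sketch below reports settings that do not point at an existing file or directory. The function name `check_chrome_settings` is hypothetical, and treating CHROME_USER_DATA as optional follows the description above rather than the project's code.

```python
import os
from pathlib import Path


def check_chrome_settings(env=None):
    """Return a list of human-readable problems with the Chrome settings.

    An empty list means nothing obviously wrong was found.
    """
    env = os.environ if env is None else env
    problems = []

    # Strip surrounding quotes in case the value was copied verbatim from .env.
    chrome_path = env.get("CHROME_PATH", "").strip().strip('"')
    if not chrome_path:
        problems.append("CHROME_PATH is not set")
    elif not Path(chrome_path).is_file():
        problems.append(f"CHROME_PATH does not point to a file: {chrome_path}")

    # CHROME_USER_DATA is optional; only validate it when provided.
    user_data = env.get("CHROME_USER_DATA", "").strip().strip('"')
    if user_data and not Path(user_data).is_dir():
        problems.append(f"CHROME_USER_DATA is not a directory: {user_data}")

    return problems


if __name__ == "__main__":
    for problem in check_chrome_settings() or ["No problems found"]:
        print(problem)
```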
Docker Deployment Configuration
- Configuration variables:
  - All settings are managed via the .env file.
  - Available environment variables for customization:

        # AI model credentials
        OPENAI_API_KEY=your_key_here
        ANTHROPIC_API_KEY=your_key_here
        GOOGLE_API_KEY=your_key_here

        # Browser parameters
        CHROME_PERSISTENT_SESSION=true   # true keeps the browser open between tasks
        RESOLUTION=1920x1080x24          # Custom screen size, format WxHxD (depth is usually 24)
        RESOLUTION_WIDTH=1920            # Custom width in pixels
        RESOLUTION_HEIGHT=1080           # Custom height in pixels

        # VNC connection settings
        VNC_PASSWORD=your_vnc_password   # Optional; defaults to "vncpassword"

- Browser session states:
  - Standard execution (CHROME_PERSISTENT_SESSION=false):
    - The browser starts and closes with every AI task.
    - Each task begins from a clean state.
    - Lower overall system resource consumption.
  - Continuous execution (CHROME_PERSISTENT_SESSION=true):
    - The browser remains active throughout the application lifecycle.
    - Preserves history and session state across multiple tasks.
    - Allows post-task inspection of agent activity.
    - Set in .env or as an environment variable when starting the container.
- Monitoring browser activity:
  - Open the noVNC interface at http://localhost:6080/vnc.html.
  - Enter the VNC password (default: "vncpassword", or the value of VNC_PASSWORD).
  - You can now watch the automated browser actions in real time.
- Container lifecycle management:

      # Start with a persistent browser session
      CHROME_PERSISTENT_SESSION=true docker compose up -d

      # Start in the default (ephemeral) mode
      docker compose up -d

      # Stream the logs in real time
      docker compose logs -f

      # Stop and remove the containers
      docker compose down

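
The browser settings above are plain strings in the environment, so any script that reads them has to interpret them. The sketch below shows one way to parse CHROME_PERSISTENT_SESSION as a boolean and RESOLUTION in the WxHxD format described earlier. Both helper names are hypothetical, and the set of accepted truthy spellings is an assumption, since this guide only shows the values true and false.

```python
import os


def parse_bool(value, default=False):
    """Interpret common truthy strings; fall back to default when unset."""
    if value is None or value.strip() == "":
        return default
    # Assumed spellings; the guide itself only uses "true" and "false".
    return value.strip().lower() in {"1", "true", "yes", "on"}


def parse_resolution(value):
    """Parse 'WxHxD' (for example '1920x1080x24') into (width, height, depth)."""
    parts = value.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected WxHxD, got {value!r}")
    width, height, depth = (int(part) for part in parts)
    if min(width, height, depth) <= 0:
        raise ValueError(f"all components must be positive, got {value!r}")
    return width, height, depth


if __name__ == "__main__":
    persistent = parse_bool(os.environ.get("CHROME_PERSISTENT_SESSION"))
    resolution = parse_resolution(os.environ.get("RESOLUTION", "1920x1080x24"))
    print("persistent session:", persistent)
    print("resolution (W, H, D):", resolution)
```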
Revision History
- [x] 2025/01/26: Integration update thanks to @vvincent1234. Enables the browser-use-webui stack to collaborate effectively with DeepSeek-r1 for complex reasoning tasks!
- [x] 2025/01/10: Feature addition credited to @casistack. Introduces robust Docker deployment options and the functionality to retain the browser session between operational cycles. Video demonstration.
- [x] 2025/01/06: Major UI overhaul completed by @richard-devbot. Launch of a newly designed, streamlined WebUI experience. Video demonstration.
WIKIPEDIA: A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using network communication. They are particularly useful for testing web pages as they are able to render and understand HTML the same way a browser would, including styling elements such as page layout, color, font selection and execution of JavaScript and Ajax which are usually not available when using other testing methods. Since version 59 of Google Chrome and version 56 of Firefox, there is native support for remote control of the browser. This made earlier efforts obsolete, notably PhantomJS.
== Use cases ==
The primary applications for employing headless browsing environments include:

- Web application regression and functional validation (web testing).
- Automated generation of static page captures (screenshots).
- Execution of unit tests targeting JavaScript libraries.
- Programmatic orchestration of web page interactions.

=== Secondary Applications ===
Headless agents are also frequently utilized for web data harvesting. Google publicly noted in 2009 that using a headless agent could enhance the indexing capabilities for content generated dynamically via Ajax. Conversely, headless agents have been leveraged for undesirable activities such as:

- Initiating denial-of-service attacks against web targets.
- Artificially inflating advertising impression counts.
- Automating site usage in ways that violate terms of service, such as credential stuffing.

However, a comprehensive traffic analysis conducted in 2018 indicated no discernible pattern where malicious actors favored headless over traditional browser interfaces. Evidence does not suggest headless browsers are disproportionately responsible for security exploits like DDoS actions, SQL injections, or cross-site scripting vulnerabilities.

== Implementation ==
As several major browser engines now natively support headless operation via established APIs, various software libraries exist to manage browser automation through standardized interfaces. These encompass:

- Selenium WebDriver – Adheres to the W3C WebDriver protocol specification.
- Playwright – A library for Node.js designed to automate Chromium, Firefox, and WebKit engines.
- Puppeteer – A Node.js interface specifically for controlling Chrome or Firefox instances.

=== Validation Automation ===
Numerous testing frameworks incorporate headless browser capabilities into their operational apparatus.

- Capybara leverages headless browsing, utilizing either WebKit or Headless Chrome engines to simulate user activity within its testing protocols.
- Jasmine defaults to Selenium but permits configuration for WebKit or Headless Chrome for browser-based test execution.
- Cypress, a framework dedicated to front-end testing.
- QF-Test, a commercial testing suite that supports GUI-level automation, including the use of headless browsing agents.

=== Alternative Methodologies ===
An alternative strategy involves employing libraries that emulate browser functionality through specific APIs. For instance, Deno integrates browser APIs directly into its architecture. For Node.js environments, jsdom offers the most comprehensive simulation. While most of these alternatives successfully handle fundamental browser features (HTML parsing, cookie management, XHR requests, limited JavaScript), they generally lack true DOM rendering capabilities and have constrained event model support. Consequently, they typically execute faster than full browser environments.
