This project builds on browser-use, a framework designed to give AI agents access to interactive websites.

We extend our sincere gratitude to WarmShao for their critical contributions to this endeavor.

WebUI: Built on Gradio, the interface supports most browser-use functionality and is designed to make directing the browser agent straightforward.

Expanded LLM Support: Native integration for a range of large language models (LLMs), including Gemini, OpenAI, Azure OpenAI, Anthropic, DeepSeek, Ollama, and others. Support for more model backends is planned.

Custom Browser Support: You can use your own pre-configured browser, which avoids logging in to sites again or managing session tokens. This feature also supports high-definition screen recording.

Persistent Browser Sessions: An option keeps the browser open after an AI task finishes, so the full history and current state of the agent's interactions stay visible.

Deployment Methods

Method 1: On-Premise Setup

Consult the quick setup guide or follow the steps below to get started.

Python 3.11 or later is required.

We recommend using uv to create the Python virtual environment:

    uv venv --python 3.11

Activate the environment:

    source .venv/bin/activate

Install the dependencies:

    uv pip install -r requirements.txt

Then install the Playwright browsers:

    playwright install
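The setup above can be sanity-checked with a short preflight script. The sketch below is not part of the project; the function names `python_ok` and `module_available` are hypothetical helpers that only confirm the interpreter meets the 3.11 minimum and that the `playwright` package is importable in the active environment.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in this guide


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= tuple(minimum)


def module_available(name):
    """Return True when the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


print("Python >= 3.11:", python_ok())
print("playwright importable:", module_available("playwright"))
```

If the second line prints `False`, the dependencies were most likely installed outside the activated virtual environment.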

Method 2: Containerized Deployment (Docker)

Prerequisites:
- Docker Engine and Docker Compose installed.
- Git, for cloning the repository.

Preparation Steps:

    # Obtain the repository source code
    git clone https://github.com/browser-use/web-ui.git
    cd web-ui

    # Duplicate the environment variables file,
    # then open .env in your preferred editor to add the required API credentials
    cp .env.example .env

Execution via Docker:

    # Build the image and launch the container (browser closes after each task)
    docker compose up --build

    # Alternatively, launch in persistent browser mode (browser stays open between tasks)
    CHROME_PERSISTENT_SESSION=true docker compose up --build

Access Endpoints:
Web User Interface: http://localhost:7788
Remote Desktop Viewer (for observing browser operations): http://localhost:6080/vnc.html

The default VNC passcode is "vncpassword". This can be customized by setting the VNC_PASSWORD variable within your .env file.
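Once the container is running, a quick way to confirm that both endpoints are listening is a plain TCP probe. This is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and the ports are the ones listed above. A successful connection only shows that something is listening, not that the WebUI or noVNC is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Ports taken from the access endpoints listed above.
for label, port in (("WebUI", 7788), ("noVNC", 6080)):
    state = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{label} on port {port}: {state}")
```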

Operational Guide

Local Environment Configuration

Copy .env.example to .env and set the required variables, especially the LLM API keys:

    cp .env.example .env

Start the WebUI:

    python webui.py --ip 127.0.0.1 --port 7788

WebUI options:
- --ip: IP address the WebUI binds to. Default: 127.0.0.1.
- --port: Port the WebUI listens on. Default: 7788.
- --theme: Visual theme of the interface. Default: Ocean. Available themes:
  - Default: Standard presentation with balanced visual elements.
  - Soft: A subdued, gentle color palette for low-strain viewing.
  - Monochrome: A pure grayscale mode prioritizing task focus.
  - Glass: A contemporary, translucent aesthetic.
  - Origin: A throwback design evoking vintage computing styles.
  - Citrus: An energetic scheme featuring bright, refreshing hues.
  - Ocean (default): A calming, blue-centric design.
- --dark-mode: Enables dark mode for the interface.

Access the WebUI: open http://127.0.0.1:7788 in your web browser.
Using Your Own Browser (Optional):
- Set CHROME_PATH to the browser executable and CHROME_USER_DATA to the user profile directory. If CHROME_USER_DATA is omitted, the tool defaults to local profile data.
- Windows example:

      CHROME_PATH="C:\Program Files\Google\Chrome\Application\chrome.exe"
      CHROME_USER_DATA="C:\Users\YourUsername\AppData\Local\Google\Chrome\User Data"

  Note: Replace YourUsername with your actual Windows account name.
- Mac example:

      CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
      CHROME_USER_DATA="/Users/YourUsername/Library/Application Support/Google/Chrome"

- Close all Chrome windows.
- Open the WebUI in a browser other than Chrome (e.g., Firefox or Edge). This separation matters because the persistent session uses the specified Chrome data path.
- Enable the "Use Own Browser" checkbox in the Browser configuration panel.

Keeping the Browser Open (Optional):
- Set CHROME_PERSISTENT_SESSION=true in the .env file.
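To make the launch parameters described above concrete, here is a minimal `argparse` sketch that mirrors the documented flags and defaults. It illustrates how such options could be declared and is not a copy of the actual `webui.py`; `build_parser` and `THEMES` are names chosen for this example.

```python
import argparse

# Theme names as listed in the options above.
THEMES = ["Default", "Soft", "Monochrome", "Glass", "Origin", "Citrus", "Ocean"]


def build_parser():
    """Declare the four documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(description="WebUI launch options (sketch)")
    parser.add_argument("--ip", default="127.0.0.1", help="address to bind to")
    parser.add_argument("--port", type=int, default=7788, help="port to listen on")
    parser.add_argument("--theme", choices=THEMES, default="Ocean", help="UI theme")
    parser.add_argument("--dark-mode", action="store_true", help="enable dark mode")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the demo predictable.
args = build_parser().parse_args(["--theme", "Glass", "--dark-mode"])
print(args.ip, args.port, args.theme, args.dark_mode)
```

Restricting `--theme` with `choices` means a misspelled theme name is rejected at startup instead of silently falling back to a default.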

Docker Deployment Configuration

Configuration Variables:
All settings are managed via the .env file.
Available environment variables for customization:

    # AI Model Credentials
    OPENAI_API_KEY=your_key_here
    ANTHROPIC_API_KEY=your_key_here
    GOOGLE_API_KEY=your_key_here

    # Browser Parameters
    CHROME_PERSISTENT_SESSION=true   # true keeps the browser open between tasks
    RESOLUTION=1920x1080x24          # Custom screen size, format: WxHxD (depth is usually 24)
    RESOLUTION_WIDTH=1920            # Custom width in pixels
    RESOLUTION_HEIGHT=1080           # Custom height in pixels

    # VNC Connection Settings
    VNC_PASSWORD=your_vnc_password   # Optional; defaults to "vncpassword"

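The `RESOLUTION` value packs width, height, and color depth into one `WxHxD` string, and `CHROME_PERSISTENT_SESSION` is a textual boolean. The sketch below shows one way a script could interpret both. The helpers `parse_resolution` and `env_flag` are hypothetical, and the set of strings treated as true is an assumption of this example, not documented project behavior.

```python
import os


def parse_resolution(value):
    """Split a 'WxHxD' string into (width, height, depth) integers."""
    parts = value.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected WxHxD, got {value!r}")
    width, height, depth = (int(part) for part in parts)
    return width, height, depth


def env_flag(name, default=False):
    """Read an environment variable such as 'true' or 'false' as a bool."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Assumption: these spellings count as true; everything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


print(parse_resolution("1920x1080x24"))  # (1920, 1080, 24)
print(env_flag("CHROME_PERSISTENT_SESSION"))
```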
Browser Session States:
Standard Execution (CHROME_PERSISTENT_SESSION=false):
- Browser instance initiates and terminates with every AI processing job.
- Ensures a clean operational slate for each request.
- Lower overall system resource consumption.
Continuous Execution (CHROME_PERSISTENT_SESSION=true):
- The browser remains active throughout the application lifecycle.
- Preserves operational history and session state across multiple commands.
- Allows for post-task inspection of agent activities.
- Configured in .env or as an environment flag during container startup.
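The difference between the two modes can be modeled in a few lines. The following is a toy sketch with a stand-in `FakeBrowser` class, not the project's session handling; `SessionManager` and its attributes are hypothetical names used only to show when a browser would be relaunched and when its history would carry over.

```python
class FakeBrowser:
    """Stand-in for a real browser handle; only tracks state and task history."""

    def __init__(self):
        self.is_open = True
        self.history = []

    def close(self):
        self.is_open = False


class SessionManager:
    def __init__(self, persistent):
        self.persistent = persistent
        self.browser = None
        self.launch_count = 0

    def run_task(self, task):
        # Reuse a live browser if there is one; otherwise launch a fresh one.
        if self.browser is None or not self.browser.is_open:
            self.browser = FakeBrowser()
            self.launch_count += 1
        self.browser.history.append(task)
        history = list(self.browser.history)
        if not self.persistent:
            self.browser.close()  # default mode: clean slate for the next task
        return history


ephemeral = SessionManager(persistent=False)
ephemeral.run_task("open example.com")
print(ephemeral.run_task("search docs"))   # ['search docs']

persistent = SessionManager(persistent=True)
persistent.run_task("open example.com")
print(persistent.run_task("search docs"))  # ['open example.com', 'search docs']
```

In the ephemeral case the second task sees an empty history because the first browser was closed; in the persistent case both tasks share one browser.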
Monitoring Browser Activity:
Connect to the noVNC remote interface at http://localhost:6080/vnc.html.
Input the configured VNC passcode (default: "vncpassword" or the value from VNC_PASSWORD).
You can now watch all automated browser actions in real time.
Container Lifecycle Management:

    # Start with persistent session active
    CHROME_PERSISTENT_SESSION=true docker compose up -d

    # Start with default (ephemeral) mode
    docker compose up -d

    # Stream real-time operational logs
    docker compose logs -f

    # Terminate and remove resources
    docker compose down

Revision History

[x] 2025/01/26: Integration update thanks to @vvincent1234. Enables the browser-use-webui stack to collaborate effectively with DeepSeek-r1 for complex reasoning tasks!
[x] 2025/01/10: Feature addition credited to @casistack. Introduces Docker deployment options and the ability to keep the browser session open between tasks. Video demonstration.
[x] 2025/01/06: Major UI overhaul completed by @richard-devbot. Launch of a newly designed, streamlined WebUI experience. Video demonstration.

WIKIPEDIA: A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using network communication. They are particularly useful for testing web pages as they are able to render and understand HTML the same way a browser would, including styling elements such as page layout, color, font selection and execution of JavaScript and Ajax which are usually not available when using other testing methods. Since version 59 of Google Chrome and version 56 of Firefox, there is native support for remote control of the browser. This made earlier efforts obsolete, notably PhantomJS.

== Use cases ==

The primary applications for headless browsers include:

- Web application regression and functional validation (web testing).
- Automated generation of static page captures (screenshots).
- Execution of unit tests targeting JavaScript libraries.
- Programmatic orchestration of web page interactions.
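As an illustration of the screenshot use case, Chrome can render a page to a PNG from the command line in headless mode. The sketch below only builds and prints such a command. The `--headless`, `--window-size`, and `--screenshot` switches are Chrome command-line flags, while the binary name `google-chrome`, the output file name, and the helper `headless_screenshot_cmd` are assumptions for this example.

```python
import shlex


def headless_screenshot_cmd(chrome, url, output="page.png", width=1280, height=720):
    """Build a Chrome command line that renders `url` headlessly to a PNG file."""
    return [
        chrome,
        "--headless",
        f"--window-size={width},{height}",
        f"--screenshot={output}",
        url,
    ]


# "google-chrome" is an assumed binary name; it differs across platforms.
cmd = headless_screenshot_cmd("google-chrome", "https://example.com")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it on a machine where that binary exists.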

=== Secondary Applications ===

Headless browsers are also frequently used for web scraping. Google stated in 2009 that using a headless browser could help its search engine index content generated dynamically via Ajax. Headless browsers have also been misused for activities such as:

- Initiating denial-of-service attacks against websites.
- Artificially inflating advertising impression counts.
- Automating site usage in ways that violate terms of service, such as credential stuffing.

However, a traffic analysis conducted in 2018 found no sign that malicious actors prefer headless browsers over conventional ones. There is no evidence that headless browsers are disproportionately responsible for attacks such as DDoS, SQL injection, or cross-site scripting.

== Implementation ==

As several major browser engines now natively support headless operation via established APIs, various software libraries exist to manage browser automation through standardized interfaces. These include:

- Selenium WebDriver – Adheres to the W3C WebDriver protocol specification.
- Playwright – A library for Node.js designed to automate Chromium, Firefox, and WebKit engines.
- Puppeteer – A Node.js interface specifically for controlling Chrome or Firefox instances.

=== Validation Automation ===

Several testing frameworks include headless browsers as part of their tooling:

- Capybara uses headless browsing, through either WebKit or Headless Chrome, to simulate user activity in its tests.
- Jasmine uses Selenium by default but can be configured to run browser tests with WebKit or Headless Chrome.
- Cypress is a framework dedicated to front-end testing.
- QF-Test is a commercial testing suite for GUI-level automation that can also drive headless browsers.

=== Alternative Methodologies ===

An alternative strategy is to use libraries that emulate browser functionality through specific APIs. For instance, Deno integrates browser APIs directly into its architecture. For Node.js environments, jsdom offers the most comprehensive simulation. While most of these alternatives handle fundamental browser features (HTML parsing, cookie management, XHR requests, limited JavaScript), they generally do not render the DOM and have limited event model support. Consequently, they typically execute faster than full browser environments.
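The trade-off described above can be seen with an even lighter tool than jsdom or Deno. The sketch below swaps in Python's standard-library `html.parser`, which is not one of the libraries named in the text, to pull links out of markup. It parses HTML only: nothing is rendered, no JavaScript runs, and no DOM events fire, which is why this approach is fast and also why it cannot see script-generated content. `LinkExtractor` and `extract_links` are names chosen for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags without rendering or running scripts."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every non-empty href found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p><a href="/docs">Docs</a> <a href="https://example.com">Site</a></p>'
print(extract_links(sample))  # ['/docs', 'https://example.com']
```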