mcp-web-navigator
Server and command-line utility for AI-orchestrated web interaction, facilitating programmatic control over digital browser environments for testing, data acquisition, and digital research workflows.
Author: Saik0s
WebNavigator MCP Endpoint & CLI
Foundation Note: This execution component builds upon the architecture established by the [browser-use/web-ui] project, adapting its core automation protocols and configuration schemes.
An implementation of the Model Context Protocol (MCP) centered around autonomous browser manipulation, driven by natural language prompts. It exposes server capabilities and direct command-line tooling.
Core Capabilities
- 🧠 Protocol Adherence - Complete integration with the Model Context Protocol for Agent communication.
- 🌐 Browser Execution - Orchestrates navigation, user input submission, and interactive element manipulation through plain language directives (via the `execute_browser_task` tool).
- 👁️ Visual Context - Enables optional analysis of rendered page state (screenshots) for multimodal Language Models.
- 🔄 Session Management - Supports persistent browser contexts spanning multiple requests or direct attachment to user-controlled browser sessions.
- 🔌 Provider Agnostic - Seamless interoperability with a wide spectrum of Large Language Model APIs, including OpenAI, Anthropic, Google, Ollama, and others.
- 🔍 In-Depth Synthesis - A specialized function (`initiate_web_synthesis`) for multi-stage research synthesis and structured output generation.
- ⚙️ Configuration Flexibility - Fully parameterizable through environment variables, managed via an underlying Pydantic schema.
- 🔗 CDP Interface - Capability to bind to an externally launched Chromium instance utilizing the Chrome DevTools Protocol endpoint.
- ⌨️ CLI Accessibility - Direct access to primary automation functions (`execute_browser_task`, `initiate_web_synthesis`) from the terminal for scripting and validation.
Initial Setup
Prerequisites
- Install UV (Python environment manager):

      curl -LsSf https://astral.sh/uv/install.sh | sh

- Acquire necessary browser binaries via Playwright:

      uvx --from mcp-web-navigator@latest python -m playwright install
Integration Schema
For MCP client applications (like desktop assistants), establish the server connection via a configuration snippet, such as:
Configuration Snippet A: Minimal Deployment

    "mcpServers": {
      "web-navigator": {
        "command": "uvx",
        "args": ["mcp-web-navigator@latest"],
        "env": {
          "NAV_LLM_GOOGLE_API_KEY": "YOUR_KEY_HERE_IF_USING_GOOGLE",
          "NAV_LLM_PROVIDER": "google",
          "NAV_LLM_MODEL": "gemini-2.5-flash-preview-04-17",
          "NAV_BROWSER_IS_HEADLESS": "true"
        }
      }
    }
(Further configuration examples, covering CDP integration and extensive environment variable overrides, are available in the original project documentation and the source repository's README.)
Configuration Insight: Begin with the simplest setup (Snippet A). The comprehensive list of all adjustable parameters resides in the companion .env.example manifest.
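A client entry like Snippet A can also be generated programmatically, so that the result is always valid JSON. The following is a minimal sketch using only the standard library. The helper name `build_server_entry` is hypothetical, and the `NAV_LLM_<PROVIDER>_API_KEY` naming pattern is an assumption extrapolated from the Google and OpenAI key names in this document; check `.env.example` for other providers.

```python
import json


def build_server_entry(provider: str, model: str, api_key: str, headless: bool = True) -> dict:
    """Return an "mcpServers" fragment mirroring Snippet A (hypothetical helper)."""
    env = {
        # Assumption: API key variables follow NAV_LLM_<PROVIDER>_API_KEY.
        f"NAV_LLM_{provider.upper()}_API_KEY": api_key,
        "NAV_LLM_PROVIDER": provider,
        "NAV_LLM_MODEL": model,
        # Environment values are strings, so the boolean becomes "true"/"false".
        "NAV_BROWSER_IS_HEADLESS": "true" if headless else "false",
    }
    return {
        "mcpServers": {
            "web-navigator": {
                "command": "uvx",
                "args": ["mcp-web-navigator@latest"],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    entry = build_server_entry("google", "gemini-2.5-flash-preview-04-17", "YOUR_KEY_HERE")
    print(json.dumps(entry, indent=2))
```

The printed object can be merged into the client's configuration file; where that file lives depends on the MCP client in use.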
Exposed MCP Functionality
This service publishes the following functions via the Model Context Protocol:
Synchronous Operations (Blocking Call)
- `execute_browser_task`
  - Definition: Carries out a web interaction sequence dictated by natural language input, awaiting final status. Configuration draws from the `NAV_AGENT_TOOL_*`, `NAV_LLM_*`, and `NAV_BROWSER_*` prefixes.
  - Inputs:
    - `instruction_set` (text, mandatory): The objective or command sequence.
  - Output: (text) The conclusive result obtained by the agent, or an error report. History data (JSON, optional short video) is archived if `NAV_AGENT_TOOL_HISTORY_DIR` is specified.
- `initiate_web_synthesis`
  - Definition: Executes comprehensive, multi-step web exploration on a nominated subject, resulting in a synthesized report. Configuration uses `NAV_RESEARCH_TOOL_*`, `NAV_LLM_*`, and `NAV_BROWSER_*` settings. Outputs are directed to a task-specific subdirectory under `NAV_RESEARCH_TOOL_OUTPUT_ROOT` if defined; otherwise, processing remains ephemeral (in-memory).
  - Inputs:
    - `research_topic` (text, mandatory): The subject matter for deep analysis.
    - `max_concurrent_sessions` (number, optional): Overrides the default defined in environment variables.
  - Output: (text) The final research document rendered in Markdown format, including any persistent file reference, or an execution failure notice.
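On the wire, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request after the usual initialization handshake. The sketch below only builds the request payloads with the standard library, to show how the documented inputs map onto the `arguments` object. The helper names are hypothetical, and in practice an MCP client library handles this serialization.

```python
import json
from typing import Optional


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP "tools/call" request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


def browser_task_request(request_id: int, instruction_set: str) -> str:
    """Request for execute_browser_task with its single mandatory input."""
    return tool_call_request(
        request_id, "execute_browser_task", {"instruction_set": instruction_set}
    )


def synthesis_request(
    request_id: int, research_topic: str, max_concurrent_sessions: Optional[int] = None
) -> str:
    """Request for initiate_web_synthesis; the optional input is sent only when given."""
    arguments: dict = {"research_topic": research_topic}
    if max_concurrent_sessions is not None:
        arguments["max_concurrent_sessions"] = max_concurrent_sessions
    return tool_call_request(request_id, "initiate_web_synthesis", arguments)


if __name__ == "__main__":
    print(browser_task_request(1, "Open example.com and report the page title."))
    print(synthesis_request(2, "Headless browser automation", max_concurrent_sessions=3))
```

Omitting `max_concurrent_sessions` leaves the server free to use the default taken from its environment variables, as described above.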
Terminal Interface (web-navigator-cli)
This package also installs a utility, `web-navigator-cli`, enabling direct invocation of core logic outside the MCP server context.
Global Modifiers:
* --config-file PATH, -c PATH: Specify a file containing environment overrides.
* --verbosity LEVEL, -v LEVEL: Adjust runtime reporting level (e.g., TRACE, INFO).
Commands:
- `web-navigator-cli perform-action [PARAMETERS] INSTRUCTION`
  - Purpose: Executes a single browser interaction procedure.
  - Argument:
    - `INSTRUCTION` (text, required): The action the agent must take.
  - Example: `web-navigator-cli perform-action "Load the documentation page and extract all H2 headings." -c .env.config`
- `web-navigator-cli synthesize-data [PARAMETERS] TOPIC`
  - Purpose: Initiates background web synthesis.
  - Argument:
    - `TOPIC` (text, required): The subject to research.
  - Options:
    - `--concurrency INTEGER`, `-x INTEGER`: Sets the parallelism limit for this run.
  - Example: `web-navigator-cli synthesize-data "Recent developments in quantum computing architectures." --concurrency 5 -c .env.config`
All auxiliary settings (API credentials, path mappings, browser rendering properties) are sourced from environment variables or the specified configuration file, as detailed in the Configuration Reference section below.
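For scripted use, the invocations above can be assembled as argument lists and passed to `subprocess`, which avoids shell quoting problems in instructions that contain quotes or spaces. This is a minimal sketch: the function names are hypothetical, the option order mirrors the examples in this section, and it assumes the CLI writes its result to standard output and exits non-zero on failure, which this document does not state.

```python
import shlex
import subprocess
from typing import List, Optional

CLI = "web-navigator-cli"


def perform_action_argv(instruction: str, config_file: Optional[str] = None) -> List[str]:
    """Build the argument list for the perform-action command."""
    argv = [CLI, "perform-action", instruction]
    if config_file is not None:
        argv += ["-c", config_file]
    return argv


def synthesize_data_argv(
    topic: str, concurrency: Optional[int] = None, config_file: Optional[str] = None
) -> List[str]:
    """Build the argument list for the synthesize-data command."""
    argv = [CLI, "synthesize-data", topic]
    if concurrency is not None:
        argv += ["--concurrency", str(concurrency)]
    if config_file is not None:
        argv += ["-c", config_file]
    return argv


def run_cli(argv: List[str]) -> str:
    """Run the CLI and return stdout (assumed to carry the result).

    Raises subprocess.CalledProcessError if the process exits non-zero.
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # Print the equivalent shell command instead of launching a browser.
    argv = perform_action_argv(
        "Load the documentation page and extract all H2 headings.", ".env.config"
    )
    print(shlex.join(argv))
```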
Configuration Reference (Environment Variables)
Configuration employs prefixed environment variables for logical grouping.
| Variable Group (Prefix) | Example Variable | Functionality Summary | Default Setting |
|---|---|---|---|
| Core LLM (`NAV_LLM_`) | | Parameters governing the principal reasoning engine. | |
| | `NAV_LLM_PROVIDER` | The remote LLM service provider. | `openai` |
| | `NAV_LLM_MODEL` | The designated model identifier for inference. | `gpt-4.1` |
| | `NAV_LLM_TEMPERATURE` | Stochasticity control (0.0 to 2.0). | `0.0` |
| Browser Control (`NAV_BROWSER_`) | | Settings affecting the underlying browser engine (Playwright/CDP). | |
| | `NAV_BROWSER_IS_HEADLESS` | Execute browser operations invisibly. | `false` |
| | `NAV_BROWSER_CDP_TARGET` | URL for Chrome DevTools Protocol attachment point (for external browser linkage). | - |
| | `NAV_BROWSER_VIEWPORT_W` | Initial rendering width in pixels. | `1280` |
| | `NAV_BROWSER_PERSIST_SESSION` | Maintain the browser instance state between separate server requests. | `false` |
| Agent Task (`NAV_AGENT_TOOL_`) | | Fine-tuning parameters for the immediate task execution tool. | |
| | `NAV_AGENT_TOOL_MAX_CYCLES` | Upper bound on decision/action loops per run. | `100` |
| | `NAV_AGENT_TOOL_ENABLE_VISUAL_INPUT` | Activates analysis of screen captures by the LLM. | `true` |
| | `NAV_AGENT_TOOL_HISTORY_ARCHIVE_DIR` | Location to persistently store execution logs and trace data. | (Disabled) |
| Synthesis (`NAV_RESEARCH_TOOL_`) | | Parameters for the deep research function. | |
| | `NAV_RESEARCH_TOOL_OUTPUT_ROOT` | Root directory for saving final reports. If null, results are returned solely in memory. | `None` |
| System Paths (`NAV_PATHS_`) | | Global file system locations managed by the service. | |
| | `NAV_PATHS_ARTIFACT_CACHE` | Designated directory for temporary file storage or downloads. | (Disabled) |
| Service Operation (`NAV_SERVER_`) | | Settings related to server process management and reporting. | |
| | `NAV_SERVER_VERBOSITY_LEVEL` | Detail level for internal logging output. | `ERROR` |
Enumerated LLM Providers:
openai, azure, anthropic, google, mistral, ollama, deepseek, openrouter, alibaba, moonshot, unbound
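To illustrate how prefixed variables and their defaults resolve, here is a standard-library sketch that reads a few of the settings listed above and validates the provider and temperature range. It is not the project's actual schema (which is Pydantic-based, per the capability list); the `NavSettings` class, the `load_settings` function, and the accepted boolean spellings are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PROVIDERS = {
    "openai", "azure", "anthropic", "google", "mistral", "ollama",
    "deepseek", "openrouter", "alibaba", "moonshot", "unbound",
}


def _as_bool(raw: str) -> bool:
    # Assumption: these spellings count as true; the real parser may differ.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class NavSettings:
    """Subset of the documented settings, with the documented defaults."""
    llm_provider: str = "openai"
    llm_model: str = "gpt-4.1"
    llm_temperature: float = 0.0
    browser_is_headless: bool = False
    browser_cdp_target: Optional[str] = None
    agent_max_cycles: int = 100


def load_settings(env: Mapping[str, str] = os.environ) -> NavSettings:
    """Resolve settings from an environment-like mapping, applying defaults."""
    provider = env.get("NAV_LLM_PROVIDER", "openai")
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported NAV_LLM_PROVIDER: {provider!r}")
    temperature = float(env.get("NAV_LLM_TEMPERATURE", "0.0"))
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("NAV_LLM_TEMPERATURE must be between 0.0 and 2.0")
    return NavSettings(
        llm_provider=provider,
        llm_model=env.get("NAV_LLM_MODEL", "gpt-4.1"),
        llm_temperature=temperature,
        browser_is_headless=_as_bool(env.get("NAV_BROWSER_IS_HEADLESS", "false")),
        browser_cdp_target=env.get("NAV_BROWSER_CDP_TARGET") or None,
        agent_max_cycles=int(env.get("NAV_AGENT_TOOL_MAX_CYCLES", "100")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the resolution easy to check in isolation before launching the server.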
External Browser Binding (CDP Mode)
To detach from server-managed browser instances and instead interface with a user-initiated session:
- Start Chromium: Execute Chrome/Chromium with the remote debugging flag: `<executable> --remote-debugging-port=9222`
- Configure Environment: Set variables to point to this running instance:

      NAV_BROWSER_USE_OWN_SESSION=true
      NAV_BROWSER_CDP_TARGET=http://localhost:9222

- Initiate Service: Launch the server or CLI as usual.
Caveat: When NAV_BROWSER_USE_OWN_SESSION=true, parameters controlling server-managed browser behavior (like headless mode or keep-alive) are disregarded.
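Before launching the service in CDP mode, it can help to confirm that the browser is actually listening. Chromium started with `--remote-debugging-port` serves a small HTTP API, and its `/json/version` endpoint returns a JSON object that includes `webSocketDebuggerUrl`. The sketch below probes that endpoint with the standard library; the function names are hypothetical and not part of this project.

```python
import http.client
import json
import urllib.error
import urllib.request


def cdp_version_url(target: str) -> str:
    """Return the /json/version URL for a CDP target such as http://localhost:9222."""
    return target.rstrip("/") + "/json/version"


def is_cdp_reachable(target: str, timeout: float = 2.0) -> bool:
    """Return True if the target answers like a Chromium remote-debugging endpoint."""
    try:
        with urllib.request.urlopen(cdp_version_url(target), timeout=timeout) as response:
            info = json.load(response)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or a non-JSON reply.
        return False
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info


if __name__ == "__main__":
    target = "http://localhost:9222"
    state = "reachable" if is_cdp_reachable(target) else "not reachable"
    print(f"CDP endpoint at {target} is {state}")
```

A negative result usually means the browser was started without the debugging flag, on a different port, or is bound to an address the service cannot reach.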
Development Workflow

    # Initialize dependencies
    uv sync --dev

    # Install Playwright dependencies
    uv run playwright install

    # Example of running the inspector tool using environment variables for an external CDP session:
    npx @modelcontextprotocol/inspector@latest \
      -e NAV_LLM_GOOGLE_API_KEY=$GOOGLE_API_KEY \
      -e NAV_LLM_PROVIDER=google \
      -e NAV_BROWSER_USE_OWN_SESSION=true \
      -e NAV_BROWSER_CDP_TARGET=http://localhost:9222 \
      uv --directory . run mcp-web-navigator

    # Execute a CLI agent test (ensure a config file or environment variables are present)
    uv run web-navigator-cli perform-action "Verify the primary H1 tag on https://www.wikipedia.org/" -c .env.config
Diagnostics and Error Resolution
- Startup Failure (Missing Parameter): Verify all system requirements dictated by the active configuration schema are satisfied, particularly mandatory paths for research output (`NAV_RESEARCH_TOOL_OUTPUT_ROOT`).
- CDP Binding Failures: Confirm that the browser executable was initiated with the necessary `--remote-debugging-port` flag active and accessible at the specified `NAV_BROWSER_CDP_TARGET` address.
- API Key Errors: Revalidate the credentials stored in the environment variables (e.g., `NAV_LLM_OPENAI_API_KEY`).
- Data Persistence Failure: If history logging, tracing, or file downloads are not recorded, ensure the corresponding path variables are set and the execution context possesses the necessary write permissions for those directories.
- Troubleshooting Logging: Increase the service logging verbosity via `NAV_SERVER_VERBOSITY_LEVEL` to `TRACE` for granular insights.
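The persistence check described above can be scripted. This sketch reports, for each path variable named in the Configuration Reference, whether it is unset, points to a missing directory, or lacks write permission. The function name and status strings are hypothetical, and the sketch only reports: it does not assume whether the service creates missing directories on its own.

```python
import os
from typing import Dict, Mapping

# Path variables taken from the Configuration Reference table.
PATH_VARIABLES = (
    "NAV_AGENT_TOOL_HISTORY_ARCHIVE_DIR",
    "NAV_RESEARCH_TOOL_OUTPUT_ROOT",
    "NAV_PATHS_ARTIFACT_CACHE",
)


def check_path_variables(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Map each path variable to "unset", "missing directory", "not writable", or "ok"."""
    report: Dict[str, str] = {}
    for name in PATH_VARIABLES:
        path = env.get(name)
        if not path:
            report[name] = "unset"
        elif not os.path.isdir(path):
            report[name] = "missing directory"
        elif not os.access(path, os.W_OK | os.X_OK):
            # Creating files in a directory needs both write and search permission.
            report[name] = "not writable"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for variable, status in check_path_variables().items():
        print(f"{variable}: {status}")
```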
Licensing
Distributed under the terms of the MIT License. See the [LICENSE] file for specifics.
