@suthio/brave-deep-research-mcp: Advanced Web Content Harvesting Service

A Model Context Protocol (MCP) server that combines Brave Search with Puppeteer-driven page navigation and content extraction for in-depth research. It lets AI agents go beyond lists of search results: they can read the full body of each result page and follow links recursively.

Differentiation from the Baseline Brave Search Connector

Baseline Brave Search MCP Connector:

Query Execution: Calls the Brave Search API directly for basic lookups.
Data Output: Returns only the metadata the search endpoint provides (title, URL, and snippet).
Contextual Fidelity: Cannot read a result page's full text; only the snippet is available.
Link Following: Cannot open result pages or follow links.
Informational Breadth: Limited to the summary data in the initial search response.
Post-Retrieval Processing: No content extraction or cleanup.
Parameter Flexibility: Basic search controls only (query, result count, and pagination offset).
Applicability: Best for quick, high-level lookups where snippets are enough.
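For reference, the baseline lookup that both servers start from is a single HTTP GET against the Brave Web Search API. The TypeScript sketch below only builds the request and makes no network call. It assumes Brave's public endpoint `https://api.search.brave.com/res/v1/web/search`, the `q`, `count`, and `offset` query parameters, and the `X-Subscription-Token` header; the helper name `buildBraveSearchRequest` is hypothetical and not part of this package.

```typescript
// Hypothetical helper: builds (but does not send) a Brave Web Search request.
// Endpoint, parameter names, and header are assumed from Brave's public API docs.
interface BraveSearchRequest {
  url: string;
  headers: Record<string, string>;
}

const BRAVE_WEB_SEARCH_ENDPOINT = "https://api.search.brave.com/res/v1/web/search";

function buildBraveSearchRequest(
  apiKey: string,
  query: string,
  count: number,
  offset = 0,
): BraveSearchRequest {
  // Encode each parameter so spaces and punctuation in the query are URL-safe.
  const params: Array<[string, string]> = [
    ["q", query],
    ["count", String(count)],
    ["offset", String(offset)],
  ];
  const queryString = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");

  return {
    url: `${BRAVE_WEB_SEARCH_ENDPOINT}?${queryString}`,
    headers: {
      Accept: "application/json",
      // The API key travels in this header rather than in the URL.
      "X-Subscription-Token": apiKey,
    },
  };
}
```

On Node 18 or later, the result can be passed to the built-in `fetch(request.url, { headers: request.headers })`. The snippet-only output described above is whatever that single response contains.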

Advanced Brave Deep Research MCP Connector (This Implementation):

Query Execution: Starts with a Brave Search query, then scrapes each result page.
Data Output: Retrieves the full page content for each result URL.
Contextual Fidelity: Returns complete page content, focused on the main body text.
Link Following: Follows links found on each page up to a configurable depth.
Informational Breadth: Gathers information across multiple related, linked pages.
Post-Retrieval Processing: Uses heuristics to keep the relevant content and discard boilerplate such as advertisements, navigation bars, and footers.
Parameter Flexibility: Fine-grained control over link depth, result count, headless browser mode, and page-load timeout.
Applicability: Suited to complex research that needs detailed context and the original source text.

Example: Output for the Same Query

For the query "novel approaches to sustainable urban cooling infrastructure":

Standard Brave Search MCP output (snippet only):

Title: "Emerging Trends in City Cooling Solutions - Research Hub"
URL: "https://research.org/urban-cool-tech"
Snippet: "New passive radiative cooling materials are showing promise for mitigating urban heat island effects in dense metropolitan areas..."

Brave Deep Research MCP output (full content):

Emerging Trends in City Cooling Solutions - Research Hub

URI: https://research.org/urban-cool-tech

Primary Document Body

Passive radiative cooling films, utilizing spectrally selective surface coatings, have demonstrated an ability to reduce surface temperatures by up to 15°C relative to ambient conditions in simulated desert environments. Key challenges involve material durability against particulate accumulation and scalability for mass infrastructure deployment...

[Further extracted text follows, possibly including content from linked case studies]

Core Capabilities

Investigative Depth: Goes beyond search snippets to retrieve the full content of each result page.
Configurable Recursion: Lets you set how many levels of links to follow.
Content Isolation: Heuristics identify and keep the main content of each page.
Attribution Capture: Records page metadata such as titles and introductory text.
Diagnostic Logging: Optional verbose logging for troubleshooting (DEBUG_MODE).
Headless Mode Control: Runs the browser with or without a visible window (PUPPETEER_HEADLESS).
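The package's actual extraction heuristics are not documented here. As a rough illustration of the idea behind Content Isolation and Attribution Capture, the TypeScript sketch below works on a raw HTML string: it reads the `<title>` and `<meta name="description">`, drops elements that usually hold boilerplate (`script`, `style`, `noscript`, `nav`, `header`, `footer`, `aside`), and keeps the remaining text. It uses regular expressions instead of a real DOM, so it is a naive approximation; a Puppeteer-based tool would more likely query the rendered page. The name `extractMainContent` and the list of boilerplate tags are assumptions.

```typescript
// Naive, regex-based sketch of boilerplate removal; NOT the package's algorithm.
// A real implementation would inspect the rendered DOM (e.g. via Puppeteer's page.evaluate).
interface ExtractedPage {
  title: string;
  description: string;
  text: string;
}

// Assumed list of elements that typically hold navigation, ads, or scripts.
const BOILERPLATE_TAGS = ["script", "style", "noscript", "nav", "header", "footer", "aside"];

function decodeBasicEntities(s: string): string {
  return s
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" is not double-decoded
}

function extractMainContent(html: string): ExtractedPage {
  // Metadata is read before <head> is stripped. The meta regex assumes
  // the name attribute comes before content.
  const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const descMatch = html.match(
    /<meta\s+name=["']description["']\s+content=["']([^"']*)["'][^>]*>/i,
  );

  // Remove <head> so metadata does not leak into the body text.
  // \b stops "<head" from matching "<header".
  let body = html.replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, " ");
  for (const tag of BOILERPLATE_TAGS) {
    // Non-greedy <tag ...>...</tag>; nested elements of the same tag are not handled.
    const pattern = new RegExp(`<${tag}\\b[\\s\\S]*?<\\/${tag}>`, "gi");
    body = body.replace(pattern, " ");
  }

  // Strip remaining tags, decode a few common entities, and collapse whitespace.
  const text = decodeBasicEntities(body.replace(/<[^>]+>/g, " "))
    .replace(/\s+/g, " ")
    .trim();

  return {
    title: titleMatch ? decodeBasicEntities(titleMatch[1]).trim() : "",
    description: descMatch ? decodeBasicEntities(descMatch[1]).trim() : "",
    text,
  };
}
```

Dropping whole elements by tag name is the simplest boilerplate heuristic. Text-density or readability-style scoring usually works better on pages that do not use semantic tags.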

Deployment Guide

Install globally with npm:

npm install -g @suthio/brave-deep-research-mcp

Or build from source:

git clone https://github.com/suthio/brave-deep-research-mcp.git
cd brave-deep-research-mcp
npm install
npm run build

Configuration

Create a .env file by copying the provided template (.env.example):

cp .env.example .env

Then edit the file to add your Brave Search API key and adjust other settings:

nano .env

Environment Variables Reference

BRAVE_API_KEY: Your Brave Search API key (required).
PUPPETEER_HEADLESS: Whether Puppeteer runs the browser without a visible window (default: true).
PAGE_TIMEOUT: Maximum time, in milliseconds, to wait for a single page to load (default: 30000).
DEBUG_MODE: Enables verbose internal logging for debugging (default: false).
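This README does not show how the server reads these variables. The TypeScript sketch below is one plausible way to load them with the documented defaults. The `loadConfig` function, the `ServerConfig` shape, the fallback for invalid timeouts, and the rule that `1` or `yes` also count as true are all assumptions for illustration.

```typescript
// Hypothetical config loader mirroring the documented variables and defaults.
// In Node you would pass process.env; taking a plain object keeps it testable.
interface ServerConfig {
  braveApiKey: string;
  puppeteerHeadless: boolean;
  pageTimeoutMs: number;
  debugMode: boolean;
}

type Env = Record<string, string | undefined>;

function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  // Assumption: "true", "1", and "yes" (any case) are true; anything else is false.
  return ["true", "1", "yes"].includes(value.trim().toLowerCase());
}

function loadConfig(env: Env): ServerConfig {
  const braveApiKey = env.BRAVE_API_KEY?.trim();
  if (!braveApiKey) {
    // BRAVE_API_KEY is the only required setting.
    throw new Error("BRAVE_API_KEY is required");
  }

  const timeout = Number(env.PAGE_TIMEOUT ?? "30000");
  return {
    braveApiKey,
    puppeteerHeadless: parseBool(env.PUPPETEER_HEADLESS, true),
    // Fall back to the documented 30000 ms if the value is missing or not a positive number.
    pageTimeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 30000,
    debugMode: parseBool(env.DEBUG_MODE, false),
  };
}
```

Failing fast on a missing API key surfaces the most common setup mistake at startup, not on the first search.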

Usage

Running from the Command Line

If installed globally:

brave-deep-research-mcp

With npx:

npx @suthio/brave-deep-research-mcp

From a cloned source directory:

npm start

Using with Claude Desktop

Install the package globally: npm install -g @suthio/brave-deep-research-mcp
Open the Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Add the following entry to the mcpServers section:

{
  "mcpServers": {
    "brave-deep-research": {
      "command": "npx",
      "args": ["@suthio/brave-deep-research-mcp"],
      "env": {
        "BRAVE_API_KEY": "your_brave_api_key_here",
        "PUPPETEER_HEADLESS": "true"
      }
    }
  }
}

Restart Claude Desktop.
The deep-search tool is then available in new conversations.
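If the configuration file already lists other MCP servers, add the new entry under mcpServers instead of replacing the file. The following TypeScript sketch performs that merge on the parsed JSON object. The `addDeepResearchServer` helper and its types are hypothetical, and reading and writing the file (for example with `JSON.parse`, `JSON.stringify`, and Node's `fs` module) is left out.

```typescript
// Hypothetical helper: adds the brave-deep-research entry to a parsed
// claude_desktop_config.json object without touching other servers.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [other: string]: unknown;
}

function addDeepResearchServer(
  config: ClaudeDesktopConfig,
  braveApiKey: string,
  headless = true,
): ClaudeDesktopConfig {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["@suthio/brave-deep-research-mcp"],
    env: {
      BRAVE_API_KEY: braveApiKey,
      // Environment variables are strings, so the boolean is serialized.
      PUPPETEER_HEADLESS: String(headless),
    },
  };
  // Return a new object; other top-level keys and existing servers are preserved.
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), "brave-deep-research": entry },
  };
}
```

Writing the result back with `JSON.stringify(merged, null, 2)` produces the same structure as the entry shown above.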

Example Prompts

"Use deep-search to research recent advances in superconducting quantum processors."
"Run a deep search on global carbon capture strategies with a depth of 2."
"Deep search for documentation on passive solar building design, limited to the first 5 results."

Tool Argument Specification

The deep-search tool accepts the following arguments:

query (required): The search query.
results (optional): Maximum number of initial search results to process (default: 3, maximum: 10).
depth (optional): Maximum depth of link traversal (default: 1, maximum: 3).
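The README gives defaults and maximums but does not say what happens when a value is out of range. The TypeScript sketch below assumes values are clamped into the allowed range; rejecting them would be equally plausible. The `normalizeDeepSearchArgs` function and the minimum of 1 for both numbers are assumptions.

```typescript
// Hypothetical normalization of deep-search arguments using the documented
// defaults (results: 3, depth: 1) and maximums (results: 10, depth: 3).
interface DeepSearchArgs {
  query: string;
  results?: number;
  depth?: number;
}

interface NormalizedArgs {
  query: string;
  results: number;
  depth: number;
}

// Assumption: out-of-range or fractional values are clamped, not rejected.
function clampInt(value: number | undefined, fallback: number, min: number, max: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.floor(value)));
}

function normalizeDeepSearchArgs(args: DeepSearchArgs): NormalizedArgs {
  const query = args.query.trim();
  if (query === "") {
    // query is the only mandatory argument.
    throw new Error("query must be a non-empty string");
  }
  return {
    query,
    results: clampInt(args.results, 3, 1, 10),
    depth: clampInt(args.depth, 1, 1, 3),
  };
}
```

For instance, `normalizeDeepSearchArgs({ query: "passive solar building design", results: 5 })` keeps `results: 5` and fills in the default `depth: 1`.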

Development Workflow

Clone the repository:

git clone https://github.com/suthio/brave-deep-research-mcp.git
cd brave-deep-research-mcp

Install dependencies:

npm install

Run in development mode:

npm run dev

Build for production:

npm run build

How It Works

The tool first queries the Brave Search API for an initial set of relevant results.
For each result (up to the results limit), Puppeteer, running headless by default, opens the page URL.
It extracts the main text, metadata, and links from the loaded page.
If depth is greater than 1, it follows the discovered links and repeats the extraction at each level, up to the configured depth.
The cleaned, structured text from all visited pages is returned to the AI agent.
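The traversal in these steps is essentially a depth-limited, breadth-first crawl. The TypeScript sketch below shows that control flow with the page fetcher passed in as a function, so it runs without a browser; in the real tool, Puppeteer would fill that role. The `PageFetcher` type, the `crawl` function, the visited-set deduplication, and the `maxLinksPerPage` cap are assumptions, not documented behavior of this package.

```typescript
// Hypothetical depth-limited breadth-first crawl. `fetchPage` stands in for
// the Puppeteer step (navigate, extract text and links). It is injected so the
// sketch can run without a browser.
interface HarvestedPage {
  url: string;
  text: string;
  links: string[];
}

type PageFetcher = (url: string) => Promise<HarvestedPage | null>;

async function crawl(
  startUrls: string[],
  depth: number,
  fetchPage: PageFetcher,
  maxLinksPerPage = 5, // assumed cap on links followed from each page
): Promise<HarvestedPage[]> {
  const visited = new Set<string>();
  const harvested: HarvestedPage[] = [];
  let frontier = startUrls;

  // Level 1 is the search results themselves; each further level follows links.
  for (let level = 1; level <= depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const url of frontier) {
      if (visited.has(url)) continue; // skip pages already harvested
      visited.add(url);
      const page = await fetchPage(url);
      if (page === null) continue; // e.g. timeout or navigation error
      harvested.push(page);
      next.push(...page.links.slice(0, maxLinksPerPage));
    }
    frontier = next;
  }
  return harvested;
}
```

The visited set matters because pages often link back to one another. Without it, each extra level of depth could fetch pages that were already harvested and multiply the number of browser navigations.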

Licensing Information

MIT License

WIKIPEDIA: A headless browser is a web browser without a graphical user interface. It provides programmatic control of a web page in an environment similar to a conventional browser, but is driven through a command-line interface or over a network protocol. Headless browsers are especially useful for testing web pages, because they render and interpret HTML as a regular browser does, including layout, colors, fonts, and the execution of JavaScript and Ajax. Simpler testing tools usually lack these capabilities.

Since Chrome 59 and Firefox 56, both browsers have offered native support for remote control. This made earlier non-standard solutions such as PhantomJS largely obsolete.

== Primary Applications ==
The main uses of headless browsers include:

Test automation for modern web applications (web testing).
Taking full-page screenshots programmatically.
Running automated tests for JavaScript libraries and frameworks.
Automating interaction with web pages.

=== Secondary Uses ===
Headless browsers are also useful for web scraping. Google stated in 2009 that using a headless browser could help its search engine index content from websites that rely on Ajax.

Headless browsers have also been misused, for example for:

Denial-of-service (DDoS) attacks on websites.
Inflating advertisement impressions.
Automating websites in unintended ways, such as credential stuffing.

However, a 2018 study of browser traffic found no evidence that malicious actors prefer headless browsers over conventional ones for attacks such as DDoS, SQL injection, or cross-site scripting.

== Instrumentation ==
Because several major browsers natively support headless operation through APIs, tools have been developed that control them through a single, unified interface. Prominent examples include:

Selenium WebDriver – Adheres strictly to the W3C specification for WebDriver protocols.
Playwright – A versatile Node.js utility for automating interactions across Chromium, Firefox, and WebKit engines.
Puppeteer – A specialized Node.js library focusing on Chrome and Firefox automation.

=== Quality Assurance Integration ===
Some test automation frameworks use headless browsers as part of their testing setup.

Capybara uses headless browsing, through either WebKit or Headless Chrome, to mimic user behavior in its tests.
Jasmine uses Selenium by default but can be configured to run browser tests in WebKit or Headless Chrome.
Cypress, a frontend testing framework.
QF-Test, a GUI test automation tool that can also run tests headlessly.

=== Substitutes ===
Another approach is to use software that provides browser-like APIs. For example, Deno includes browser APIs in its runtime, and in Node.js, jsdom is the most complete provider. Most of these alternatives handle common browser features (HTML parsing, cookies, XHR, some JavaScript), but they do not render the DOM and offer only limited support for DOM events. As a result, they usually run faster than full browsers, but with lower fidelity.

brave-web-digester-mcp-service

Author: suthio
