mcp-remote-web-retrieval

Facilitates sophisticated Google web information retrieval and rendering of retrieved documents, incorporating anti-bot mechanisms. It features request throttling, result caching, concurrent browser pool management, and evasion of automated traffic detection.

Author

Claw256

MIT License

Quick Info

GitHub Stars: 5
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 2026-02-19

Tags

markdown, apis, search, web search, search provides, mcp web

Remote Web Content Access Module

This MCP server component furnishes Google search capabilities alongside secure web page viewing, employing advanced strategies to circumvent automated detection systems.

Core Capabilities

  • Custom Google Search Integration: Supports advanced query refinement directives.
  • Web Document Rendering: Ability to ingest and transform retrieved web content, including Markdown format conversion.
  • Performance Controls: Implements strict rate governance and data caching layers.
  • Headless Environment Management: Utilizes a pool of browser instances for parallel operations.
  • Evasion Techniques: Employs rebrowser-puppeteer countermeasures against bot identification.
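
The rate governance and caching layers named in the list above are not specified further, so the following is only a minimal sketch of one common way to build them: a sliding-window limiter and a TTL cache with a size cap. The class names `SlidingWindowLimiter` and `TtlCache` are hypothetical, and the mapping to the RATE_LIMIT_WINDOW, RATE_LIMIT_MAX_REQUESTS, SEARCH_CACHE_TTL, and MAX_CACHE_ITEMS settings is an assumption, not the server's confirmed implementation.

```typescript
// Sliding-window limiter: allows at most maxRequests calls per windowMs.
// (Hypothetical class; units are assumed to be milliseconds.)
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true and records the call if it fits in the current window.
  tryAcquire(now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(now);
    return true;
  }
}

// TTL cache with a size cap; the oldest inserted entry is evicted first.
// (Hypothetical class; a Map preserves insertion order.)
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly maxItems: number;

  constructor(ttlMs: number, maxItems: number) {
    this.ttlMs = ttlMs;
    this.maxItems = maxItems;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired: drop it lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key); // re-inserting makes this key the newest
    if (this.entries.size >= this.maxItems) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

Passing `now` explicitly keeps both classes deterministic in tests; production callers would rely on the `Date.now()` default.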

Prerequisites for Deployment

  • Runtime environment must be Bun, version 1.0 or newer.
  • Valid Google authentication credentials (API Key and Custom Search Engine Identifier) are mandatory.

Setup Procedure

# Install necessary dependencies
bun install

# Compile TypeScript source files
bun run build

Configuration Guidelines

For accessing sites requiring prior user login:

  1. Install the required Chrome extension: Get cookies.txt LOCALLY.
  2. Navigate to the target authenticated domains and successfully complete the login process.
  3. Utilize the extension to export the session cookies into a JSON file structure.
  4. Securely store this exported file.
  5. Set the BROWSER_COOKIES_PATH environment variable to the absolute file system location of the cookies file.
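
How the server consumes the exported file is not described, so this is a hedged sketch of validating and normalizing a cookie export before handing it to a browser automation API. The `ExportedCookie` shape (Chrome-style fields such as `expirationDate`) and the helper name `parseCookieExport` are assumptions for illustration.

```typescript
// Assumed shape of one entry in the exported JSON file.
interface ExportedCookie {
  name: string;
  value: string;
  domain: string;
  path?: string;
  expirationDate?: number; // seconds since the Unix epoch
  httpOnly?: boolean;
  secure?: boolean;
}

// Normalized cookie with defaults filled in.
interface BrowserCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires?: number;
  httpOnly: boolean;
  secure: boolean;
}

// Hypothetical helper: parse the file contents and skip malformed entries.
function parseCookieExport(jsonText: string): BrowserCookie[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("Cookie export must be a JSON array");
  }
  return (parsed as ExportedCookie[])
    .filter(
      (c) =>
        c !== null &&
        typeof c === "object" &&
        typeof c.name === "string" &&
        typeof c.value === "string" &&
        typeof c.domain === "string"
    )
    .map((c) => ({
      name: c.name,
      value: c.value,
      domain: c.domain,
      path: c.path ?? "/",
      httpOnly: c.httpOnly ?? false,
      secure: c.secure ?? false,
      // Only include an expiry when the export provides one.
      ...(c.expirationDate !== undefined ? { expires: c.expirationDate } : {}),
    }));
}
```

In practice the text would be read from the file named by BROWSER_COOKIES_PATH. Because the file holds live session credentials, it should be readable only by the account running the server.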

MCP Server Endpoint Definition

Integrate the following configuration block into your primary MCP settings file:

  • For Cline users: %APPDATA%\Code\User\globalStorage\rooveterinaryinc.roo-cline\settings\cline_mcp_settings.json
  • For Claude Desktop users:
    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "web-search": {
      "command": "bun",
      "args": [
        "run",
        "/ABSOLUTE/PATH/TO/web_search_mcp/dist/index.js"
      ],
      "env": {
        "GOOGLE_API_KEY": "your_api_key",
        "GOOGLE_SEARCH_ENGINE_ID": "your_search_engine_id",
        "MAX_CONCURRENT_BROWSERS": "3",
        "BROWSER_TIMEOUT": "30000",
        "RATE_LIMIT_WINDOW": "60000",
        "RATE_LIMIT_MAX_REQUESTS": "60",
        "SEARCH_CACHE_TTL": "3600",
        "VIEW_URL_CACHE_TTL": "7200",
        "MAX_CACHE_ITEMS": "1000",
        "BROWSER_POOL_MIN": "1",
        "BROWSER_POOL_MAX": "5",
        "BROWSER_POOL_IDLE_TIMEOUT": "30000",
        "REBROWSER_PATCHES_RUNTIME_FIX_MODE": "addBinding",
        "REBROWSER_PATCHES_SOURCE_URL": "jquery.min.js",
        "REBROWSER_PATCHES_UTILITY_WORLD_NAME": "util",
        "REBROWSER_PATCHES_DEBUG": "0",
        "BROWSER_COOKIES_PATH": "C:\\path\\to\\cookies.json",
        "LOG_LEVEL": "info",
        "NO_COLOR": "0",
        "BUN_FORCE_COLOR": "1",
        "FORCE_COLOR": "1"
      }
    }
  }
}

Substitute /ABSOLUTE/PATH/TO/web_search_mcp with the server directory's full path.
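
All values in the `env` block arrive as strings, so a server has to validate and convert them. The sketch below shows one way that could look; `loadConfig`, `intFromEnv`, and `requireString` are hypothetical names, the fallback values simply mirror the example block above, and whether the real server uses the same defaults or validation is not stated.

```typescript
type Env = Record<string, string | undefined>;

interface ServerConfig {
  googleApiKey: string;
  googleSearchEngineId: string;
  maxConcurrentBrowsers: number;
  browserTimeout: number;
  rateLimitWindow: number;
  rateLimitMaxRequests: number;
  browserPoolMin: number;
  browserPoolMax: number;
}

// Fail fast when a mandatory credential is absent.
function requireString(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable ${key}`);
  }
  return value;
}

// Convert a numeric setting, falling back when it is unset.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  if (!/^\d+$/.test(raw.trim())) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return Number(raw.trim());
}

// Hypothetical loader; fallbacks copy the example configuration values.
function loadConfig(env: Env): ServerConfig {
  const config: ServerConfig = {
    googleApiKey: requireString(env, "GOOGLE_API_KEY"),
    googleSearchEngineId: requireString(env, "GOOGLE_SEARCH_ENGINE_ID"),
    maxConcurrentBrowsers: intFromEnv(env, "MAX_CONCURRENT_BROWSERS", 3),
    browserTimeout: intFromEnv(env, "BROWSER_TIMEOUT", 30000),
    rateLimitWindow: intFromEnv(env, "RATE_LIMIT_WINDOW", 60000),
    rateLimitMaxRequests: intFromEnv(env, "RATE_LIMIT_MAX_REQUESTS", 60),
    browserPoolMin: intFromEnv(env, "BROWSER_POOL_MIN", 1),
    browserPoolMax: intFromEnv(env, "BROWSER_POOL_MAX", 5),
  };
  if (config.browserPoolMin > config.browserPoolMax) {
    throw new Error("BROWSER_POOL_MIN must not exceed BROWSER_POOL_MAX");
  }
  return config;
}
```

Taking the environment as a parameter, rather than reading a global, keeps the loader easy to test; a server entry point would pass its process environment.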

Operational Logging Configuration

Logging behavior is governed by the following environment variables:

  • LOG_LEVEL: Determines verbosity (options: error, warn, info, debug). Default is info.
  • NO_COLOR: Disable colorized output by setting to "1".
  • BUN_FORCE_COLOR: Controls color output specifically within the Bun runtime (set to "0" to suppress).
  • FORCE_COLOR: Global toggle for colorized output (set to "0" to suppress).
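
As a rough illustration of how these variables could interact, the sketch below filters messages by level and decides whether to colorize output. The function names are hypothetical, and the precedence between the three color variables is an assumption; the list above only states what each one does on its own.

```typescript
// Levels ordered from least to most verbose, as listed for LOG_LEVEL.
const LEVELS = ["error", "warn", "info", "debug"] as const;
type LogLevel = (typeof LEVELS)[number];

// Unknown or missing values fall back to the documented default, info.
function parseLogLevel(raw: string | undefined): LogLevel {
  const value = (raw ?? "info").toLowerCase();
  return (LEVELS as readonly string[]).indexOf(value) >= 0
    ? (value as LogLevel)
    : "info";
}

// A message is emitted when it is no more verbose than the configured level.
function shouldLog(configured: LogLevel, message: LogLevel): boolean {
  return LEVELS.indexOf(message) <= LEVELS.indexOf(configured);
}

// Assumed precedence: any one of the three "off" settings disables color.
function colorEnabled(env: Record<string, string | undefined>): boolean {
  if (env["NO_COLOR"] === "1") return false;
  if (env["FORCE_COLOR"] === "0" || env["BUN_FORCE_COLOR"] === "0") return false;
  return true;
}
```

With LOG_LEVEL set to info, for example, `shouldLog("info", "warn")` is true and `shouldLog("info", "debug")` is false.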

Evasion of Automated Detection

This service employs rebrowser-puppeteer modifications to thwart common bot identification markers:

  1. Runtime Verification Prevention:
    • Leverages the addBinding mechanism to counteract Runtime.Enable checks.
    • Functional across web workers and embedded iframes.
    • Preserves necessary access to the primary execution context.

  2. Source URL Obfuscation:
    • Modifies Puppeteer's internal sourceURL attribute to mimic genuine script loading patterns.
    • Assists in masking the presence of automated control.

  3. Standardized Utility Context:
    • Assigns a generic name to the utility execution world.
    • Avoids detection triggered by known automation world naming conventions.

  4. Browser Initialization Adjustments:
    • Flags indicating automation are suppressed.
    • Optimized Chrome command-line arguments are utilized.
    • Viewport dimensions and window geometry are configured realistically.

Integration with Claude Desktop

  1. Ensure your Claude Desktop application is up to date.
  2. Locate and open the configuration file:
    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
  3. Incorporate the server definition as detailed in the MCP Server Endpoint Definition section above.
  4. Relaunch Claude Desktop.
  5. Verify successful tool loading by checking that the hammer icon is present.

Exposed Tools

1. Search Utility

{
  name: "search",
  params: {
    query: string;
    trustedDomains?: string[];
    excludedDomains?: string[];
    resultCount?: number;
    safeSearch?: boolean;
    dateRestrict?: string;
  }
}

2. URL Content Viewer

{
  name: "view_url",
  params: {
    url: string;
    includeImages?: boolean;
    includeVideos?: boolean;
    preserveLinks?: boolean;
    formatCode?: boolean;
  }
}
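
MCP clients invoke tools through a JSON-RPC `tools/call` request that carries the tool name and an `arguments` object. The sketch below types the two parameter sets listed above and builds such requests; the helper `buildToolCall`, the sample query, the example URL, and the `dateRestrict` value `m6` (Google Custom Search style, assumed to be passed through unchanged) are illustrative assumptions.

```typescript
interface SearchParams {
  query: string;
  trustedDomains?: string[];
  excludedDomains?: string[];
  resultCount?: number;
  safeSearch?: boolean;
  dateRestrict?: string;
}

interface ViewUrlParams {
  url: string;
  includeImages?: boolean;
  includeVideos?: boolean;
  preserveLinks?: boolean;
  formatCode?: boolean;
}

// Each tool name is paired with its own argument type.
type ToolCall =
  | { name: "search"; arguments: SearchParams }
  | { name: "view_url"; arguments: ViewUrlParams };

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

// Hypothetical helper: wrap a tool call in a JSON-RPC 2.0 envelope.
function buildToolCall(id: number, call: ToolCall): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: call };
}

// Example requests (all values are made up for illustration).
const searchRequest = buildToolCall(1, {
  name: "search",
  arguments: { query: "bun runtime release notes", resultCount: 5, dateRestrict: "m6" },
});

const viewRequest = buildToolCall(2, {
  name: "view_url",
  arguments: { url: "https://example.com/docs", preserveLinks: true },
});
```

The discriminated union makes the compiler reject mismatched pairs, such as a `view_url` call that supplies `query` instead of `url`.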

Diagnostics and Issue Resolution

Challenges with Claude Desktop Connectivity

  1. Review system logs for errors:

```bash
# macOS
tail -n 20 -f ~/Library/Logs/Claude/mcp*.log

# Windows CMD
type %APPDATA%\Claude\Logs\mcp*.log
```

  2. Common Faults:
    • Server unreachable: Verify configuration syntax and path accuracy.
    • Tool execution failures: Inspect server logs and attempt application restart.
    • Path discrepancies: Confirm the exclusive use of fully qualified (absolute) paths.

Consult the MCP debugging guide for in-depth troubleshooting assistance.

Development Lifecycle

# Start development with automatic file watching
bun --watch run dev

# Execute automated tests
bun run test

# Run code quality checks
bun run lint

Critical Observations

  1. Bot Resistance:
    • The integrated evasion features address the majority of standard detection vectors.
    • For maximum security, consider augmenting with dedicated proxy rotation and specialized user-agent strings.
    • Certain advanced filtering methods employed by target sites may still succeed in identification.

  2. Resource Management:
    • Browser processes are maintained in a reusable pool.
    • Unused browser sessions are automatically terminated upon timeout.
    • Safeguards are in place to prevent excessive system resource exhaustion.
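
The pooling behavior described above (reuse, idle termination, an upper bound) can be pictured with a small generic pool. This is a simplified sketch rather than the server's code: `SimplePool` is a hypothetical class, the resource would be a browser instance in practice, and the option names only echo the BROWSER_POOL_MIN, BROWSER_POOL_MAX, and BROWSER_POOL_IDLE_TIMEOUT settings.

```typescript
interface PoolOptions<T> {
  min: number;           // keep at least this many resources alive
  max: number;           // never hold more than this many at once
  idleTimeoutMs: number; // destroy resources idle for this long
  create: () => T;
  destroy: (resource: T) => void;
  now?: () => number;    // injectable clock, defaults to Date.now
}

class SimplePool<T> {
  private idle: { resource: T; since: number }[] = [];
  private inUse = 0;
  private readonly opts: PoolOptions<T>;

  constructor(opts: PoolOptions<T>) {
    this.opts = opts;
  }

  private now(): number {
    return (this.opts.now ?? Date.now)();
  }

  get size(): number {
    return this.inUse + this.idle.length;
  }

  // Reuse an idle resource when possible; otherwise create one up to max.
  acquire(): T {
    const entry = this.idle.pop();
    if (entry) {
      this.inUse++;
      return entry.resource;
    }
    if (this.inUse >= this.opts.max) throw new Error("Pool exhausted");
    this.inUse++;
    return this.opts.create();
  }

  // Return a resource to the pool and stamp when it became idle.
  release(resource: T): void {
    this.inUse--;
    this.idle.push({ resource, since: this.now() });
  }

  // Destroy resources idle past the timeout, never dropping below min.
  evictIdle(): number {
    const cutoff = this.now() - this.opts.idleTimeoutMs;
    const before = this.size;
    let destroyed = 0;
    this.idle = this.idle.filter((entry) => {
      if (entry.since <= cutoff && before - destroyed > this.opts.min) {
        this.opts.destroy(entry.resource);
        destroyed++;
        return false;
      }
      return true;
    });
    return destroyed;
  }
}
```

A real pool would also queue callers instead of throwing when exhausted and would run `evictIdle` on a timer; both are omitted here to keep the sketch short.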

Licensing

Licensed under MIT.

WIKIPEDIA: XMLHttpRequest (XHR) functions as an API encapsulated within a JavaScript object, designed to dispatch HTTP requests from a running web browser to a remote web server. The methods provided enable browser-based applications to asynchronously transmit requests post-page load and subsequently receive data payloads. XMLHttpRequest is foundational to the methodology known as Ajax. Before Ajax's ascendancy, server interaction primarily relied on traditional hyperlink navigation and form submissions, actions that typically necessitated a complete page refresh.

== Genesis ==

The underlying concept for XMLHttpRequest originated around the year 2000, conceived by developers working on Microsoft Outlook. This concept was first materialized in the Internet Explorer 5 browser release (1999). Notably, the initial implementation did not utilize the standard XMLHttpRequest identifier; instead, developers invoked COM objects via ActiveXObject("Msxml2.XMLHTTP") or ActiveXObject("Microsoft.XMLHTTP"). By the time Internet Explorer 7 arrived in 2006, universal browser support for the standardized XMLHttpRequest identifier was established.

The XMLHttpRequest identifier has since become the universally accepted standard across all major browser engines, including Mozilla’s Gecko (adopted 2002), Safari 1.2 (2004), and Opera 8.0 (2005).

=== Standardization Efforts ===

The World Wide Web Consortium (W3C) formally published an initial Working Draft specification for the XMLHttpRequest object on April 5, 2006. A subsequent Level 2 specification, introducing features like progress monitoring, cross-site request enablement, and byte stream handling, was released by the W3C on February 25, 2008. By the close of 2011, the Level 2 enhancements were merged back into the primary specification document. In late 2012, stewardship transitioned to the WHATWG, which now maintains the document as a continuously updated specification using Web IDL notation.

== Operational Workflow ==

Executing a server request using XMLHttpRequest typically involves a sequence of programming actions.

  1. Instantiation: Create an instance of the XMLHttpRequest object using its constructor.
  2. Configuration: Invoke the "open" method to define the HTTP verb, specify the target resource Uniform Resource Identifier (URI), and select between synchronous or asynchronous execution mode.
  3. Asynchronous Listener Setup: If operating asynchronously, assign a callback function to handle state change notifications.
  4. Transmission: Begin the request lifecycle by calling the "send" method.
  5. Response Handling: Monitor the state changes within the event handler. Upon receiving server data, it is usually accessible via the responseText property. When the object transitions to state 4, indicating completion, the full response is ready.

Beyond these fundamental steps, XMLHttpRequest offers granular control over request transmission and response parsing. Custom headers can be applied to instruct the server on fulfillment methods, and data payloads can be transmitted within the send argument. Responses can be immediately parsed from JSON into native JavaScript structures, or streamed gradually instead of waiting for the complete block. Requests can also be manually terminated or configured with a timeout to enforce failure if processing exceeds a set duration.
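
The five numbered steps can be sketched as a single function. This is a minimal browser-side illustration; the helper name `requestText` and its callback signature are hypothetical, and error handling is left out for brevity.

```typescript
// Hypothetical helper: fetch a URL as text and hand the result to a callback.
function requestText(
  url: string,
  onDone: (status: number, body: string) => void
): void {
  // 1. Instantiation
  const xhr = new XMLHttpRequest();

  // 2. Configuration: HTTP verb, target URI, and async mode (true)
  xhr.open("GET", url, true);

  // 3. Asynchronous listener setup
  xhr.onreadystatechange = () => {
    // 5. Response handling: readyState 4 means the request is complete
    if (xhr.readyState === 4) {
      onDone(xhr.status, xhr.responseText);
    }
  };

  // 4. Transmission
  xhr.send();
}
```

The listener is attached before `send` is called so that no state change can be missed. In a page script the helper could be called as `requestText("/data.txt", (status, body) => console.log(status, body))`, where the path is a placeholder.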

== Requests Across Domain Boundaries ==

In the nascent stages of the World Wide Web's evolution, it became apparent that it was technically feasible to compromise security by brea
