InternetContentFetcher-MCP

A Model Context Protocol (MCP) server that provides real-time web search over stdio. It relies on an external WebSearch Crawler API service to retrieve search results.

Overview
Acquisition
Tuning Parameters
Deployment & Interfacing
Configuring the Crawler Subsystem
Interfacing with MCP Clients
Utilization
Input Arguments
Sample Retrieval Payload
Local Verification
As a Programmatic Module
Error Resolution
Crawler Subsystem Faults
MCP Endpoint Faults
Development Pipeline
Codebase Layout
Publishing on npm Registry
Contributions
Licensing

Overview

InternetContentFetcher-MCP is an MCP server that gives AI assistants supporting the protocol the ability to search the web. It lets models such as Claude run live searches and retrieve up-to-date information on any topic.

The server delegates the actual web searches to a dedicated Crawler API service and communicates with AI clients over the Model Context Protocol.

Acquisition

Installation via Smithery

To provision InternetContentFetcher for Claude Desktop automatically using Smithery:

```bash
npx -y @smithery/cli install @mnhlt/WebSearch-MCP --client claude
```

Manual Provisioning

```bash
npm install -g websearch-mcp
```

Or invoke without prior installation:

```bash
npx websearch-mcp
```

Tuning Parameters

The InternetContentFetcher MCP endpoint accepts configuration via environment variables:

API_URL: URL of the WebSearch Crawler API (default: http://localhost:3001)
MAX_SEARCH_RESULT: Maximum number of results returned when the request does not set a count explicitly (default: 5)

Illustrative Examples:

```bash
# Adjusting API target address
API_URL=https://crawler.example.com npx websearch-mcp

# Setting the result quantity ceiling
MAX_SEARCH_RESULT=10 npx websearch-mcp

# Specifying both configuration options
API_URL=https://crawler.example.com MAX_SEARCH_RESULT=10 npx websearch-mcp
```
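For illustration, the two variables and their documented defaults could be resolved as in the sketch below. The names `ServerConfig` and `loadConfig` are hypothetical and are not taken from the package source; the fallback behaviour for invalid values is an assumption.

```typescript
// Hypothetical helper: shows one way to apply the documented defaults.
interface ServerConfig {
  apiUrl: string;
  maxSearchResult: number;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // Use the documented default when API_URL is unset or blank.
  const rawUrl = env.API_URL;
  const apiUrl = rawUrl && rawUrl.trim() !== "" ? rawUrl : "http://localhost:3001";

  // parseInt yields NaN for non-numeric input; NaN and non-positive
  // values fall back to the documented default of 5 (assumed behaviour).
  const parsed = parseInt(env.MAX_SEARCH_RESULT ?? "", 10);
  const maxSearchResult = !isNaN(parsed) && parsed > 0 ? parsed : 5;

  return { apiUrl, maxSearchResult };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`.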

Deployment & Interfacing

Establishing InternetContentFetcher-MCP requires setting up two primary elements: configuring the underlying crawler service that executes searches, and integrating the MCP server layer with client AI applications.

Configuring the Crawler Subsystem

The InternetContentFetcher MCP server requires a separate crawler service to run the actual searches. You can start this service with Docker Compose.

Prerequisites

Docker and Docker Compose installed on the host machine.

Activating the Crawler Subsystem

Create a file named docker-compose.yml with the following configuration:

```yaml
version: '3.8'

services:
  crawler:
    image: laituanmanh/websearch-crawler:latest
    container_name: websearch-api
    restart: unless-stopped
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - PORT=3001
      - LOG_LEVEL=info
      - FLARESOLVERR_URL=http://flaresolverr:8191/v1
    depends_on:
      - flaresolverr
    volumes:
      - crawler_storage:/app/storage

  flaresolverr:
    image: 21hsmw/flaresolverr:nodriver
    container_name: flaresolverr
    restart: unless-stopped
    environment:
      - LOG_LEVEL=info
      - TZ=UTC

volumes:
  crawler_storage:
```

Workaround for Mac Apple Silicon

```yaml
version: '3.8'

services:
  crawler:
    image: laituanmanh/websearch-crawler:latest
    container_name: websearch-api
    platform: "linux/amd64"
    restart: unless-stopped
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - PORT=3001
      - LOG_LEVEL=info
      - FLARESOLVERR_URL=http://flaresolverr:8191/v1
    depends_on:
      - flaresolverr
    volumes:
      - crawler_storage:/app/storage

  flaresolverr:
    image: 21hsmw/flaresolverr:nodriver
    platform: "linux/arm64"
    container_name: flaresolverr
    restart: unless-stopped
    environment:
      - LOG_LEVEL=info
      - TZ=UTC

volumes:
  crawler_storage:
```

Initiate the services:

```bash
docker-compose up -d
```

Confirm service operational status:

```bash
docker-compose ps
```

Query the crawler API health check endpoint:

```bash
curl http://localhost:3001/health
```

Anticipated Payload:

```json
{
  "status": "ok",
  "details": {
    "status": "ok",
    "flaresolverr": true,
    "google": true,
    "message": null
  }
}
```

The crawler API will subsequently be reachable at http://localhost:3001.
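A small sketch of how a script might interpret the health payload, assuming its shape matches the sample above. The `HealthPayload` type and `findHealthProblems` function are hypothetical names, and reading a `false` flag as a failed check is an assumption rather than documented behaviour.

```typescript
// Shape inferred from the sample /health response; not an official schema.
interface HealthPayload {
  status: string;
  details?: {
    status: string;
    flaresolverr: boolean;
    google: boolean;
    message: string | null;
  };
}

// Returns a list of human-readable problems; an empty list means healthy.
function findHealthProblems(payload: HealthPayload): string[] {
  const problems: string[] = [];
  if (payload.status !== "ok") {
    problems.push(`top-level status is "${payload.status}"`);
  }
  const details = payload.details;
  if (!details) {
    problems.push("details object is missing");
    return problems;
  }
  if (details.status !== "ok") problems.push(`details status is "${details.status}"`);
  if (!details.flaresolverr) problems.push("flaresolverr flag is false");
  if (!details.google) problems.push("google flag is false");
  if (details.message) problems.push(`message: ${details.message}`);
  return problems;
}
```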

Validating the Crawler API

Direct testing of the crawler API via curl:

```bash
curl -X POST http://localhost:3001/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "query": "typescript best practices",
    "numResults": 2,
    "language": "en",
    "filters": {
      "excludeDomains": ["youtube.com"],
      "resultType": "all"
    }
  }'
```
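The same request can be prepared from TypeScript. The sketch below only builds the URL and request options, so it runs without a live crawler; the `CrawlRequest` type covers just the fields shown in the curl example, and `prepareCrawlRequest` is a hypothetical helper name.

```typescript
// Only the fields that appear in the curl example; the full schema
// lives in the crawler's swagger.json and may contain more.
interface CrawlRequest {
  query: string;
  numResults?: number;
  language?: string;
  filters?: {
    excludeDomains?: string[];
    resultType?: "all" | "news" | "blogs";
  };
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function prepareCrawlRequest(apiUrl: string, body: CrawlRequest): PreparedRequest {
  if (body.query.trim() === "") {
    throw new Error("query must not be empty");
  }
  // Strip trailing slashes so the path is not doubled up.
  const base = apiUrl.replace(/\/+$/, "");
  return {
    url: `${base}/crawl`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body)
    }
  };
}
```

The result can be passed straight to `fetch(prepared.url, prepared.init)` in Node.js 18 or later.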

Bespoke Settings

Custom adjustments to the crawler service are managed by altering environment variables within the docker-compose.yml file:

PORT: Network port utilized by the crawler API listener (default: 3001)
LOG_LEVEL: Verbosity level for logging (valid values: debug, info, warn, error)
FLARESOLVERR_URL: Address for the FlareSolverr instance (used for mitigating Cloudflare challenges)

Interfacing with MCP Clients

Summary: MCP Configuration Snippet

Reference settings for MCP configuration across different consuming platforms:

```json
{
  "mcpServers": {
    "websearch": {
      "command": "npx",
      "args": [
        "websearch-mcp"
      ],
      "environment": {
        "API_URL": "http://localhost:3001",
        "MAX_SEARCH_RESULT": "5"
      }
    }
  }
}
```

Decrease MAX_SEARCH_RESULT to conserve tokens, or increase it for broader data acquisition.

Workaround specific to Windows environments, addressing Issue

```json
{
  "mcpServers": {
    "websearch": {
      "command": "cmd",
      "args": [
        "/c",
        "npx",
        "websearch-mcp"
      ],
      "environment": {
        "API_URL": "http://localhost:3001",
        "MAX_SEARCH_RESULT": "1"
      }
    }
  }
}
```
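If the configuration is generated by a script, the choice between the two variants can be made from the platform name. This is a sketch only: `buildMcpConfig` is a hypothetical helper, and it mirrors the key names used in the snippets above. The value `win32` is what Node.js reports in `process.platform` on Windows.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  environment: Record<string, string>;
}

interface McpConfig {
  mcpServers: { websearch: McpServerEntry };
}

function buildMcpConfig(platform: string, environment: Record<string, string>): McpConfig {
  // On Windows, wrap npx in "cmd /c" as in the workaround snippet.
  const isWindows = platform === "win32";
  return {
    mcpServers: {
      websearch: {
        command: isWindows ? "cmd" : "npx",
        args: isWindows ? ["/c", "npx", "websearch-mcp"] : ["websearch-mcp"],
        environment
      }
    }
  };
}
```

Calling `JSON.stringify(buildMcpConfig(process.platform, { API_URL: "http://localhost:3001", MAX_SEARCH_RESULT: "5" }), null, 2)` would yield text suitable for pasting into a client configuration file.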

Utilization

This package implements an MCP server over the stdio transport and exposes a tool named web_search with the following input parameters:

Input Arguments

query (Mandatory): The search phrase to be investigated.
numResults (Optional): Desired count of results to be returned (Default: 5).
language (Optional): ISO 639-1 code for result language preference (e.g., 'en').
region (Optional): Geographic region code for scoping results (e.g., 'us').
excludeDomains (Optional): Website hostnames to omit from the final compilation.
includeDomains (Optional): Hostnames that are strictly permitted in the results.
excludeTerms (Optional): Keywords or phrases that must be filtered out.
resultType (Optional): Categorization of desired content ('all', 'news', or 'blogs').
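The parameters above can be captured as a TypeScript type with a light validation pass. This is a sketch based only on the list above: `WebSearchArgs` and `validateWebSearchArgs` are hypothetical names, the array types for the domain and term filters are assumed, and the server's own validation rules may differ.

```typescript
type ResultType = "all" | "news" | "blogs";

interface WebSearchArgs {
  query: string;
  numResults?: number;
  language?: string;
  region?: string;
  excludeDomains?: string[]; // assumed to be lists of strings
  includeDomains?: string[];
  excludeTerms?: string[];
  resultType?: ResultType;
}

// Returns a list of validation errors; an empty list means the arguments look valid.
function validateWebSearchArgs(args: WebSearchArgs): string[] {
  const errors: string[] = [];
  if (typeof args.query !== "string" || args.query.trim() === "") {
    errors.push("query is required");
  }
  if (args.numResults !== undefined) {
    const n = args.numResults;
    if (!isFinite(n) || Math.floor(n) !== n || n < 1) {
      errors.push("numResults must be a positive integer");
    }
  }
  if (args.language !== undefined && !/^[a-z]{2}$/i.test(args.language)) {
    errors.push("language must be a two-letter ISO 639-1 code");
  }
  // Runtime check for callers that pass untyped data.
  if (args.resultType !== undefined && ["all", "news", "blogs"].indexOf(args.resultType) === -1) {
    errors.push("resultType must be 'all', 'news', or 'blogs'");
  }
  return errors;
}
```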

Sample Retrieval Payload

An illustration of a successful data fetch response:

```json
{
  "query": "machine learning trends",
  "results": [
    {
      "title": "Top Machine Learning Trends in 2025",
      "snippet": "The key machine learning trends for 2025 include multimodal AI, generative models, and quantum machine learning applications in enterprise...",
      "url": "https://example.com/machine-learning-trends-2025",
      "siteName": "AI Research Today",
      "byline": "Dr. Jane Smith"
    },
    {
      "title": "The Evolution of Machine Learning: 2020-2025",
      "snippet": "Over the past five years, machine learning has evolved from primarily supervised learning approaches to more sophisticated self-supervised and reinforcement learning paradigms...",
      "url": "https://example.com/ml-evolution",
      "siteName": "Tech Insights",
      "byline": "John Doe"
    }
  ]
}
```
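A consumer of this payload might turn it into plain text for display. The sketch below assumes the field names from the sample response, treats `siteName` and `byline` as optional, and uses hypothetical names (`SearchResponse`, `formatResults`).

```typescript
interface SearchResult {
  title: string;
  snippet: string;
  url: string;
  siteName?: string; // treated as optional here (assumption)
  byline?: string;
}

interface SearchResponse {
  query: string;
  results: SearchResult[];
}

// Renders each result as a numbered entry: title (source), URL, snippet.
function formatResults(response: SearchResponse): string {
  if (response.results.length === 0) {
    return `No results for "${response.query}".`;
  }
  return response.results
    .map((r, i) => {
      const source = [r.siteName, r.byline]
        .filter((s): s is string => Boolean(s))
        .join(", ");
      const header = source ? `${i + 1}. ${r.title} (${source})` : `${i + 1}. ${r.title}`;
      return `${header}\n   ${r.url}\n   ${r.snippet}`;
    })
    .join("\n\n");
}
```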

Local Verification

To test the InternetContentFetcher MCP server locally, employ the integrated test utility:

```bash
npm run test-client
```

This action launches the MCP server alongside a minimalist command-line interface enabling input of search terms and immediate viewing of outcomes.

You can also specify the API endpoint for the test client:

```bash
API_URL=https://crawler.example.com npm run test-client
```

As a Programmatic Module

This package can be utilized directly within other codebases:

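```typescript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch the server as a child process and communicate over stdio
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['websearch-mcp']
});

// Instantiate an MCP client (name and version are placeholders) and connect
const client = new Client(
  { name: 'example-client', version: '1.0.0' },
  { capabilities: {} }
);
await client.connect(transport);

// Invoke a web search operation
const response = await client.callTool({
  name: 'web_search',
  arguments: {
    query: 'your search query',
    numResults: 5,
    language: 'en'
  }
});

console.log(response.content);
```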

Error Resolution

Crawler Subsystem Faults

Endpoint Inaccessible: Verify that the crawler service is running and reachable at the configured API_URL.
No Search Data: Check the crawler logs for errors: docker-compose logs crawler
FlareSolverr Issues: If errors relate to Cloudflare anti-bot measures, check the FlareSolverr logs: docker-compose logs flaresolverr

MCP Endpoint Faults

Import Errors: Ensure your installed MCP SDK is up to date: npm install -g @modelcontextprotocol/sdk@latest
Connection Problems: Confirm that the stdio transport is configured correctly for your client application.

Development Pipeline

To contribute code modifications:

Obtain a local copy of the repository
Install necessary dependencies: npm install
Compile the source code: npm run build
Execute in development mode: npm run dev

The server strictly expects a WebSearch Crawler API conforming to the specification outlined in the included swagger.json file. Ensure the API is active at the designated API_URL.

Codebase Layout

.gitignore: Defines files and directories to be ignored by Git (e.g., node_modules, dist, logs).
.npmignore: Specifies contents to be excluded from the published npm package.
package.json: Contains project metadata, scripts, and dependency listings.
src/: Directory housing the source TypeScript files.
dist/: Output directory for compiled JavaScript artifacts (generated post-build).

Publishing on npm Registry

To deploy this package to the npm registry:

Confirm you possess an active npm account and are logged in (npm login)
Increment the version number in package.json (npm version patch|minor|major)
Execute the publication command: npm publish
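As a worked illustration of what the three `npm version` keywords do to a plain MAJOR.MINOR.PATCH number, the sketch below reproduces the arithmetic. `bumpVersion` is a hypothetical helper, not part of this package or of npm, and it deliberately ignores prerelease and build suffixes.

```typescript
type ReleaseType = "patch" | "minor" | "major";

function bumpVersion(version: string, release: ReleaseType): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = parseInt(match[1], 10);
  const minor = parseInt(match[2], 10);
  const patch = parseInt(match[3], 10);

  switch (release) {
    case "major":
      // A major bump resets minor and patch to zero.
      return `${major + 1}.0.0`;
    case "minor":
      // A minor bump resets patch to zero.
      return `${major}.${minor + 1}.0`;
    default:
      return `${major}.${minor}.${patch + 1}`;
  }
}
```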

The .npmignore file guarantees that only essential components are packaged:

Compiled assets in dist/
Documentation files (README.md and LICENSE)
The project descriptor (package.json)

Contributions

We welcome community input! Please submit any proposed changes via a Pull Request.

Licensing

ISC
