mcp-web-retriever
Retrieve external web assets via various HTTP verbs, managing request metadata like headers and payloads. Offers configurable request duration limits and redirection policies for robust data acquisition.
Author

mcollina
Quick Info
Actions
Tags
MCP Web Content Retrieval Utility
This server implements a mechanism for fetching remote resources leveraging the high-performance Node.js networking library, undici.
Core Capabilities
- Execute network retrievals targeting any accessible URI using standard HTTP methodologies.
- Comprehensive support for injecting custom request headers and transmitting request payloads.
- Output transformation capabilities supporting raw text, structured JSON, binary data streams, or isolated HTML segments.
- Robust error management and handling of network anomalies.
- Fine-grained control over connection timeout thresholds and automatic redirection enforcement.
Exposed MCP Functions
This package exposes the following operational utilities:
fetch-url
Acquires the content residing at a specified Uniform Resource Locator (URL).
Parameters:
- url (string, mandatory): The destination network address for retrieval.
- method (string, optional): The desired HTTP operational verb (defaults to "GET").
- headers (object, optional): A dictionary of HTTP header key-value pairs to include.
- body (string, optional): The data payload to transmit, primarily for methods like POST or PUT.
- timeout (number, optional): Maximum allowable milliseconds before the operation is terminated.
- responseType (string, optional): Defines the desired parsing strategy for the returned data: ("text", "json", "binary", "html-fragment").
- fragmentSelector (string, optional): A CSS selector string used exclusively when responseType is set to "html-fragment" to pinpoint required elements.
- followRedirects (boolean, optional): Boolean flag dictating whether subsequent location redirects should be honored (default: true).
extract-html-fragment
Isolates and retrieves specific portions of an HTML document based on CSS syntax, with an option to target content near a named anchor point.
Parameters:
- url (string, mandatory): The source webpage address.
- selector (string, mandatory): The precise CSS path defining the desired segment(s).
- anchorId (string, optional): An optional identifier used to narrow the search scope to a particular section within the document.
- method (string, optional): HTTP verb utilized for the initial retrieval (defaults to "GET").
- headers (object, optional): Custom HTTP headers to accompany the request.
- body (string, optional): Data payload for non-GET operations.
- timeout (number, optional): Time limit in milliseconds for the connection.
- followRedirects (boolean, optional): Determines adherence to redirection instructions (default: true).
check-status
Verifies the accessibility and responsiveness of a URI without necessitating the download of the complete resource payload.
Parameters:
- url (string, mandatory): The network endpoint to be validated.
- timeout (number, optional): The duration limit for the connection attempt in milliseconds.
Integration with Claude for Desktop
To enable this utility within your Claude for Desktop environment, incorporate the following configuration block into your claude_desktop_config.json file:
{ "mcpServers": { "node-fetch": { "command": "node", "args": ["dist/index.js"] } } }
Licensing Information
This software is distributed under the MIT License.
WIKIPEDIA: XMLHttpRequest (XHR) is an Application Programming Interface, structured as a JavaScript object, enabling the transmission of Hypertext Transfer Protocol (HTTP) queries from a client-side web browser environment toward a designated web server. The methods provided allow an application running within the browser context to initiate server communications subsequent to page rendering, subsequently receiving resultant data. XMLHttpRequest forms a crucial constituent of the Ajax programming paradigm. Preceding the advent of Ajax, navigational links and form submissions constituted the principal means of effecting server interaction, frequently resulting in the complete replacement of the currently viewed page.
== Historical Context ==
The foundational concept underpinning XMLHttpRequest was conceptualized in the year 2000 by the engineering team responsible for Microsoft Outlook development. This concept was subsequently realized and integrated into the Internet Explorer version 5 browser (released in 1999). Nevertheless, the initial syntax did not employ the standardized XMLHttpRequest designator. Instead, developers utilized the object instantiations ActiveXObject("Msxml2.XMLHTTP") and ActiveXObject("Microsoft.XMLHTTP"). As of Internet Explorer version 7 (2006), universal browser adoption of the XMLHttpRequest identifier became the norm.
The XMLHttpRequest identifier has since evolved into the established convention across all major web rendering engines, encompassing Mozilla's Gecko engine (2002), Safari version 1.2 (2004), and Opera version 8.0 (2005).
=== Standardization Efforts === The World Wide Web Consortium (W3C) published an initial Working Draft specification detailing the XMLHttpRequest object on April 5, 2006. Subsequently, on February 25, 2008, the W3C released the Level 2 specification. This Level 2 revision introduced capabilities to monitor data transfer progress, facilitate requests across different security origins (cross-site), and manage raw byte streams. By the conclusion of 2011, the features defined in the Level 2 specification were fully integrated back into the primary specification document. Towards the end of 2012, responsibility for ongoing evolution was transferred to the WHATWG, which now maintains the active documentation using the Web Interface Definition Language (Web IDL).
== Operational Procedures == In general terms, enacting a network request utilizing XMLHttpRequest involves several distinct programming stages.
- Instantiate an XMLHttpRequest object via its constructor method:
- Invoke the "open" function to define the request protocol type, pinpoint the requisite resource location, and choose between synchronous or asynchronous execution:
- For asynchronous operations, attach a callback handler mechanism designed to trigger upon state changes of the request:
- Commence the transmission of the request by calling the "send" method:
- Process state changes within the registered event listener. If the server successfully returns payload data, it is typically stored within the "responseText" attribute by default. Once the object finishes processing the response cycle, its state transitions to 4, signifying the "done" status. Beyond these fundamental maneuvers, XMLHttpRequest provides numerous configurable levers to govern request transmission and response interpretation. Specialized header fields can be prepended to the outgoing query to guide server behavior, and data intended for upload can be furnished directly within the "send" invocation argument. The incoming response data can be systematically parsed from JSON format into a readily usable JavaScript object structure, or alternatively processed incrementally as data segments arrive rather than awaiting the full text buffer. Furthermore, the entire request sequence can be terminated prematurely or configured to automatically fail if completion is not achieved within a predetermined temporal window.
