URL-to-Markdown-Processor-MCP
A service that fetches the text content of any URL and converts it into clean, well-structured Markdown. It includes comprehensive error handling and is designed to work with MCP clients.
Author

yutakobayashidev
WebforAI Content Acquisition Module - MCP Endpoint
This is a Model Context Protocol (MCP) server built on Cloudflare Workers that extracts plain text from web pages using WebforAI.
🌟 Introduction to WebforAI Capabilities
WebforAI is a toolkit for making web content usable by AI models. Its capabilities include:
- Rendering HTML into pristine, hierarchical Markdown format
- Identifying and isolating key content segments from online documents
- Interpreting elements like tabular data, hyperlinks, and embedded visuals intelligently
- Preparing sourced web material for streamlined ingestion by AI systems
This particular MCP server harnesses WebforAI's core power to fetch textual data from any specified URL, thereby simplifying the pipeline for injecting web content into AI workflows via the Model Context Protocol.
📋 Core Functionalities
- Simplified Interface: Execute content extraction from any web destination via a single programmatic invocation.
- Pristine Output: Yields Markdown output that is meticulously formatted, devoid of residual HTML markup.
- Resilience: Built-in, robust mechanisms for handling and reporting request failures.
- Serverless Infrastructure: Deployed on Cloudflare Workers for global, scale-on-demand operation.
- Protocol Compliance: Fully interoperable with various MCP consumers, such as Claude Desktop and Cloudflare's AI Playground.
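The extraction and failure-handling behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the function name `extractPageAsMarkdown`, the `ExtractResult` shape, and the injected `fetchFn` and `toMarkdown` dependencies are all hypothetical, chosen so the sketch runs without network access. In the real project the conversion step would presumably be backed by WebforAI.

```typescript
// Hypothetical result shape; the actual server may report errors differently.
type ExtractResult =
  | { ok: true; markdown: string }
  | { ok: false; error: string };

// Dependencies are injected so the sketch can run offline.
// `toMarkdown` stands in for an HTML-to-Markdown converter such as WebforAI's.
interface ExtractDeps {
  fetchFn: (url: string) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;
  toMarkdown: (html: string, baseUrl: string) => string;
}

async function extractPageAsMarkdown(rawUrl: string, deps: ExtractDeps): Promise<ExtractResult> {
  // 1. Reject anything that is not a well-formed http(s) URL.
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return { ok: false, error: `Invalid URL: ${rawUrl}` };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, error: `Unsupported protocol: ${url.protocol}` };
  }

  try {
    // 2. Fetch the page and report non-2xx responses as errors.
    const response = await deps.fetchFn(url.href);
    if (!response.ok) {
      return { ok: false, error: `Request failed with status ${response.status}` };
    }
    // 3. Convert the HTML body to Markdown.
    const html = await response.text();
    return { ok: true, markdown: deps.toMarkdown(html, url.href) };
  } catch (err) {
    // 4. Network failures and converter exceptions are caught here.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Extraction failed: ${message}` };
  }
}
```

Returning a tagged result instead of throwing keeps every failure mode (bad input, HTTP error, network error) reportable to the MCP client as an ordinary response.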
🚀 Deployment and Setup
Provisioning on Cloudflare Workers
Deploy this service to your Cloudflare Workers account.
Once deployed, your MCP endpoint is typically available at: webforai-mcp-server.<your-account>.workers.dev/sse
Local Development Environment
- Clone the source code repository:

      git clone https://github.com/yutakobayashidev/webforai-mcp-server.git
      cd webforai-mcp-server

- Install the required package dependencies:

      pnpm install

- Start the local development server:

      pnpm dev

- The endpoint will be reachable locally, usually at http://localhost:8787
🔧 Invoking the Text Acquisition Utility
The extractWebPageText function accepts a target URL parameter and returns the retrieved document body formatted as Markdown:
{ "url": "https://example.com/page" }
The resulting payload contains the extracted text in Markdown format, characterized by:
- Hyperlinks rendered as static text.
- Tables presented as plain text structures.
- Images suppressed from the output.
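For readers calling the tool outside a ready-made client, the sketch below shows how the `{ "url": ... }` arguments might be wrapped in an MCP `tools/call` request, which uses a JSON-RPC 2.0 envelope. The helper name `buildExtractRequest` and the interface names are hypothetical; only the tool name `extractWebPageText` and the `url` argument come from this README.

```typescript
// Argument shape taken from the JSON example in this section.
interface ExtractArgs {
  url: string;
}

// JSON-RPC 2.0 envelope that MCP uses for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ExtractArgs };
}

// Hypothetical helper: validates the URL and builds the request message.
function buildExtractRequest(id: number, pageUrl: string): ToolCallRequest {
  // `new URL` throws a TypeError if pageUrl is not an absolute URL.
  const normalized = new URL(pageUrl).href;
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extractWebPageText", arguments: { url: normalized } },
  };
}

// Usage: print the message a client would send for the example URL.
console.log(JSON.stringify(buildExtractRequest(1, "https://example.com/page"), null, 2));
```

Validating the URL before sending avoids a round trip for input the server would reject anyway.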
🔌 Establishing Connectivity with MCP Clients
Integration with Cloudflare AI Playground
- Navigate to the Cloudflare AI Playground.
- Input the address of your deployed MCP endpoint (webforai-mcp-server.<your-account>.workers.dev/sse).
- The text extraction capability will become immediately available for use within the playground interface.
Connecting via Claude Desktop
To integrate this extraction utility with your local Claude Desktop application:
- Adhere to the instructions outlined in Anthropic's Quickstart Guide.
- Within Claude Desktop's configuration settings, navigate to Settings > Developer > Edit Config.
- Update your configuration block with the following structure, substituting your deployed worker URL for http://localhost:8787/sse if needed:

      {
        "mcpServers": {
          "webforaiExtractor": {
            "command": "npx",
            "args": [
              "mcp-remote",
              "http://localhost:8787/sse"
            ]
          }
        }
      }

- Relaunch Claude; the new text extraction utility should now appear in your available tools.
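If you switch between the local and deployed endpoints often, the configuration block can be generated instead of edited by hand. The following is only a convenience sketch: `buildClaudeConfig` is a hypothetical helper, and `my-account` in the deployed URL is a placeholder for your own workers.dev subdomain.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: builds the config structure for a given SSE endpoint.
function buildClaudeConfig(sseUrl: string, serverName = "webforaiExtractor"): ClaudeDesktopConfig {
  // Validate the endpoint early; `new URL` throws on malformed input.
  const url = new URL(sseUrl);
  return {
    mcpServers: {
      [serverName]: { command: "npx", args: ["mcp-remote", url.href] },
    },
  };
}

// Local development endpoint from this README.
console.log(JSON.stringify(buildClaudeConfig("http://localhost:8787/sse"), null, 2));

// Deployed endpoint; "my-account" is a placeholder, not a real subdomain.
console.log(JSON.stringify(buildClaudeConfig("https://webforai-mcp-server.my-account.workers.dev/sse"), null, 2));
```

The printed JSON can be pasted into the file opened by Edit Config. The output contains no comments, which matters because the config file must be plain JSON.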
📚 Further Documentation
- WebforAI Comprehensive Reference: WebforAI Documentation
- Model Context Protocol Specification: Model Context Protocol
- Cloudflare Serverless Compute: Cloudflare Workers
- Cloudflare AI Services: Cloudflare AI
📄 Licensing
This project is distributed under the MIT License.
