MCP Doc Scraper

Scrapes documentation from web URLs, converts it to markdown, and saves the result to a specified output path. Runs as a Model Context Protocol (MCP) server, so MCP clients can invoke the scraper as a tool.

Author

askjohngeorge

No License

Quick Info

GitHub Stars: 7
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 31/5/2025

Doc Scraper MCP Server

A Model Context Protocol (MCP) server that provides documentation scraping functionality. This server converts web-based documentation into markdown format using jina.ai's conversion service.
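The scrape, convert, and save flow can be sketched as follows. This is a minimal illustration, not the server's actual code: it assumes the conversion goes through the Jina Reader endpoint (`https://r.jina.ai/` prefixed to the target URL), and it uses the standard library's `urllib` instead of `aiohttp` to stay dependency-free. The names `build_reader_url`, `fetch_markdown`, and `save_markdown` are hypothetical.

```python
from pathlib import Path
from urllib.request import Request, urlopen

# Assumption: the Jina Reader endpoint, which returns a markdown rendering
# of whatever URL is appended to this prefix.
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_url(url: str) -> str:
    """Prefix the target URL with the reader endpoint."""
    return JINA_READER_PREFIX + url


def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    """Fetch the markdown rendering of `url` (performs a network request)."""
    request = Request(
        build_reader_url(url),
        headers={"User-Agent": "doc-scraper-sketch"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def save_markdown(markdown: str, output_path: str) -> Path:
    """Write markdown to output_path, creating parent directories as needed."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path
```

Chaining the two steps, `save_markdown(fetch_markdown(url), output_path)`, mirrors what the tool is described as doing.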

Features

  • Scrapes documentation from any web URL
  • Converts HTML documentation to markdown format
  • Saves the converted documentation to a specified output path
  • Integrates with the Model Context Protocol (MCP)

Installation

Installing via Smithery

To install Doc Scraper for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @askjohngeorge/mcp-doc-scraper --client claude

Manual Installation

  1. Clone the repository:
git clone https://github.com/askjohngeorge/mcp-doc-scraper.git
cd mcp-doc-scraper
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  3. Install the dependencies:
pip install -e .

Usage

The server can be run using Python:

python -m mcp_doc_scraper
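For a manually installed server, an MCP client such as Claude Desktop needs to be told how to launch it. The sketch below builds a configuration entry in the `mcpServers` layout that Claude Desktop reads from `claude_desktop_config.json`. The server name `doc-scraper` and the helper `build_client_config` are hypothetical, and the interpreter path should be the one inside the virtual environment where the package was installed.

```python
import json
import sys


def build_client_config(
    python_executable: str = sys.executable,
    server_name: str = "doc-scraper",  # hypothetical label, any name works
) -> dict:
    """Build an mcpServers entry that launches the server as a module."""
    return {
        "mcpServers": {
            server_name: {
                "command": python_executable,
                "args": ["-m", "mcp_doc_scraper"],
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be merged into the client's config file.
    print(json.dumps(build_client_config(), indent=2))
```

Running the script from inside the activated virtual environment picks up that environment's interpreter through `sys.executable`.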

Tool Description

The server provides a single tool:

  • Name: scrape_docs
  • Description: Scrape documentation from a URL and save as markdown
  • Input Parameters:
    • url: The URL of the documentation to scrape
    • output_path: The path where the markdown file should be saved
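On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `scrape_docs` using the two parameters listed above. The helper name `build_scrape_docs_call` and the example URL and path are illustrative only; in practice an MCP client library would construct and send this message.

```python
import json
from typing import Any


def build_scrape_docs_call(
    url: str, output_path: str, request_id: int = 1
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call request for the scrape_docs tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scrape_docs",
            "arguments": {"url": url, "output_path": output_path},
        },
    }


if __name__ == "__main__":
    # Example values are placeholders, not taken from the project.
    request = build_scrape_docs_call(
        "https://example.com/docs", "docs/example.md"
    )
    print(json.dumps(request, indent=2))
```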

Project Structure

doc_scraper/
├── __init__.py
├── __main__.py
└── server.py

Dependencies

  • aiohttp
  • mcp
  • pydantic

Development

To set up the development environment:

  1. Install development dependencies:
pip install -r requirements.txt
  2. The server uses the Model Context Protocol, so familiarize yourself with the MCP documentation before making changes.

License

MIT License