Document Processing MCP Repositories
326 repositories in this category.
docs
Provides a starter kit for creating and maintaining documentation, including guide pages, navigation, customizations, and API references. Supports local previews and automatic deployment of documentation updates via integration with a GitHub app.
AverbePorto-MCP
Integrates with AverbePorto to manage authentication and document submission for cargo insurance endorsements. Provides a secure API for automated document handling and protocol lookups.
unstructured-mcp
Enables extraction of content from a variety of unstructured document formats, with storage and retrieval via AWS S3. Documents can be processed directly within applications, making their content available to LLMs.
pptx-xlsx-mcp
Interact with Microsoft Office applications such as PowerPoint and Excel to create, modify, and analyze presentations and spreadsheets using natural language commands. Automate complex tasks and data manipulations within the Office environment.
mcp-outline
Enables natural language interaction with Outline's document management service for searching, creating, and managing documents, including reading document content and managing comments within collections.
whiskerrag_toolkit
Provides retrieval-augmented generation (RAG) capabilities for applications, integrating various data sources with advanced processing methods. Includes a toolkit of type definitions and methods for implementing RAG.
MCP-llms-txt
Integrates documentation directly into conversations by exposing it as MCP resources to chat applications, so that relevant documentation content becomes part of the dialogue.
ClaudeHopper
Interact with construction documents, drawings, and specifications. Analyze technical details and retrieve specific information through advanced retrieval-augmented generation and hybrid search.
puremd-mcp
Access web content as Markdown by prefixing URLs with `pure.md/`, retrieving web pages while avoiding bot detection. It converts formats such as HTML and PDF into Markdown and caches responses globally for efficiency.
sui-mcp-server
Enables AI agents to retrieve documents from a vector database using Retrieval-Augmented Generation (RAG) techniques. Integrates with GitHub to process Move source files and uses a language model to generate responses based on the retrieved information.
handwriting-ocr-mcp-server
Integrate applications with the Handwriting OCR service to process images and PDF documents for text extraction. Upload documents, check processing status, and retrieve OCR results in Markdown format.
MCP-server-readability-python
Extracts and transforms webpage content into clean, LLM-optimized Markdown, removing ads and non-essential elements for improved readability and processing by language models.
word-mcp-server
Facilitates the creation and editing of Microsoft Word documents via a straightforward API. Supports adding formatted text, images, and tables, enabling document generation and modification through natural language commands with LLM integration.
mcp-server-docy
Provides real-time access to technical documentation from various sources, enabling accurate coding assistance. Supports dynamic updates to documentation sources and employs caching to reduce latency while ensuring fresh content.
seo-inspector-mcp
Analyzes HTML files and web pages to identify SEO issues and validate structured data schemas. Provides actionable recommendations for improving SEO quality directly through integrated tools without the need for a browser extension.
open-docs-mcp
Crawl, index, and manage documentation while enabling full-text search across various document formats for efficient information retrieval. Integrates with AI to enhance document access and management capabilities.
klavis
Generates visually appealing web reports based on simple search queries, integrating live web search results and storing reports in a database for easy access. Utilizes AI to synthesize information into interactive HTML formats.
textin-mcp
Extract text from images, PDFs, and Word documents using OCR, convert documents to Markdown, and extract key information from files.
laas-rag-mcp
Upload PDF or CSV documents and query them in natural language to retrieve relevant information. Documents are split into segments whose embeddings are stored in a Chroma vector store for efficient retrieval.
context7
Fetches up-to-date, version-specific code documentation and examples from source libraries to enhance prompts, reducing reliance on outdated code and inaccurate APIs. Integrates real-time library documentation into LLM context to improve coding accuracy and productivity.
obsidian_fetch
Retrieve and load notes from Obsidian vaults for use with language models, cleaning up link queries and displaying backlinks for opened files. Optimized for local GPU setups to speed up note retrieval.
servers
Integrates with Google Drive to list, read, and search files. It supports various file formats and exports Google Workspace files to standard formats for easier access.
notion-readonly-mcp-server
Provides read-only access to Notion content, enabling retrieval of pages, blocks, databases, comments, and properties with optimized performance. Focuses on minimizing API calls and supports parallel processing for efficient data acquisition.
wiki_mcp_server
Manage Confluence wiki pages by creating, updating, deleting, and searching them through a unified interface. Automatically selects the relevant knowledge base based on user queries to enhance content management efficiency.
mindmap-mcp-server
Converts Markdown content into interactive mind maps, returning the generated HTML directly or saving it to a file for easy access and sharing. Supports project planning and brainstorming through visual representations of ideas.
MRConfluenceLinker-mcp-server
Fetch and analyze GitLab merge requests, and store the analysis results in Confluence documentation to enhance documentation workflows.
mcp-data-extractor
Extracts embedded data such as i18n translations and configurations from TypeScript and JavaScript source code, converting them into structured JSON files while preserving the hierarchical structure and template variables.
mcp-webdav-server
Enable natural language interaction with WebDAV file systems to perform CRUD operations on files and directories through a secure, configurable MCP server. Supports optional authentication and multiple transport methods.
mcp-jinaai-reader
Integrates Jina.ai's Reader API for efficient web content extraction, enabling analysis and processing of documentation and web content.
jsondiffpatch
Diffs and patches JavaScript objects and arrays, enabling change tracking and state reversion through a simple API.
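The diff-and-patch idea behind jsondiffpatch can be sketched in a few lines. This is a minimal Python illustration, not jsondiffpatch's API or delta format: the names `diff` and `patch` and the tuple-based delta shape are hypothetical. A delta records, for each changed key, the old and new value (or a nested delta for nested objects), so it can be applied forward or backward to revert a state. Unlike jsondiffpatch, this sketch compares lists as whole values rather than element by element.

```python
_MISSING = object()  # hypothetical sentinel: key absent on one side


def diff(old: dict, new: dict) -> dict:
    """Map each changed key to a nested delta (dict on both sides)
    or to an (old_value, new_value) pair. Lists and scalars are
    compared as whole values."""
    delta: dict = {}
    for key in old.keys() | new.keys():
        a = old.get(key, _MISSING)
        b = new.get(key, _MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            sub = diff(a, b)
            if sub:  # only record nested objects that actually changed
                delta[key] = sub
        elif a != b:
            delta[key] = (a, b)
    return delta


def patch(obj: dict, delta: dict, reverse: bool = False) -> dict:
    """Apply a delta (or undo it when reverse=True), returning a new dict.
    Changed nested dicts are rebuilt; leaf values are not deep-copied."""
    result = dict(obj)
    for key, change in delta.items():
        if isinstance(change, dict):
            result[key] = patch(result[key], change, reverse)
        else:
            old_value, new_value = change
            target = old_value if reverse else new_value
            if target is _MISSING:
                result.pop(key, None)  # key did not exist on that side
            else:
                result[key] = target
    return result


if __name__ == "__main__":
    before = {"title": "Draft", "meta": {"tags": ["a"], "rev": 1}}
    after = {"title": "Final", "meta": {"tags": ["a"], "rev": 2}, "published": True}
    d = diff(before, after)
    print(d["meta"])                                # {'rev': (1, 2)}
    print(patch(before, d) == after)                # True
    print(patch(after, d, reverse=True) == before)  # True
```

Storing both the old and new value in each change is what makes reversion possible: the same delta applied with `reverse=True` restores the earlier state.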
