Search & Data Extraction MCP Repositories
67 repositories in this category.
MCP-SearXNG
→A Model Context Protocol Server for [SearXNG](https://docs.searxng.org)
ncbi-mcp-server
→Comprehensive NCBI/PubMed literature search server with advanced analytics, caching, MeSH integration, related articles discovery, and batch processing for all life sciences and biomedical research.
duckduckgo-mcp-server
→This is a TypeScript-based MCP server that provides DuckDuckGo search functionality.
mcp-screenshot-website-fast
→Fast screenshot capture tool optimized for Claude Vision API. Automatically tiles full pages into 1072x1072 chunks for optimal AI processing with configurable viewports and wait strategies for dynamic content.
mcp-server
→MCP Server for DealX platform
scrapeless-mcp-server
→The Scrapeless Model Context Protocol service acts as an MCP server connector to the Google SERP API, enabling web search without leaving the MCP ecosystem.
arxiv-mcp-server
→Search arXiv research papers
melrose-mcp
→Plays [Melrōse](https://melrōse.org) music expressions as MIDI
Web-Analyzer-MCP
→Extracts clean web content for RAG and provides Q&A about web pages.
mcp-claude-hackernews
→An integration that allows Claude Desktop to interact with Hacker News using the Model Context Protocol (MCP).
himalayas-mcp
→Access tens of thousands of remote job listings and company information. This public MCP server provides real-time access to Himalayas' remote jobs database.
enrichr-mcp-server
→An MCP server that provides gene set enrichment analysis using the Enrichr API
MCP-searxng
→An MCP Server to connect to SearXNG instances
g-search-mcp
→A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously.
mcp-server-webcrawl
→Advanced search and retrieval for web crawler data. Supports WARC, wget, Katana, SiteOne, and InterroBot crawlers.
mcp-read-website-fast
→Fast, token-efficient web content extraction for AI agents - converts websites to clean Markdown while preserving links. Features Mozilla Readability, smart caching, polite crawling with robots.txt support, and concurrent fetching.
mcp-server-tavily
→[vectorize-io/vectorize-mcp-server](https://github.com/vectorize-io/vectorize-mcp-server/) ☁️ 📇 - [Vectorize](https://vectorize.io) MCP server for advanced retrieval, Private Deep Research, Anything-to-Markdown file extraction and text chunking.
mcp-webresearch
→Search Google and do deep web research on any topic
GeekNews-MCP-Server
→An MCP Server that retrieves and processes news data from the GeekNews site.
NyxDocs
→Specialized MCP server for cryptocurrency project documentation management with multi-blockchain support (Ethereum, BSC, Polygon, Solana).
mcp-tavily
→[leehanchung/bing-search-mcp](https://github.com/leehanchung/bing-search-mcp) 📇 ☁️ - Web search capabilities using Microsoft Bing Search API
job-searchoor
→An MCP server for searching job listings with filters for date, keywords, remote work options, and more.
catalysishub-mcp-server
→Unofficial MCP server for searching and retrieving scientific data from the Catalysis Hub database, providing access to computational catalysis research and surface reaction data.
kagimcp
→[kehvinbehvin/json-mcp-filter](https://github.com/kehvinbehvin/json-mcp-filter) ️🏠 📇 – Stop bloating your LLM context. Query & Extract only what you need from your JSON files.
gxtract
→GXtract is an MCP server designed to integrate with VS Code and other compatible editors (documentation: [sascharo.github.io/gxtract](https://sascharo.github.io/gxtract)). It provides a suite of tools for interacting with the GroundX platform, enabling you to leverage its powerful document understanding capabilities directly within your development environment.
search1api-mcp
→Search via search1api (requires paid API key)
content-core
→Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
fetcher-mcp
→MCP server for fetching web page content with a Playwright headless browser. Supports JavaScript rendering and intelligent content extraction, with output in Markdown or HTML.
baseline-mcp-server
→MCP server that searches Baseline status using the Web Platform API
mcp-paperswithcode
→🐍 ☁️ MCP server for searching the PapersWithCode API
