OmniFetch Gateway MCP Service

This server is an information-retrieval gateway that implements the Model Context Protocol (MCP). It gives AI agents web search and web content extraction, with output optimized for consumption by Large Language Models (LLMs). By querying multiple search engines concurrently and filtering page content, it converts raw web pages into compact, LLM-ready formats.

Core Capabilities

🌐 Aggregated Search Engine Interface: Native support for diverse search backends, including DuckDuckGo and Google.
🧠 LLM-Centric Extraction: Advanced parsing algorithms that intelligently discard boilerplate/noise and isolate high-signal content.
✅ Value Focus: Automated identification and retention of primary narrative and evidentiary material.
🔗 Verifiability Output: Generates diverse serialization formats, intrinsically linking extracted data back to its origin source.
⚡ Performance Architecture: Asynchronous, non-blocking design built on FastMCP.

Deployment Instructions

Method A: Standard Environment Setup

Prerequisites Check:
Python version requirement: Minimum 3.9.
Strong recommendation for utilizing isolated virtual environments.
Source Retrieval:

    git clone https://github.com/yourusername/crawl4ai-mcp-server.git
    cd crawl4ai-mcp-server

Environment Initialization:

    python -m venv fetch_env
    source fetch_env/bin/activate    # For Unix-like systems
    # or
    .\fetch_env\Scripts\activate     # For Windows PowerShell/CMD

Dependency Installation:

    pip install -r requirements.txt

Browser Component Installation (for advanced rendering):

    playwright install

Method B: Integrated Deployment via Smithery (for Claude Desktop)

Use the Smithery CLI to install the OmniFetch Gateway service and register it with your local Claude Desktop client:

    npx -y @smithery/cli install @weidwonder/crawl4ai-mcp-server --client claude

Operational Interface

The server exposes the following primary functional modules:

Module: network_search

This utility provides comprehensive web querying across integrated search providers:

DuckDuckGo (Default): Operational without requiring external API credentials; processes AbstractText, Search Snippets, and Related Concepts.
Google Search: Requires prior configuration of API credentials for utilization; offers superior result precision in some domains.
Unified Mode: Capability to poll all configured search engines concurrently for maximal result breadth.
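
The concurrent polling described for Unified Mode can be sketched with asyncio.gather. This is a minimal illustration, not the server's actual code: the functions search_duckduckgo, search_google, and search_all, as well as the shape of the result dicts, are hypothetical stand-ins for real provider clients.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


# Hypothetical provider stubs; a real client would perform a network request here.
async def search_duckduckgo(term: str, limit: int) -> List[dict]:
    await asyncio.sleep(0)  # placeholder for I/O
    return [{"provider": "duckduckgo", "title": f"{term} #{i}"} for i in range(limit)]


async def search_google(term: str, limit: int) -> List[dict]:
    await asyncio.sleep(0)  # placeholder for I/O
    return [{"provider": "google", "title": f"{term} #{i}"} for i in range(limit)]


PROVIDERS: Dict[str, Callable[[str, int], Awaitable[List[dict]]]] = {
    "duckduckgo": search_duckduckgo,
    "google": search_google,
}


async def search_all(term: str, limit: int) -> Dict[str, List[dict]]:
    """Query every registered provider concurrently and group results by provider."""
    names = list(PROVIDERS)
    outcomes = await asyncio.gather(
        *(PROVIDERS[name](term, limit) for name in names),
        return_exceptions=True,  # a failing provider should not discard the others
    )
    grouped: Dict[str, List[dict]] = {}
    for name, outcome in zip(names, outcomes):
        # Assumption: a provider that raised contributes an empty result list.
        grouped[name] = [] if isinstance(outcome, BaseException) else outcome
    return grouped


if __name__ == "__main__":
    results = asyncio.run(search_all("quantum computing theory", 2))
    print({name: len(items) for name, items in results.items()})
```

Collecting exceptions instead of letting them propagate is one possible design choice: it keeps results from healthy providers when, for example, Google credentials are missing.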

Parameters:
- search_term: The textual query string.
- result_count: Maximum number of indexed results to fetch (Default: 10).
- provider: Selection for the search indexer.
  - "duckduckgo": Default, no API needed.
  - "google": Requires configured credentials.
  - "all": Executes queries across all active providers.

Invocation Examples:

    # Standard DuckDuckGo execution
    { "search_term": "quantum computing theory", "result_count": 5 }

    # Parallel execution across all available indices
    { "search_term": "quantum computing theory", "result_count": 5, "provider": "all" }
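
A client may want to validate arguments before calling network_search. The sketch below assembles the argument dict using the parameter names and defaults listed above; the helper build_search_args and its validation rules are assumptions for illustration and are not part of the server.

```python
from typing import Any, Dict

# Provider names as documented for the network_search module.
VALID_PROVIDERS = {"duckduckgo", "google", "all"}


def build_search_args(
    search_term: str,
    result_count: int = 10,
    provider: str = "duckduckgo",
) -> Dict[str, Any]:
    """Return a network_search argument dict, rejecting obviously invalid input."""
    if not search_term.strip():
        raise ValueError("search_term must not be empty")
    if result_count < 1:
        raise ValueError("result_count must be a positive integer")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(VALID_PROVIDERS)}")
    return {
        "search_term": search_term,
        "result_count": result_count,
        "provider": provider,
    }


if __name__ == "__main__":
    print(build_search_args("quantum computing theory", result_count=5, provider="all"))
```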

Module: content_ingest

This tool fetches a URL and converts its HTML into structured, LLM-oriented text. The serialization_format parameter selects one of the following output formats:

source_attribution_markdown: Default output. Markdown format enriched with inline source references for lineage tracking.
lean_context_markdown: Highly compressed Markdown, scrubbed of non-essential prose for maximum token efficiency.
base_markdown: Simple conversion from HTML to Markdown structure.
reference_extract: Isolates and presents only citation and bibliography sections.
lean_html: The raw HTML equivalent of the lean_context_markdown output.
standard_markdown: The standard Markdown serialization.

Invocation Example:

    { "target_uri": "https://example.com/deep_dive", "serialization_format": "source_attribution_markdown" }
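
To give a sense of what Markdown "enriched with inline source references" could look like, the sketch below rewrites inline links as numbered citations and appends a reference list. It is only an illustration under assumptions: the function add_citations, the regular expression, and the exact output layout are hypothetical, and the server's real source_attribution_markdown output may differ.

```python
import re
from typing import Dict

# Matches [text](http...) links but skips images, which start with "!".
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^)\s]+)\)")


def add_citations(markdown: str) -> str:
    """Replace inline links with numbered markers and append a reference list."""
    numbers: Dict[str, int] = {}

    def replace(match: "re.Match[str]") -> str:
        text, url = match.group(1), match.group(2)
        # Reuse the existing number when the same URL appears again.
        number = numbers.setdefault(url, len(numbers) + 1)
        return f"{text} [{number}]"

    body = LINK.sub(replace, markdown)
    if not numbers:
        return body
    refs = "\n".join(f"[{n}] {url}" for url, n in numbers.items())
    return f"{body}\n\nReferences:\n{refs}"


if __name__ == "__main__":
    print(add_citations("See [docs](https://example.com/a) and [more](https://example.com/b)."))
```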

Configuration Note: For Google Search enablement, credentials must be provisioned in config.json:

    {
      "google_credentials": {
        "api_key": "your-g-api-key",
        "search_engine_id": "your-cse-id"
      }
    }

LLM Context Optimization Strategies

The gateway applies several processing steps to make extracted content better suited to LLM consumption:

Semantic Chunking: Automated differentiation and preservation of main article body versus peripheral elements.
Noise Suppression: Aggressive filtering of navigational aids, advertisements, site footers, and other non-substantive content.
Evidential Integrity: Mandatory inclusion of source URLs within the output stream to facilitate fact-checking.
Minimalist Filtering: Removal of excessively short or context-free text fragments (minimum length threshold of 10 tokens).
Output Standardization: Prioritizing source_attribution_markdown to ensure high context fidelity for subsequent AI reasoning.
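
The Minimalist Filtering step above can be approximated as follows. This is a sketch under assumptions: it counts whitespace-separated words as "tokens", and the function name drop_short_fragments is hypothetical; the gateway's actual tokenization is not specified here.

```python
from typing import Iterable, List


def drop_short_fragments(blocks: Iterable[str], min_tokens: int = 10) -> List[str]:
    """Keep only text blocks with at least min_tokens whitespace-separated words."""
    kept: List[str] = []
    for block in blocks:
        text = block.strip()
        # Assumption: a "token" is a whitespace-delimited word.
        if len(text.split()) >= min_tokens:
            kept.append(text)
    return kept


if __name__ == "__main__":
    page_blocks = [
        "Home | About | Contact",
        "Quantum computing uses qubits, which can represent superpositions "
        "of zero and one simultaneously.",
    ]
    print(drop_short_fragments(page_blocks))
```

Short navigation strings such as menu labels fall below the threshold and are dropped, while full sentences from the article body are retained.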

Project Structure Outline

fetch_gateway_root/
├── service_modules/
│   ├── main_entry.py              # Primary server initialization and routing
│   └── query_engine.py            # Logic for search orchestration
├── configuration_defaults.json    # Template for runtime parameters
├── metadata.toml                  # Project dependency and build info
├── dependency_list.txt            # List of required external libraries
└── MANUAL.md                      # Comprehensive documentation

Configuration Management

Establish the active configuration file:

    cp configuration_defaults.json config.json
Integrate Google service credentials into config.json:

    {
      "google_credentials": {
        "api_key": "your-google-api-key-here",
        "search_engine_id": "your-google-cse-id-here"
      }
    }
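
One way a service could read these credentials and decide whether the Google provider is usable is sketched below. The key names come from the config.json layout shown here; the function load_google_credentials and its fall-back behavior (returning None when the file or either value is missing) are assumptions, not the server's documented behavior.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def load_google_credentials(path: str = "config.json") -> Optional[Dict[str, str]]:
    """Return Google credentials from the config file, or None if unavailable."""
    config_file = Path(path)
    if not config_file.is_file():
        return None
    config = json.loads(config_file.read_text(encoding="utf-8"))
    creds = config.get("google_credentials") or {}
    api_key = creds.get("api_key", "")
    engine_id = creds.get("search_engine_id", "")
    # Assumption: both values must be non-empty for Google search to be enabled.
    if not api_key or not engine_id:
        return None
    return {"api_key": api_key, "search_engine_id": engine_id}


if __name__ == "__main__":
    creds = load_google_credentials()
    print("Google search enabled" if creds else "Google search disabled")
```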

Chronology of Releases

2025.02.08: Integrated multi-provider search support (DuckDuckGo primary, Google secondary).
2025.02.07: Architectural refactor to utilize FastMCP paradigm; improved dependency resolution.
2025.02.07: Refined content exclusion parameters, optimizing token density while guaranteeing URL traceability.

Licensing

Distributed under the MIT License.

Collaborations

Contributions via Issues and Pull Requests are highly encouraged.

Personnel

Steward: weidwonder
Development: Claude Sonnet 3.5
- Note: 100% of source code generated by Claude. Estimated consumption: $9 ($2 for initial coding, $7 for iterative correction/debugging). Development duration: 3 hours (0.5h coding, 0.5h setup, 2.0h iterative refinement).

Acknowledgment

Gratitude extended to all contributors.

Special mention to:
- The original Crawl4ai repository for foundational concepts in web data extraction methodology.
