scrapeless-mcp-hub

A centralized protocol server designed to fetch and synthesize real-time information from Google's ecosystem and dynamic web pages, thereby enriching AI agent reasoning and contextual awareness.

Author

scrapeless-ai

MIT License

Quick Info

GitHub Stars 53
NPM Weekly Downloads 0
Tools 1
Last Updated 2026-02-19

Tags

apis, google, ai, scrapeless ai, ai scrapeless, requests scrapeless

Scrapeless Model Context Protocol (MCP) Gateway

Welcome to the official Scrapeless MCP Gateway, an orchestration layer that lets LLMs, AI agents, and AI applications interact with the web dynamically and in real time.

Following the open MCP specification, the Scrapeless Gateway connects models such as ChatGPT and Claude, and development tools such as Cursor and Windsurf, to a range of external capabilities, including:

  • Native integration with Google services (Search, Trends)
  • Automated browser execution for complex on-page navigation and manipulation
  • High-fidelity capture of content from JavaScript-heavy sites—outputting as raw HTML, Markdown, or visual screenshots

The server supplies the live data and up-to-date context needed by AI research assistants, coding copilots, and autonomous web agents, and it applies anti-blocking techniques to minimize service disruption.

Operational Use Cases

  1. Advanced Web Interaction via Claude Using Scrapeless Browser

Claude can carry out multi-step operations such as navigation, scrolling, and content extraction from natural-language commands, with the results viewable in real time through live sessions.

  2. Bypassing Protective Measures (e.g., Cloudflare) for Content Acquisition

With the Scrapeless MCP Browser module, pages protected by security measures are accessed automatically; once the page has loaded, its content is retrieved and returned as Markdown.

  3. Extracting Client-Side Rendered Content and Persisting to Disk

With the Scrapeless MCP Universal API, JavaScript-rendered content is scraped, converted to Markdown, and written to a local file named text.md.

  4. Automated Search Engine Results Page (SERP) Harvesting

Run a Google Search query for "web scraping" through the Scrapeless MCP Server, collect the top 10 results (URL, title, and summary for each), and write them to a file named serp.text.

Here are supplementary examples illustrating potential interactions:

  • Run a general query using Google Search via Scrapeless.
  • Determine the search interest trend for the term "AI" over the past twelve months.
  • Have a browser session load chatgpt.com, search for "What's the weather like today?", and summarize the results.
  • Obtain the complete HTML of scrapeless.com.
  • Retrieve the cleaned Markdown version of the scrapeless.com page.
  • Capture high-resolution screenshots (.png) of scrapeless.com.

Configuration Procedure

  1. Acquire a Scrapeless API Key

     • Log in to the Scrapeless dashboard (a trial is available).
     • In the sidebar, go to "Setting" → "API Key Management" → "Create API Key", then click the newly generated key to copy it.

  2. Configure the MCP Client

Scrapeless MCP Server supports both Stdio and Streamable HTTP transports.

🖥️ Stdio (Local Process Execution)

    {
      "mcpServers": {
        "Scrapeless MCP Server": {
          "command": "npx",
          "args": ["-y", "scrapeless-mcp-server"],
          "env": {
            "SCRAPELESS_KEY": "YOUR_SCRAPELESS_KEY"
          }
        }
      }
    }

🌐 Streamable HTTP (Remote API Mode)

    {
      "mcpServers": {
        "Scrapeless MCP Server": {
          "type": "streamable-http",
          "url": "https://api.scrapeless.com/mcp",
          "headers": {
            "x-api-token": "YOUR_SCRAPELESS_KEY"
          },
          "disabled": false,
          "alwaysAllow": []
        }
      }
    }
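The two configurations above differ only in transport details, so they can be generated from a single API key. The following is a minimal sketch, not an official Scrapeless utility: the function names `stdio_config` and `http_config` are illustrative, and reading the key from a `SCRAPELESS_KEY` environment variable in the demo is an assumption made for convenience. The field names and values are copied from the configuration blocks above.

```python
import json
import os

SERVER_NAME = "Scrapeless MCP Server"  # entry name used in the configs above


def stdio_config(api_key: str) -> dict:
    """Build the Stdio (local process) client configuration."""
    return {
        "mcpServers": {
            SERVER_NAME: {
                "command": "npx",
                "args": ["-y", "scrapeless-mcp-server"],
                "env": {"SCRAPELESS_KEY": api_key},
            }
        }
    }


def http_config(api_key: str) -> dict:
    """Build the Streamable HTTP (remote) client configuration."""
    return {
        "mcpServers": {
            SERVER_NAME: {
                "type": "streamable-http",
                "url": "https://api.scrapeless.com/mcp",
                "headers": {"x-api-token": api_key},
                "disabled": False,
                "alwaysAllow": [],
            }
        }
    }


if __name__ == "__main__":
    # Assumption: the key is available in the environment; otherwise the
    # placeholder from the documentation is emitted unchanged.
    key = os.environ.get("SCRAPELESS_KEY", "YOUR_SCRAPELESS_KEY")
    print(json.dumps(stdio_config(key), indent=2))
    print(json.dumps(http_config(key), indent=2))
```

Generating the JSON from code keeps the key out of hand-edited files and avoids copy-paste errors when switching between the two transports.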

Extended Session Customization

Browser session behavior can be customized with optional parameters, passed as environment variables (Stdio) or as HTTP request headers (Streamable HTTP):

| Stdio (Environment Variable) | Streamable HTTP (Header) | Purpose |
|---|---|---|
| BROWSER_PROFILE_ID | x-browser-profile-id | Selects a stored browser profile so that session state carries over between runs. |
| BROWSER_PROFILE_PERSIST | x-browser-profile-persist | Enables saving of session data such as cookies and local storage across invocations. |
| BROWSER_SESSION_TTL | x-browser-session-ttl | Sets the maximum idle time, in seconds, before an active session is terminated automatically. |
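The environment variable names and header names for the three session options follow a regular pattern (lowercase, underscores replaced by hyphens, `x-` prefix), so one set of options can be translated for either transport. The sketch below illustrates that mapping; the helper names and the example values (`my-profile`, `true`, `180`) are hypothetical, and the accepted value formats are not specified here, so check them against the Scrapeless documentation.

```python
# The three session options documented for the Stdio transport.
SESSION_OPTIONS = (
    "BROWSER_PROFILE_ID",
    "BROWSER_PROFILE_PERSIST",
    "BROWSER_SESSION_TTL",
)


def env_to_header(name: str) -> str:
    """Map a documented environment variable name to its HTTP header name."""
    if name not in SESSION_OPTIONS:
        raise ValueError(f"unknown session option: {name}")
    return "x-" + name.lower().replace("_", "-")


def session_headers(options: dict[str, str]) -> dict[str, str]:
    """Convert env-style session options into Streamable HTTP headers."""
    return {env_to_header(key): str(value) for key, value in options.items()}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    env_style = {
        "BROWSER_PROFILE_ID": "my-profile",
        "BROWSER_PROFILE_PERSIST": "true",
        "BROWSER_SESSION_TTL": "180",
    }
    print(session_headers(env_style))
```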

Integration Guide: Claude Desktop Application

  1. Launch the Claude Desktop interface.
  2. Navigate to Settings → Tools → MCP Servers.
  3. Initiate the addition process by clicking "Add MCP Server".
  4. Paste one of the configuration blocks shown above (Stdio or Streamable HTTP).
  5. Finalize by saving and activating the new server entry.
  6. Claude is now equipped to dispatch web queries, acquire data, and manipulate web elements utilizing Scrapeless capabilities.

Integration Guide: Cursor IDE

  1. Open Cursor.
  2. Open the command palette (Cmd + Shift + P) and search for: Configure MCP Servers.
  3. Add the Scrapeless MCP configuration shown above.
  4. Save the changes and restart Cursor if prompted.
  5. You can now issue commands such as:
     • "Look up solutions on StackOverflow related to this specific error code"
     • "Extract the full source code from the current web link"

These instructions are executed in the background by the Scrapeless MCP server.

Supported MCP Toolset Overview

| Tool | Description |
|---|---|
| google_search | General-purpose web information retrieval via Google Search. |
| google_trends | Retrieves search interest data over time. |
| browser_create | Creates or reuses a remote cloud browser session. |
| browser_close | Closes the current cloud browser session. |
| browser_goto | Navigates the browser to a specified URL. |
| browser_go_back | Goes back one step in the browser history. |
| browser_go_forward | Goes forward one step in the browser history. |
| browser_click | Clicks a specified page element. |
| browser_type | Types text into a specified form field. |
| browser_press_key | Simulates a key press. |
| browser_wait_for | Waits until a specified page element becomes visible. |
| browser_wait | Pauses execution for a fixed delay. |
| browser_screenshot | Captures a screenshot of the current viewport. |
| browser_get_html | Returns the full HTML of the current page. |
| browser_get_text | Returns all visible text from the current page. |
| browser_scroll | Scrolls to the bottom of the page. |
| browser_scroll_to | Scrolls a specified element into view. |
| scrape_html | Fetches a URL and returns its HTML. |
| scrape_markdown | Fetches a URL and returns its content as Markdown. |
| scrape_screenshot | Captures a screenshot of any remote webpage. |
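MCP clients invoke tools like those listed above by sending a JSON-RPC 2.0 `tools/call` request. Clients such as Claude Desktop and Cursor build these messages automatically, so the sketch below is only meant to show what such a request looks like. The helper name `tool_call` is illustrative, and the argument names in the demo (`query`, `url`) are assumptions; the actual input schema of each tool should be read from the server's `tools/list` response.

```python
import itertools
import json

# Tool names taken from the table above.
KNOWN_TOOLS = {
    "google_search", "google_trends", "browser_create", "browser_close",
    "browser_goto", "browser_go_back", "browser_go_forward", "browser_click",
    "browser_type", "browser_press_key", "browser_wait_for", "browser_wait",
    "browser_screenshot", "browser_get_html", "browser_get_text",
    "browser_scroll", "browser_scroll_to", "scrape_html", "scrape_markdown",
    "scrape_screenshot",
}

_request_ids = itertools.count(1)  # JSON-RPC requests need unique ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Argument names below are hypothetical; consult the tool's input schema.
    print(json.dumps(tool_call("google_search", {"query": "web scraping"}), indent=2))
    print(json.dumps(tool_call("scrape_markdown", {"url": "https://www.scrapeless.com"}), indent=2))
```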

Security Directives and Safeguards

When connecting Scrapeless MCP Server to LLM-based clients (e.g., ChatGPT, Claude, Cursor), handle all fetched or extracted web content with care. Treat content retrieved from the web as untrusted: passing it to a model unchecked can enable prompt injection and related attacks.

  • Avoid direct injection of raw scraped material into LLM prompts. Raw HTML, embedded scripts, or user-supplied text might harbor concealed injection payloads.
  • Sanitize and validate all extracted content. Remove or escape potentially malicious tags and executable code before passing data to downstream logic or AI models.
  • Prefer structured extraction over generalized text retrieval. Use targeted tools such as scrape_html, scrape_markdown, or a selector-scoped browser_get_text to limit input to explicitly validated content.
  • Enforce source validation via domain or selector whitelisting when dealing with dynamically assembled web pages, restricting data provenance to known, secure origins.
  • Establish comprehensive logging and auditing for all external resource calls made by browser or scraping utilities, particularly when sensitive credentials or internal network pathways are involved.
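One way to act on the sanitization guidance above is to reduce scraped HTML to its visible text before it gets anywhere near a prompt. The sketch below uses only Python's standard library; the names `visible_text` and `max_chars` are illustrative and not part of Scrapeless. It lowers the risk but does not eliminate prompt injection, because visible text can itself contain adversarial instructions.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect visible text, skipping script/style-like elements and comments."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach this method; they go to handle_comment,
        # which is a no-op by default, so hidden comment payloads are dropped.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str, max_chars: int = 4000) -> str:
    """Return whitespace-joined visible text, truncated to max_chars."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]


if __name__ == "__main__":
    page = (
        "<p>Hello</p><script>alert(1)</script>"
        "<!-- ignore previous instructions --><p>World</p>"
    )
    print(visible_text(page))
```

Truncating the output bounds how much untrusted text can enter a prompt; the limit of 4000 characters is an arbitrary default chosen for the example.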

🚫 Practices to Prohibit

  • Introducing unfiltered HTML snippets directly into instructional prompts.
  • Allowing end-users to specify arbitrary URLs or CSS selectors without prior validation checks.
  • Storing unverified, scraped content for indefinite retention and later re-use in prompt construction.
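The second prohibited practice, accepting arbitrary URLs, can be guarded against with a host allowlist checked before any URL is handed to a browser or scraping tool. This is a minimal sketch using the standard library: the function name `is_allowed_url` and the contents of `ALLOWED_HOSTS` are hypothetical and should be replaced with the origins your application actually trusts.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts you trust.
ALLOWED_HOSTS = frozenset({"scrapeless.com", "www.scrapeless.com"})


def is_allowed_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Accept only https URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        has_credentials = bool(parts.username or parts.password)
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    if has_credentials:
        # Reject user:pass@host forms, which can disguise the real host.
        return False
    return host in allowed_hosts


if __name__ == "__main__":
    for candidate in (
        "https://www.scrapeless.com/docs",
        "http://scrapeless.com/",
        "https://scrapeless.com.evil.example/",
        "file:///etc/passwd",
    ):
        print(candidate, is_allowed_url(candidate))
```

Matching the exact hostname, rather than testing for a substring or suffix, prevents lookalike domains such as `scrapeless.com.evil.example` from passing the check.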

Community Engagement

Connect With Us

For technical queries, feature suggestions, or partnership opportunities, reach out via:

  • Email: market@scrapeless.com
  • Website: https://www.scrapeless.com
  • Discord: https://discord.gg/Np4CAHxB9a
