scrapeless-mcp-hub
A centralized protocol server designed to fetch and synthesize real-time information from Google's ecosystem and dynamic web pages, thereby enriching AI agent reasoning and contextual awareness.
Author: scrapeless-ai
Scrapeless Model Context Protocol (MCP) Gateway
Welcome to the official Scrapeless MCP Gateway, an integration layer that lets LLMs, AI agents, and AI applications interact with the web in real time.
Built on the open MCP specification, the Scrapeless Gateway connects models such as ChatGPT and Claude, and development tools such as Cursor and Windsurf, to a range of external capabilities, including:
- Native integration with Google services (Search, Trends)
- Automated browser execution for complex on-page navigation and manipulation
- High-fidelity capture of content from JavaScript-heavy sites—outputting as raw HTML, Markdown, or visual screenshots
This server supplies the live context and real-time data needed by AI research assistants, coding copilots, and autonomous web agents, while using anti-blocking techniques to reduce the chance of requests being blocked.
Operational Use Cases
- Advanced Web Interaction via Claude utilizing Scrapeless Browser
Claude can execute multi-step operations (navigating, scrolling, and extracting content) from natural language commands, and the browser activity can be watched in real time through a live session.
- Bypassing Protective Measures (e.g., Cloudflare) for Content Acquisition
Using the Scrapeless MCP Browser module, pages protected by measures such as Cloudflare are opened automatically; once the page has loaded, its content is retrieved and returned as Markdown.
- Extracting Client-Side Rendered Content and Persisting to Disk
Using the Scrapeless MCP Universal API, JavaScript-rendered content is scraped, converted to Markdown, and written to a local file named text.md.
- Automated Search Engine Results Page (SERP) Harvesting
Using the Scrapeless MCP Server, query Google Search for “web scraping”, collect the first 10 results (URL, title, and summary for each), and write them to a file named serp.text.
Here are supplementary examples illustrating potential interactions:
| Example Scenario |
|---|
| Initiate a broad query using Google Search via Scrapeless. |
| Determine the recent search interest trajectory for the term "AI" over the preceding twelve months. |
| Command a browser session to load chatgpt.com, execute an internal search for "What's the weather like today?", and synthesize the findings. |
| Obtain the complete HTML structure of the scrapeless.com resource. |
| Retrieve the cleaned Markdown representation of the scrapeless.com webpage. |
| Generate high-resolution visual captures (.png) of the scrapeless.com interface. |
Configuration Procedure
- Acquire a Scrapeless Credential
  - Log in to the Scrapeless Portal (registration includes a free trial).
  - Navigate to "Setting" (Sidebar) → select "API Key Management" → click "Create API Key". Then select the newly generated key to copy the token.
- Initialize the MCP Client Environment
Scrapeless MCP Server supports two transport modes: Stdio (a local process) and Streamable HTTP (a remote endpoint).
🖥️ Stdio (Local Process Execution)
```json
{
  "mcpServers": {
    "Scrapeless MCP Server": {
      "command": "npx",
      "args": ["-y", "scrapeless-mcp-server"],
      "env": {
        "SCRAPELESS_KEY": "YOUR_SCRAPELESS_KEY"
      }
    }
  }
}
```
🌐 Streamable HTTP (Remote API Mode)
```json
{
  "mcpServers": {
    "Scrapeless MCP Server": {
      "type": "streamable-http",
      "url": "https://api.scrapeless.com/mcp",
      "headers": {
        "x-api-token": "YOUR_SCRAPELESS_KEY"
      },
      "disabled": false,
      "alwaysAllow": []
    }
  }
}
```
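The two transports differ mainly in how the key is supplied: as an environment variable for the local process, or as a request header for the remote endpoint. The following is a minimal Python sketch that generates either block from the values shown above. The helper name `build_mcp_config` is hypothetical and not part of any Scrapeless tooling; only the field names and values are taken from the configurations above.

```python
import json

SERVER_NAME = "Scrapeless MCP Server"


def build_mcp_config(api_key: str, transport: str = "stdio") -> dict:
    """Return an mcpServers block for the chosen transport (hypothetical helper)."""
    if transport == "stdio":
        # Local process started through npx; the key is passed via the environment.
        entry = {
            "command": "npx",
            "args": ["-y", "scrapeless-mcp-server"],
            "env": {"SCRAPELESS_KEY": api_key},
        }
    elif transport == "streamable-http":
        # Remote endpoint; the key is passed as a request header.
        entry = {
            "type": "streamable-http",
            "url": "https://api.scrapeless.com/mcp",
            "headers": {"x-api-token": api_key},
            "disabled": False,
            "alwaysAllow": [],
        }
    else:
        raise ValueError(f"unknown transport: {transport!r}")
    return {"mcpServers": {SERVER_NAME: entry}}


if __name__ == "__main__":
    print(json.dumps(build_mcp_config("YOUR_SCRAPELESS_KEY"), indent=2))
```

Generating the block this way keeps the key out of hand-edited files and makes it harder to mix fields from the two transports.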
Extended Session Customization
Browser session characteristics can be finely tuned using supplementary directives, provided either as environment variables (for Stdio) or specific HTTP request headers (for Streamable HTTP):
| Stdio Configuration (Environment Variable) | Streamable HTTP (Header Field) | Purpose of Setting |
|---|---|---|
| BROWSER_PROFILE_ID | x-browser-profile-id | Designates a stored browser persona for stateful session continuity. |
| BROWSER_PROFILE_PERSIST | x-browser-profile-persist | Activates the saving of session artifacts like cookies and local storage across invocations. |
| BROWSER_SESSION_TTL | x-browser-session-ttl | Dictates the maximum permissible idle duration (in seconds) before an active session is automatically terminated. |
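Because each setting has one name per transport, a small lookup avoids typing the wrong form. The sketch below, in Python, returns the options keyed for the chosen transport so they can be merged into the `env` or `headers` object of the configuration. The function name `session_settings` is hypothetical, and the value formats (booleans as `"true"`/`"false"`, the TTL as a decimal string) are assumptions, since the table does not specify them.

```python
# Mapping taken from the table above: environment variable -> HTTP header.
ENV_TO_HEADER = {
    "BROWSER_PROFILE_ID": "x-browser-profile-id",
    "BROWSER_PROFILE_PERSIST": "x-browser-profile-persist",
    "BROWSER_SESSION_TTL": "x-browser-session-ttl",
}


def session_settings(transport, profile_id=None, persist=None, ttl_seconds=None):
    """Return session options keyed for 'stdio' (env vars) or 'streamable-http' (headers)."""
    if transport not in ("stdio", "streamable-http"):
        raise ValueError(f"unknown transport: {transport!r}")
    raw = {
        "BROWSER_PROFILE_ID": profile_id,
        # Assumption: booleans are sent as the strings "true" / "false".
        "BROWSER_PROFILE_PERSIST": None if persist is None else str(bool(persist)).lower(),
        # Assumption: the idle timeout is sent as a whole number of seconds.
        "BROWSER_SESSION_TTL": None if ttl_seconds is None else str(int(ttl_seconds)),
    }
    settings = {}
    for env_name, value in raw.items():
        if value is None:
            continue  # leave unset options out entirely
        key = env_name if transport == "stdio" else ENV_TO_HEADER[env_name]
        settings[key] = value
    return settings
```

For example, `session_settings("stdio", profile_id="my-profile", ttl_seconds=300)` yields entries to add under `env`, while the same call with `"streamable-http"` yields entries to add under `headers`.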
Integration Guide: Claude Desktop Application
- Launch the Claude Desktop interface.
- Navigate the settings path: Settings → Tools → MCP Servers.
- Initiate the addition process by clicking "Add MCP Server".
- Paste one of the configuration blocks shown above (Stdio or Streamable HTTP).
- Finalize by saving and activating the new server entry.
- Claude is now equipped to dispatch web queries, acquire data, and manipulate web elements utilizing Scrapeless capabilities.
Integration Guide: Cursor IDE
- Open the Cursor Integrated Development Environment.
- Invoke the command palette (Cmd + Shift + P) and locate: Configure MCP Servers.
- Insert the Scrapeless MCP configuration structure as demonstrated previously.
- Save the changes and restart Cursor if prompted.
- You can now issue contextual commands such as:
  - "Look up solutions on StackOverflow related to this specific error code"
  - "Extract the full source code from the current web link"
- These instructions will be transparently executed by the Scrapeless background service.
Supported MCP Toolset Overview
| Tool Identifier | Functionality Description |
|---|---|
| google_search | Runs a Google Search query for general-purpose information retrieval. |
| google_trends | Accesses and reports on temporal search interest data. |
| browser_create | Establishes or reclaims a dedicated, remote cloud browser session. |
| browser_close | Terminates the active cloud browser context. |
| browser_goto | Directs the browser instance to a specified Uniform Resource Locator. |
| browser_go_back | Reverts the browser history by one step. |
| browser_go_forward | Advances the browser history by one step. |
| browser_click | Simulates a user click event on a designated page element. |
| browser_type | Inputs textual data into a targeted form field. |
| browser_press_key | Emulates the physical depression of a keyboard key. |
| browser_wait_for | Pauses execution until a designated page component becomes visible. |
| browser_wait | Inserts a fixed temporal delay into the execution flow. |
| browser_screenshot | Generates a raster image snapshot of the current viewport. |
| browser_get_html | Retrieves the complete, raw Document Object Model (DOM) source. |
| browser_get_text | Extracts all discernible, visible textual strings from the page. |
| browser_scroll | Scrolls the viewport to the absolute bottom boundary. |
| browser_scroll_to | Moves a specific element into the immediate viewport. |
| scrape_html | Executes a remote fetch and returns only the document's HTML. |
| scrape_markdown | Fetches content and converts it into readable Markdown format. |
| scrape_screenshot | Captures a high-fidelity visual representation of any remote webpage. |
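MCP clients invoke these tools with a JSON-RPC 2.0 request whose method is `tools/call`, as defined by the MCP specification. The Python sketch below builds such a request and rejects tool names that are not in the table above. The helper name `tool_call_request` and the argument key `query` in the usage example are illustrative assumptions; the actual argument schema of each tool should be read from the server's `tools/list` response.

```python
import itertools
import json

# Tool names copied from the table above.
TOOLS = {
    "google_search", "google_trends",
    "browser_create", "browser_close", "browser_goto", "browser_go_back",
    "browser_go_forward", "browser_click", "browser_type", "browser_press_key",
    "browser_wait_for", "browser_wait", "browser_screenshot", "browser_get_html",
    "browser_get_text", "browser_scroll", "browser_scroll_to",
    "scrape_html", "scrape_markdown", "scrape_screenshot",
}

_request_ids = itertools.count(1)


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for one of the listed tools."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # "query" is a hypothetical argument name used only for illustration.
    print(json.dumps(tool_call_request("google_search", {"query": "web scraping"}), indent=2))
```

In practice an MCP client such as Claude Desktop or Cursor constructs these requests itself; the sketch is only meant to show what travels over the transport.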
Security Directives and Safeguards
When using Scrapeless MCP Server with LLMs or LLM-powered tools (e.g., ChatGPT, Claude, Cursor), handle all fetched or extracted web content with care. Treat it as inherently untrusted: web pages can carry prompt-injection payloads or other exploits.
✅ Recommended Protocols
- Avoid direct injection of raw scraped material into LLM prompts. Raw HTML, embedded scripts, or user-supplied text might harbor concealed injection payloads.
- Rigorously sanitize and authenticate all extracted artifacts. Remove or escape potentially malicious tags and executable code before passing data to subsequent logic or AI engines.
- Prioritize explicit structural extraction over generalized text retrieval. Utilize targeted tools like scrape_html, scrape_markdown, or precisely selector-driven browser_get_text to limit data ingress to explicitly validated content sources.
- Enforce source validation via domain or selector whitelisting when dealing with dynamically assembled web pages, restricting data provenance to known, secure origins.
- Establish comprehensive logging and auditing for all external resource calls made by browser or scraping utilities, particularly when sensitive credentials or internal network pathways are involved.
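As one possible way to apply the sanitization advice above, the following Python sketch reduces scraped HTML to its visible text, dropping script and style bodies and HTML comments, and then wraps the result in explicit delimiters before it is placed in a prompt. It uses only the standard library; the function names, the length cap, and the delimiter tag are illustrative assumptions. It does not protect against injection text that is plainly visible on the page, so it complements the other protocols rather than replacing them.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping elements whose content is never meant to be read."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach this method, so they are dropped as well.
        if not self._skip_depth:
            text = data.strip()
            if text:
                self.parts.append(text)


def sanitize_html(raw_html: str, max_chars: int = 4000) -> str:
    """Return visible text only, truncated to max_chars (the cap is an arbitrary choice)."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]


def wrap_untrusted(text: str) -> str:
    """Mark content as data, not instructions, when building a prompt."""
    return "<untrusted_web_content>\n" + text + "\n</untrusted_web_content>"
```

For instance, `sanitize_html('<p>Hello</p><script>alert(1)</script><b>world</b>')` returns `Hello world`.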
🚫 Practices to Prohibit
- Introducing unfiltered HTML snippets directly into instructional prompts.
- Allowing end-users to specify arbitrary URLs or CSS selectors without prior validation checks.
- Storing unverified, scraped content for indefinite retention and later re-use in prompt construction.
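The second item above can be addressed by checking every user-supplied URL against an allowlist before it is handed to a browser or scraping tool. The Python sketch below is a minimal version of that check, using the standard library only; the allowlist contents and the function name `is_allowed_url` are examples, not values prescribed by Scrapeless.

```python
from urllib.parse import urlsplit

# Example allowlist (assumption): replace with the hosts your application trusts.
ALLOWED_HOSTS = frozenset({"scrapeless.com", "www.scrapeless.com"})


def is_allowed_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Accept only https URLs whose exact host name is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Exact match only, so look-alike hosts with a trusted prefix are rejected.
    return host in allowed_hosts
```

Matching the full host name, rather than testing whether the URL merely contains a trusted domain, is what rejects inputs such as `https://scrapeless.com.evil.example/`.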
Community Engagement
- Join the centralized MCP Server support channel on Discord (https://backend.scrapeless.com/app/api/v1/public/links/discord)
Connect With Us
For technical queries, feature suggestions, or partnership opportunities, reach out via:
- Electronic Mail: market@scrapeless.com
- Official Web Presence: https://www.scrapeless.com
- Collaborative Discussion Board: https://discord.gg/Np4CAHxB9a
