Scrapeless Model Context Protocol (MCP) Gateway

Welcome to the official Scrapeless MCP Gateway, an orchestration layer that lets LLMs, AI agents, and AI applications interact with the web in real time.

Built on the open MCP specification, the Scrapeless Gateway connects models such as ChatGPT and Claude, and development environments such as Cursor and Windsurf, to a range of external capabilities, including:

Native integration with Google services (Search, Trends)
Automated browser execution for complex on-page navigation and manipulation
High-fidelity capture of content from JavaScript-heavy sites, returned as raw HTML, Markdown, or screenshots

This server supplies the dynamic context and live data needed by AI research assistants, coding copilots, and autonomous web agents, while applying anti-blocking techniques to minimize service disruption.

Operational Use Cases

Advanced Web Interaction via Claude Using Scrapeless Browser

Claude can carry out multi-step operations such as navigating, scrolling, and extracting content through natural-language commands, and the interaction can be watched in real time through live sessions.

Bypassing Protective Measures (e.g., Cloudflare) for Content Acquisition

With the Scrapeless MCP Browser module, pages behind protections such as Cloudflare are accessed automatically; once the page has loaded, its content is retrieved and returned as Markdown.

Extracting Client-Side Rendered Content and Persisting to Disk

Using the Scrapeless MCP Universal API, JavaScript-rendered content is scraped, converted to Markdown, and written to a local file named text.md.

Automated Search Engine Results Page (SERP) Harvesting

Run a Google Search query for "web scraping" through the Scrapeless MCP Server, collect the top 10 results (URL, title, and summary for each), and write them to a file named serp.text.

Here are supplementary examples illustrating potential interactions:

Example Scenario
Initiate a broad query using Google Search via Scrapeless.
Find the search interest trend for the term "AI" over the past twelve months.
Open chatgpt.com in a browser session, search for "What's the weather like today?", and summarize the results.
Obtain the complete HTML structure of the scrapeless.com resource.
Retrieve the cleaned Markdown representation of the scrapeless.com webpage.
Generate high-resolution visual captures (.png) of the scrapeless.com interface.

Configuration Procedure

Acquire a Scrapeless API Key
Log in to the Scrapeless Portal (registration includes a free trial).
In the sidebar, open "Setting", select "API Key Management", and click "Create API Key". Then click the newly created key to copy it.

Initialize the MCP Client Environment

Scrapeless MCP Server supports two transports: Standard I/O (Stdio) and Streamable HTTP.

🖥️ Stdio (Local Process Execution)

{
  "mcpServers": {
    "Scrapeless MCP Server": {
      "command": "npx",
      "args": ["-y", "scrapeless-mcp-server"],
      "env": {
        "SCRAPELESS_KEY": "YOUR_SCRAPELESS_KEY"
      }
    }
  }
}

🌐 Streamable HTTP (Remote API Mode)

{
  "mcpServers": {
    "Scrapeless MCP Server": {
      "type": "streamable-http",
      "url": "https://api.scrapeless.com/mcp",
      "headers": {
        "x-api-token": "YOUR_SCRAPELESS_KEY"
      },
      "disabled": false,
      "alwaysAllow": []
    }
  }
}
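The two configuration entries above can also be generated from a script, which keeps the API key out of hand-edited files. The following is a minimal sketch, not an official tool: the function names are hypothetical, and reading the key from a `SCRAPELESS_KEY` shell variable is an assumption about your setup.

```python
import json
import os


def stdio_config(api_key: str) -> dict:
    """Build the Stdio entry shown above (the client launches the server via npx)."""
    return {
        "mcpServers": {
            "Scrapeless MCP Server": {
                "command": "npx",
                "args": ["-y", "scrapeless-mcp-server"],
                "env": {"SCRAPELESS_KEY": api_key},
            }
        }
    }


def http_config(api_key: str) -> dict:
    """Build the Streamable HTTP entry shown above."""
    return {
        "mcpServers": {
            "Scrapeless MCP Server": {
                "type": "streamable-http",
                "url": "https://api.scrapeless.com/mcp",
                "headers": {"x-api-token": api_key},
                "disabled": False,
                "alwaysAllow": [],
            }
        }
    }


if __name__ == "__main__":
    # Assumption: the key is exported in the shell; otherwise the placeholder is kept.
    key = os.environ.get("SCRAPELESS_KEY", "YOUR_SCRAPELESS_KEY")
    print(json.dumps(stdio_config(key), indent=2))
```

Paste the printed JSON into your MCP client's configuration, or call `http_config` instead if you prefer the remote transport.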

Extended Session Customization

Browser sessions can be customized with optional parameters, passed as environment variables (Stdio) or as HTTP headers (Streamable HTTP):

Stdio (Environment Variable)	Streamable HTTP (Header)	Description
BROWSER_PROFILE_ID	x-browser-profile-id	ID of a stored browser profile, used to keep session state across runs.
BROWSER_PROFILE_PERSIST	x-browser-profile-persist	Saves session data such as cookies and local storage across invocations.
BROWSER_SESSION_TTL	x-browser-session-ttl	Maximum idle time, in seconds, before an active session is closed automatically.
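Because each setting has one name per transport, a small lookup can translate a Stdio-style configuration into the equivalent Streamable HTTP headers. This is a sketch under assumptions: the helper name is hypothetical, and the example values (including passing every value as a string) are placeholders rather than documented formats.

```python
# Mapping taken from the table above: environment variable -> HTTP header.
ENV_TO_HEADER = {
    "BROWSER_PROFILE_ID": "x-browser-profile-id",
    "BROWSER_PROFILE_PERSIST": "x-browser-profile-persist",
    "BROWSER_SESSION_TTL": "x-browser-session-ttl",
}


def session_headers(env: dict[str, str]) -> dict[str, str]:
    """Translate Stdio-style settings into Streamable HTTP headers.

    Settings that are missing or empty are skipped.
    """
    return {
        header: env[name]
        for name, header in ENV_TO_HEADER.items()
        if env.get(name)
    }


# Placeholder values, not real profile IDs.
settings = {"BROWSER_PROFILE_ID": "my-profile", "BROWSER_SESSION_TTL": "180"}
headers = session_headers(settings)
```

The resulting dictionary can be merged into the `headers` object of the Streamable HTTP configuration alongside `x-api-token`.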

Integration Guide: Claude Desktop Application

Open Claude Desktop.
Go to Settings → Tools → MCP Servers.
Click "Add MCP Server".
Paste one of the configuration blocks shown above (Stdio or Streamable HTTP).
Save and enable the new server entry.
Claude can now run web queries, extract content, and interact with pages through Scrapeless.

Integration Guide: Cursor IDE

Open Cursor.
Open the command palette (Cmd + Shift + P) and search for: Configure MCP Servers.
Add the Scrapeless MCP configuration shown above.
Save the file and restart Cursor if prompted.
You can now issue contextual commands such as:
"Look up solutions on StackOverflow related to this specific error code"
"Extract the full source code from the current web link"
These requests are carried out through the Scrapeless MCP server.

Supported MCP Toolset Overview

Tool Name	Description
google_search	Runs a Google search for general web information retrieval.
google_trends	Retrieves search interest data over time from Google Trends.
browser_create	Creates or reuses a remote cloud browser session.
browser_close	Closes the active cloud browser session.
browser_goto	Navigates the browser to a specified URL.
browser_go_back	Goes back one step in the browser history.
browser_go_forward	Goes forward one step in the browser history.
browser_click	Clicks a specified page element.
browser_type	Types text into a specified input field.
browser_press_key	Simulates a key press.
browser_wait_for	Waits until a specified page element becomes visible.
browser_wait	Pauses for a fixed duration.
browser_screenshot	Captures a screenshot of the current viewport.
browser_get_html	Returns the full HTML of the current page.
browser_get_text	Returns all visible text on the current page.
browser_scroll	Scrolls to the bottom of the page.
browser_scroll_to	Scrolls a specified element into view.
scrape_html	Fetches a URL and returns its HTML.
scrape_markdown	Fetches a URL and returns its content as Markdown.
scrape_screenshot	Captures a high-quality screenshot of any webpage.
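MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. The sketch below only builds such a message so its shape is visible; it does not send anything. The helper name is hypothetical, and the `url` argument name is an assumption: the real argument schema for each tool is whatever the server reports via `tools/list`.

```python
import itertools
import json

# Simple counter for JSON-RPC request IDs.
_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument name; check the schema returned by `tools/list`.
request = tool_call("scrape_markdown", {"url": "https://www.scrapeless.com"})
```

In practice an MCP client such as Claude Desktop or Cursor builds and sends these messages for you; the sketch is mainly useful when debugging traffic or writing a custom client.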

Security Directives and Safeguards

When integrating Scrapeless MCP Server with LLM-based clients (e.g., ChatGPT, Claude, Cursor), handle all fetched or extracted web data with great care. Content retrieved from the web must be treated as untrusted, because mishandling it can lead to prompt injection or other exploits.

✅ Recommended Protocols

Avoid inserting raw scraped material directly into LLM prompts. Raw HTML, embedded scripts, or user-supplied text may carry hidden injection payloads.
Sanitize and validate all extracted content. Remove or escape potentially malicious tags and executable code before passing data to downstream logic or AI models.
Prefer structured extraction over broad text retrieval. Use targeted tools such as scrape_html, scrape_markdown, or a selector-scoped browser_get_text so that only content from validated sources is ingested.
Validate sources with domain or selector allowlists when working with dynamically assembled pages, so that data comes only from known, trusted origins.
Log and audit all external requests made by the browser and scraping tools, particularly when sensitive credentials or internal network paths are involved.
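As one way to apply the sanitization advice above, the sketch below reduces scraped HTML to its visible text with Python's standard `html.parser`, dropping script and style blocks and HTML comments, and caps the length. The class and function names and the 4000-character limit are hypothetical choices. This lowers the risk from hidden payloads but does not by itself prevent prompt injection carried in visible text.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping <script>, <style>, and similar blocks."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._depth = 0  # greater than 0 while inside a skipped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Comments never reach this method, so they are dropped as well.
        if self._depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str, max_chars: int = 4000) -> str:
    """Return whitespace-joined visible text, truncated to max_chars."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]
```

The returned string can then be placed in a clearly delimited section of the prompt and labeled as untrusted data rather than as instructions.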

🚫 Practices to Prohibit

Introducing unfiltered HTML snippets directly into instructional prompts.
Allowing end-users to specify arbitrary URLs or CSS selectors without prior validation checks.
Storing unverified, scraped content for indefinite retention and later re-use in prompt construction.
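To avoid passing arbitrary user-supplied URLs to the browser or scraping tools, a validation step can run before any tool call. The following is a minimal sketch using only the standard library; the allowlist contents and the function name are hypothetical and should be replaced with the origins your application actually trusts.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your application trusts.
ALLOWED_HOSTS = {"www.scrapeless.com", "scrapeless.com"}


def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        # Malformed input, such as an unterminated IPv6 literal.
        return False
    return parts.scheme == "https" and host in ALLOWED_HOSTS
```

Matching on the parsed hostname rather than on a substring means look-alike inputs such as `https://scrapeless.com@evil.example/` or `https://scrapeless.com.evil.example/` are rejected.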

Community Engagement

Join the centralized MCP Server support channel on Discord (https://backend.scrapeless.com/app/api/v1/public/links/discord)

Connect With Us

For technical queries, feature suggestions, or partnership opportunities, reach out via:

Email: market@scrapeless.com
Website: https://www.scrapeless.com
Discord: https://discord.gg/Np4CAHxB9a
