Scrapeless Web Context Provider (MCP Server Implementation)

Welcome to the official Scrapeless Model Context Protocol (MCP) Server, an integration layer that gives Large Language Models (LLMs), AI agents, and custom AI workflows real-time access to the web.

Built on the open MCP specification, the Scrapeless MCP Server connects models such as ChatGPT and Claude, and tools such as Cursor and Windsurf, to a set of external data-retrieval capabilities, including:

Google services (Search and Trends)
Browser automation for multi-page navigation and interaction
Content extraction from dynamic, JavaScript-heavy websites, returned as raw HTML, clean Markdown, or screenshots

It is well suited to building AI research assistants, coding assistants, and autonomous web agents that need current, dynamic data for complex tasks, and it applies anti-blocking techniques to reduce the risk of requests being blocked.

Illustrative Use Cases

Conversational Web Interaction and Data Extraction with Claude

The Scrapeless MCP Browser lets models such as Claude carry out multi-step tasks: navigating to pages, clicking, scrolling, and extracting specific content, with progress shown interactively through live sessions.
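Under the hood, an MCP client such as Claude drives a task like this by sending one JSON-RPC tools/call request per step. The sketch below builds such a sequence in Python for illustration only. It assumes the server keeps the active browser session between calls. The argument names url and selector and the helper names are hypothetical; they are not taken from the Scrapeless documentation.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build one MCP JSON-RPC 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def build_browser_workflow(url: str, selector: str) -> list:
    """Navigate, click, scroll, extract text, then close the session.

    The argument names "url" and "selector" are hypothetical; check each
    tool's inputSchema (returned by tools/list) for the real parameter names.
    """
    steps = [
        ("browser_create", {}),
        ("browser_goto", {"url": url}),
        ("browser_click", {"selector": selector}),
        ("browser_scroll", {}),
        ("browser_get_text", {}),
        ("browser_close", {}),
    ]
    return [tool_call(i, name, args) for i, (name, args) in enumerate(steps, start=1)]


if __name__ == "__main__":
    # Print each request as one JSON line.
    for message in build_browser_workflow("https://www.scrapeless.com", "#main"):
        print(json.dumps(message))
```

In normal use you never write these messages yourself. The model chooses the tools, and the MCP client sends the requests.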

Bypassing Protections (e.g., Cloudflare) to Retrieve Page Content

The Scrapeless MCP Browser service automatically gets past protections such as Cloudflare challenges. Once through, it extracts the fully rendered page content and returns it, typically as Markdown.

Extracting Client-Side Rendered Content and Persisting to Disk

Using the Scrapeless MCP Universal API, content rendered entirely by client-side JavaScript is scraped, converted to Markdown, and written to a local file (text.md in this example).
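The final step, saving the result to disk, can be sketched as follows. The sketch makes two assumptions. First, the Universal API is reached through the scrape_markdown tool. Second, its result follows the generic MCP tool-result shape: a content list of typed items plus an isError flag. save_markdown is a hypothetical helper.

```python
from pathlib import Path


def save_markdown(tool_result: dict, path: str = "text.md") -> Path:
    """Write the text parts of an MCP tool result to a local Markdown file."""
    if tool_result.get("isError"):
        raise RuntimeError("the scrape tool reported an error")
    # MCP tool results carry a list of content items; keep only the text items
    # (images or other item types are ignored here).
    parts = [
        item["text"]
        for item in tool_result.get("content", [])
        if item.get("type") == "text"
    ]
    out = Path(path)
    out.write_text("\n\n".join(parts), encoding="utf-8")
    return out
```
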

Automated SERP Index Harvesting

Using the Scrapeless MCP Server, a Google Search query for “web scraping” is run, and the first ten results (titles, URLs, and snippets) are saved to a file named serp.text.
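Below is a minimal sketch of the save step. It assumes the google_search results have already been parsed into a list of dictionaries. The field names title, link, and snippet are assumptions, and write_serp is a hypothetical helper.

```python
from pathlib import Path


def write_serp(results: list, path: str = "serp.text", limit: int = 10) -> int:
    """Save the first `limit` results as numbered title / URL / snippet blocks.

    The keys "title", "link" and "snippet" are assumptions about the shape of
    the google_search result; adjust them to the fields the tool returns.
    """
    blocks = []
    for rank, item in enumerate(results[:limit], start=1):
        blocks.append(
            f"{rank}. {item.get('title', '')}\n"
            f"   {item.get('link', '')}\n"
            f"   {item.get('snippet', '')}"
        )
    Path(path).write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    return len(blocks)  # number of results actually written
```
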

Here are a few more example prompts:

Example Prompt
Search the web using Scrapeless Google Search.
Show the search interest trend for the term "AI" over the past 12 months.
Open a browser session on chatgpt.com, ask "What's the weather like today?", and summarize the answer.
Get the raw HTML of the scrapeless.com page.
Get the content of the scrapeless.com page as clean Markdown.
Take a high-quality screenshot of scrapeless.com.

Setup

Getting an API Key
Log in to the Scrapeless Portal (a trial period is available).
Open the "Setting" menu in the sidebar.
Select "API Key Management".
Create a new API key.
Click the new key to copy it.

Configuring the MCP Client

The Scrapeless MCP Server supports two transports: standard input/output (stdio) for local execution and Streamable HTTP for the hosted remote API.

🖥️ Stdio Mode (Local Machine Execution)

```json
{
  "mcpServers": {
    "Scrapeless MCP Server": {
      "command": "npx",
      "args": ["-y", "scrapeless-mcp-server"],
      "env": {
        "SCRAPELESS_KEY": "YOUR_SCRAPELESS_KEY"
      }
    }
  }
}
```
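If you would rather not paste the key into the configuration by hand, a small script can generate the stdio block from the SCRAPELESS_KEY environment variable. This is a convenience sketch, not part of the Scrapeless tooling. The generated output still contains the key, so keep it out of version control. stdio_config is a hypothetical helper name.

```python
import json
import os


def stdio_config(api_key: str) -> dict:
    """Return the stdio-mode mcpServers block, with the API key filled in."""
    return {
        "mcpServers": {
            "Scrapeless MCP Server": {
                "command": "npx",
                "args": ["-y", "scrapeless-mcp-server"],
                "env": {"SCRAPELESS_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    key = os.environ.get("SCRAPELESS_KEY")
    if not key:
        raise SystemExit("Set SCRAPELESS_KEY before generating the config.")
    print(json.dumps(stdio_config(key), indent=2))
```
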

🌐 Streamable HTTP Mode (Cloud API Hosting)

```json
{
  "mcpServers": {
    "Scrapeless MCP Server": {
      "type": "streamable-http",
      "url": "https://api.scrapeless.com/mcp",
      "headers": {
        "x-api-token": "YOUR_SCRAPELESS_KEY"
      },
      "disabled": false,
      "alwaysAllow": []
    }
  }
}
```
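To check the Streamable HTTP endpoint outside an MCP client, you can build the first message a client sends, an MCP initialize request, and POST it with the x-api-token header. The sketch assumes protocol version 2025-03-26 and uses a made-up clientInfo. It only constructs the request; sending it with urllib.request.urlopen requires a valid key. Under the MCP Streamable HTTP transport, the server may reply with plain JSON or with an event stream.

```python
import json
import urllib.request

MCP_URL = "https://api.scrapeless.com/mcp"


def build_initialize_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an MCP 'initialize' POST for the Streamable HTTP endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; use the version your client supports
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
        },
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
            "x-api-token": api_key,
        },
    )
```
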

Tailoring Browser Session Parameters

Browser session behavior can be tuned with optional parameters, set as environment variables (stdio) or as HTTP request headers (Streamable HTTP):

Stdio (environment variable)	Streamable HTTP (header)	Description
BROWSER_PROFILE_ID	x-browser-profile-id	ID of an existing browser profile, used to keep session state across calls.
BROWSER_PROFILE_PERSIST	x-browser-profile-persist	Enables persistent storage (cookies, local data) for the profile.
BROWSER_SESSION_TTL	x-browser-session-ttl	Maximum lifetime, in seconds, of an inactive browser session before it is closed automatically.
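The mapping in the table can be captured in a small helper that emits the same options either as environment entries (stdio) or as headers (Streamable HTTP). This is a sketch. session_settings is a hypothetical name. Encoding booleans as "true"/"false" and the TTL as an integer string are also assumptions.

```python
from typing import Optional

# Stdio environment variable -> Streamable HTTP header, per the table above.
SESSION_OPTIONS = {
    "BROWSER_PROFILE_ID": "x-browser-profile-id",
    "BROWSER_PROFILE_PERSIST": "x-browser-profile-persist",
    "BROWSER_SESSION_TTL": "x-browser-session-ttl",
}


def session_settings(
    profile_id: Optional[str] = None,
    persist: Optional[bool] = None,
    ttl_seconds: Optional[int] = None,
    *,
    http: bool = False,
) -> dict:
    """Return only the options that were set, keyed as env vars or HTTP headers.

    Encoding booleans as "true"/"false" is an assumption; check the server docs.
    """
    values = {
        "BROWSER_PROFILE_ID": profile_id,
        "BROWSER_PROFILE_PERSIST": None if persist is None else str(persist).lower(),
        "BROWSER_SESSION_TTL": None if ttl_seconds is None else str(int(ttl_seconds)),
    }
    return {
        (SESSION_OPTIONS[name] if http else name): value
        for name, value in values.items()
        if value is not None
    }
```

Merge the returned dictionary into the "env" object of the stdio block or the "headers" object of the Streamable HTTP block.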

Integration Guide: Claude Desktop Environment

Open the Claude Desktop app.
Go to Settings → Tools → MCP Servers.
Click "Add MCP Server".
Paste one of the configuration blocks above (stdio or Streamable HTTP).
Save, then toggle the server on.
Claude can now fetch web content and interact with pages through the Scrapeless backend.

Integration Guide: Cursor IDE

Open Cursor.
Open the command palette (Cmd + Shift + P) and run Configure MCP Servers.
Paste the Scrapeless MCP configuration into the settings file.
Save the file. You may need to restart Cursor.
You can now give Cursor prompts such as:
"Search Stack Overflow for the root cause of this exception"
"Extract the full HTML from the current URL"
Cursor routes these requests through Scrapeless automatically.
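Instead of editing the settings file by hand, you can merge the server entry in with a short script that keeps any servers already configured. add_mcp_server is a hypothetical helper. The path in the usage note (~/.cursor/mcp.json) is an assumption about where your Cursor version stores its MCP configuration.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: str, name: str, entry: dict) -> dict:
    """Add or replace one entry under "mcpServers", keeping any other servers."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example: add_mcp_server("~/.cursor/mcp.json", "Scrapeless MCP Server", entry), where entry is the inner object from either configuration block above.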

Supported MCP Toolset

Tool	Description
google_search	General-purpose web search through Google Search.
google_trends	Retrieve search interest data from Google Trends.
browser_create	Create or reuse a Scrapeless cloud browser session.
browser_close	Close the current browser session.
browser_goto	Navigate the browser to a URL.
browser_go_back	Go back one step in the browser history.
browser_go_forward	Go forward one step in the browser history.
browser_click	Click an element on the page.
browser_type	Type text into an input field.
browser_press_key	Press a specified keyboard key.
browser_wait_for	Wait until a specific element appears on the page.
browser_wait	Pause for a fixed amount of time.
browser_screenshot	Take a screenshot of the current page.
browser_get_html	Get the full HTML of the current page.
browser_get_text	Get all visible text on the current page.
browser_scroll	Scroll to the bottom of the page.
browser_scroll_to	Scroll a specific element into view.
scrape_html	Scrape a URL and return its raw HTML.
scrape_markdown	Scrape a URL and return its content as clean Markdown.
scrape_screenshot	Take a high-quality screenshot of any web page.
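The set of tools can change between server versions, so it may be worth comparing this table with what the server actually advertises through the standard MCP tools/list method. The sketch assumes you already have the tools/list result as a parsed dictionary. compare_tools is a hypothetical helper.

```python
# Tool names from the table above.
DOCUMENTED_TOOLS = {
    "google_search", "google_trends",
    "browser_create", "browser_close", "browser_goto", "browser_go_back",
    "browser_go_forward", "browser_click", "browser_type", "browser_press_key",
    "browser_wait_for", "browser_wait", "browser_screenshot", "browser_get_html",
    "browser_get_text", "browser_scroll", "browser_scroll_to",
    "scrape_html", "scrape_markdown", "scrape_screenshot",
}


def compare_tools(tools_list_result: dict) -> tuple:
    """Compare a tools/list result with the documented tool names.

    Returns (missing, extra): documented tools the server did not list,
    and listed tools that the table does not mention.
    """
    listed = {tool["name"] for tool in tools_list_result.get("tools", [])}
    return DOCUMENTED_TOOLS - listed, listed - DOCUMENTED_TOOLS
```
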

Security Considerations and Guidelines

When you connect the Scrapeless MCP Server to autonomous systems such as ChatGPT, Claude, or Cursor, handle all scraped content carefully. Web data is untrusted by default, and passing it into AI prompts without validation creates security risks, including prompt injection.

✅ Recommended Mitigation Strategies

Never pass raw, uncurated web content directly into LLM prompts. Unparsed HTML, JavaScript, or unvalidated user-supplied text may hide malicious instructions.
Sanitize and validate all extracted data. Strip or escape harmful content, such as malicious script tags, before using it in downstream AI logic.
Prefer targeted, structured extraction over consuming whole pages. Use tools such as scrape_html, scrape_markdown, or browser_get_text with specific CSS selectors to limit input to known, verified parts of a page.
Enforce domain-level or selector-level allowlists for dynamically generated pages, so that data comes only from trusted, validated sources.
Log all external network activity initiated by the browser or scraping tools, especially where sensitive internal tokens or private network access are involved.
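As one way to apply the sanitization advice, the sketch below takes the following steps:
1. Reduce scraped HTML to visible text with Python's standard html.parser.
2. Drop script and style content.
3. Escape the result.
4. Wrap it in explicit delimiters, so the model can be told to treat it as data.

The delimiter name, the character limit, and the helper names are arbitrary choices. Delimiting lowers the risk of prompt injection but does not remove it.

```python
import html
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping script/style-like containers."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_untrusted_text(raw_html: str, max_chars: int = 20_000) -> str:
    """Reduce scraped HTML to escaped plain text wrapped in explicit delimiters."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.chunks)[:max_chars]
    # Escape <, >, & so page text cannot forge the closing delimiter.
    return "<untrusted_web_content>\n" + html.escape(text, quote=False) + "\n</untrusted_web_content>"
```
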

🚫 Actions to Strictly Prohibit

Inserting scraped HTML directly into conversational prompts.
Letting end users specify arbitrary target URLs or CSS selectors without validation.
Storing unfiltered, raw scraped data for reuse in later conversations.
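For the second prohibited item, a pre-validation step might look like this. It accepts only https URLs whose host is on an allowlist, and it restricts user-supplied selectors to a conservative character set. The host list, the selector pattern, and validate_target are hypothetical placeholders to adapt to your own sources.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"www.scrapeless.com", "example.com"}  # hypothetical allowlist

# Conservative selector pattern: tag names, #ids, .classes, spaces and ">" only
# (no attribute selectors, pseudo-classes, quotes, or brackets).
SAFE_SELECTOR = re.compile(r"[A-Za-z0-9_\-#. >]{1,200}")


def validate_target(url: str, selector: Optional[str] = None) -> str:
    """Return the URL unchanged if it passes validation, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https URLs are allowed")
    # .hostname ignores any "user@" prefix, so "https://good@evil/" resolves to "evil".
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not on allowlist: {host!r}")
    if selector is not None and not SAFE_SELECTOR.fullmatch(selector):
        raise ValueError("selector contains disallowed characters")
    return url
```
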

Community Engagement

Connect with the broader MCP Server ecosystem via Discord: MCP Server Discord

Contact Channels

For feature requests, technical support, or partnership inquiries, reach us through:

Email: market@scrapeless.com
Website: https://www.scrapeless.com
Discord: https://discord.gg/Np4CAHxB9a

scrapeless-web-context-provider

Author

scrapeless-ai
