scrapeless-web-context-provider
A backend service that fetches and synthesizes information from Google Search and related Google properties. It gives AI models current, real-world context for their reasoning and research tasks.
Author: scrapeless-ai
Scrapeless Web Context Provider (MCP Server Implementation)
Welcome to the official Scrapeless Model Context Protocol (MCP) Server, an integration layer that gives Large Language Models (LLMs), intelligent agents, and custom AI workflows real-time access to the web.
Built on the open MCP specification, the Scrapeless Server connects leading models (such as ChatGPT and Claude) and tools (such as Cursor and Windsurf) to a suite of external data retrieval capabilities, including:
- Deep integration with the Google ecosystem (Search queries, Trends analysis)
- Advanced browser emulation for deep page navigation and interaction
- Content extraction from highly dynamic, JavaScript-intensive websites, with output as raw HTML, clean Markdown, or static image captures.
This platform suits AI research aids, coding assistants, and fully autonomous web-operating agents that need current data for complex tasks. It also applies anti-blocking techniques to reduce the chance of requests being blocked.
Illustrative Use Cases
- Conversational Web Manipulation and Data Harvesting via Claude
The Scrapeless MCP Browser component enables models like Claude to execute intricate sequences: navigating to web locations, simulating user clicks, performing scrolls, and extracting specific content, presenting interactive feedback through its live sessions feature.
- Circumventing Protections (e.g., Cloudflare) to Obtain Target Page Body
By leveraging the Scrapeless MCP Browser service, protected pages (like those behind Cloudflare challenges) are automatically accessed. Upon successful traversal, the fully rendered page content is securely extracted and delivered, commonly formatted as Markdown.
- Extracting Client-Side Rendered Content and Persisting to Disk
Utilizing the Scrapeless MCP Universal API, content rendered entirely by client-side JavaScript is scraped, converted into Markdown format, and subsequently written to a persistent local file, exemplified here as text.md.
- Automated SERP Index Harvesting
Employing the Scrapeless MCP Server, a query for the term “web scraping” is executed against Google Search. The first ten resulting entries (including titles, URLs, and descriptive snippets) are retrieved and persisted into the file named serp.text.
Here are several more operational scenarios:
| Operational Scenario |
|---|
| Perform a web lookup using the Google search mechanism via Scrapeless. |
| Ascertain the temporal search interest metrics for the term "AI" across the preceding twelve months. |
| Initiate a browser session to access chatgpt.com, execute an internal query for "What's the weather like today?", and synthesize the findings. |
| Retrieve and output the raw underlying HTML structure of the scrapeless.com webpage. |
| Retrieve and output the content of the scrapeless.com webpage formatted cleanly as Markdown. |
| Generate high-fidelity visual renderings (screenshots) of the scrapeless.com destination. |
Deployment Protocol
- Securing the Authorization Token
  - Access the Scrapeless Portal and log in (a trial period is available).
  - Navigate to the "Setting" menu on the sidebar.
  - Select "API Key Management".
  - Create a new API Key.
  - Click on the newly generated key to copy the credential.
- Configuring the MCP Client Endpoint
Scrapeless MCP Server supports two primary communication pathways: Standard Input/Output (Stdio) for local execution and Streamable HTTP for remote API interaction.
🖥️ Stdio Mode (Local Machine Execution)
```json
{
  "mcpServers": {
    "Scrapeless MCP Server": {
      "command": "npx",
      "args": ["-y", "scrapeless-mcp-server"],
      "env": {
        "SCRAPELESS_KEY": "YOUR_SCRAPELESS_KEY"
      }
    }
  }
}
```
🌐 Streamable HTTP Mode (Cloud API Hosting)
```json
{
  "mcpServers": {
    "Scrapeless MCP Server": {
      "type": "streamable-http",
      "url": "https://api.scrapeless.com/mcp",
      "headers": {
        "x-api-token": "YOUR_SCRAPELESS_KEY"
      },
      "disabled": false,
      "alwaysAllow": []
    }
  }
}
```
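The two configuration blocks above can also be generated by a small script, which keeps the API key out of hand-edited files. The following is a minimal sketch, not part of the Scrapeless tooling: the function name `build_mcp_config` and the choice to read the key from a `SCRAPELESS_KEY` environment variable are assumptions made for illustration, while the field names and values are copied from the blocks above.

```python
import json
import os


def build_mcp_config(api_key: str, mode: str = "stdio") -> dict:
    """Return an mcpServers config dict mirroring the two blocks above."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if mode == "stdio":
        # Local execution: the client launches the server through npx.
        server = {
            "command": "npx",
            "args": ["-y", "scrapeless-mcp-server"],
            "env": {"SCRAPELESS_KEY": api_key},
        }
    elif mode == "streamable-http":
        # Remote execution: the client talks to the hosted endpoint.
        server = {
            "type": "streamable-http",
            "url": "https://api.scrapeless.com/mcp",
            "headers": {"x-api-token": api_key},
            "disabled": False,
            "alwaysAllow": [],
        }
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"mcpServers": {"Scrapeless MCP Server": server}}


if __name__ == "__main__":
    # Assumption: the key is exported in the shell; the placeholder is a fallback.
    key = os.environ.get("SCRAPELESS_KEY", "YOUR_SCRAPELESS_KEY")
    print(json.dumps(build_mcp_config(key, "stdio"), indent=2))
```

Running the script prints the Stdio configuration; passing `"streamable-http"` as the second argument yields the remote variant instead.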
Tailoring Browser Session Parameters
Runtime characteristics of the browser sessions can be fine-tuned using optional parameters, applied either through environment variables (Stdio) or specific HTTP request headers (Streamable HTTP):
| Stdio Configuration (Environment Variable) | Streamable HTTP Header Name | Functional Description |
|---|---|---|
| BROWSER_PROFILE_ID | x-browser-profile-id | Designates a pre-existing browser profile for maintaining session state across calls. |
| BROWSER_PROFILE_PERSIST | x-browser-profile-persist | Activates persistent storage mechanisms (cookies, local data) for the duration of the profile's use. |
| BROWSER_SESSION_TTL | x-browser-session-ttl | Stipulates the maximum lifespan, in seconds, before an inactive browser session is automatically terminated. |
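
Because each setting exists under two names, it can help to define the values once and derive both forms. The sketch below does that with the names from the table above. The helper names `session_env` and `session_headers` are hypothetical, and encoding the persistence flag as the strings `"true"` and `"false"` is an assumption, since the table does not state the expected value format.

```python
from typing import Optional

# Environment variable name -> HTTP header name, copied from the table above.
ENV_TO_HEADER = {
    "BROWSER_PROFILE_ID": "x-browser-profile-id",
    "BROWSER_PROFILE_PERSIST": "x-browser-profile-persist",
    "BROWSER_SESSION_TTL": "x-browser-session-ttl",
}


def session_env(profile_id: Optional[str] = None,
                persist: Optional[bool] = None,
                ttl_seconds: Optional[int] = None) -> dict:
    """Build the Stdio environment entries for the settings that were given."""
    env = {}
    if profile_id is not None:
        env["BROWSER_PROFILE_ID"] = profile_id
    if persist is not None:
        # Assumption: booleans are passed as lowercase strings.
        env["BROWSER_PROFILE_PERSIST"] = "true" if persist else "false"
    if ttl_seconds is not None:
        if ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
        env["BROWSER_SESSION_TTL"] = str(ttl_seconds)
    return env


def session_headers(env: dict) -> dict:
    """Translate Stdio environment entries into Streamable HTTP headers."""
    return {ENV_TO_HEADER[name]: value for name, value in env.items()}


env = session_env(profile_id="my-profile", persist=True, ttl_seconds=300)
headers = session_headers(env)
```

The `env` dict can be merged into the `env` object of the Stdio configuration, and `headers` into the `headers` object of the Streamable HTTP configuration.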
Integration Guide: Claude Desktop Environment
- Initiate the Claude Desktop application.
- Navigate through the settings path: Settings → Tools → MCP Servers.
- Select the option to "Add MCP Server".
- Input one of the configuration blocks (Stdio or Streamable HTTP) provided above.
- Finalize the setup by saving and toggling the server to the 'enabled' state.
- Claude is now authorized to issue web retrieval requests and manage page interactions using the Scrapeless backend.
Integration Guide: Cursor IDE
- Launch the Cursor Integrated Development Environment.
- Open the command palette with Cmd + Shift + P and search for the command: Configure MCP Servers.
- Insert the Scrapeless MCP configuration structure into the relevant settings file.
- Save the configuration file. A restart of Cursor may be required.
- You can now prompt Cursor with directives such as:
  - "Investigate StackOverflow for the root cause of this execution exception"
  - "Extract the full markup from the currently viewed URL endpoint."
- These commands will be transparently processed by the Scrapeless infrastructure.
Supported MCP Toolset
| Tool Name | Purpose Description |
|---|---|
| google_search | Runs a Google Search query for general-purpose information retrieval. |
| google_trends | Accessing and retrieving dynamic search interest data from Google Trends. |
| browser_create | Provisioning or reclaiming a virtual browser instance via Scrapeless. |
| browser_close | Terminates the active virtual browser session connection. |
| browser_goto | Directs the virtual browser viewport to a specified Uniform Resource Locator. |
| browser_go_back | Reverts the browser's current viewport state by one historical step. |
| browser_go_forward | Advances the browser's current viewport state by one historical step. |
| browser_click | Simulates a mouse click event on a designated page element. |
| browser_type | Inputs textual data into a targeted form field or input area. |
| browser_press_key | Emulates the physical depression of a specified keyboard key. |
| browser_wait_for | Pauses execution until a prerequisite page element becomes visually present. |
| browser_wait | Imposes a fixed, time-based delay on the execution flow. |
| browser_screenshot | Captures an image of the current state of the browser viewport. |
| browser_get_html | Fetches the entirety of the Document Object Model (DOM) source code. |
| browser_get_text | Extracts all visually discernible textual content from the active page. |
| browser_scroll | Scrolls the page until its bottom is in view. |
| browser_scroll_to | Programmatically positions a specific element into the visible viewport area. |
| scrape_html | Executes a scrape operation on a URL and returns the raw HTML source. |
| scrape_markdown | Executes a scrape operation and converts the resulting content to clean Markdown. |
| scrape_screenshot | Generates a high-fidelity static image capture of any specified web address. |
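
MCP clients such as Claude Desktop or Cursor normally construct tool invocations themselves, but seeing the message shape can help when debugging. Under the MCP specification, a tool is invoked with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds such a request and rejects names that are not in the table above. The helper `tool_call` is hypothetical, and the `query` argument name is an assumption for illustration only; the actual argument schema of each tool is not documented here and should be read from the server's `tools/list` response.

```python
import itertools

# Tool names copied from the table above.
KNOWN_TOOLS = {
    "google_search", "google_trends",
    "browser_create", "browser_close", "browser_goto",
    "browser_go_back", "browser_go_forward", "browser_click",
    "browser_type", "browser_press_key", "browser_wait_for",
    "browser_wait", "browser_screenshot", "browser_get_html",
    "browser_get_text", "browser_scroll", "browser_scroll_to",
    "scrape_html", "scrape_markdown", "scrape_screenshot",
}

# JSON-RPC requests need a unique id per request; a counter is enough here.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "query" is a hypothetical argument name used only for illustration.
request = tool_call("google_search", {"query": "web scraping"})
```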
Security Considerations and Guidelines
When integrating the Scrapeless MCP Server with autonomous systems (such as ChatGPT, Claude, or Cursor), the integrity of all extracted or scraped web artifacts must be rigorously managed. Data sourced from the internet is untrusted by default, and injecting it into AI prompts without validation can lead to security exposures such as prompt injection.
✅ Recommended Mitigation Strategies
- Never feed raw, uncurated web artifacts directly into LLM prompts. Source materials such as unparsed HTML, executable JavaScript, or unvalidated user-supplied text might conceal malicious payload instructions.
- Implement stringent sanitization and validation routines for all derived data. Harmful constructs, like malicious script tags, must be stripped or properly escaped before usage in subsequent AI logic.
- Favor controlled, structured data extraction over general text consumption. Utilize specific tools like `scrape_html`, `scrape_markdown`, or precise `browser_get_text` calls combined with secure CSS selectors to limit data intake to known, verified content segments.
- Enforce domain-level or selector-level whitelisting when dealing with dynamically generated web pages, ensuring data ingress is restricted solely to reputable and validated sources.
- Maintain comprehensive logging of all external network activities initiated by the browser or scraping tools, especially where sensitive internal tokens or proprietary network access is involved.
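One way to apply the sanitization advice above is to reduce scraped HTML to its text nodes before it reaches a prompt. The following is a minimal sketch using only Python's standard `html.parser` module; the function name `visible_text` and the `max_chars` cap are illustrative choices, not part of Scrapeless. It drops script and style bodies, HTML comments, and all markup, but it cannot detect instructions written as ordinary visible text, so it reduces exposure rather than eliminating it.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes while skipping non-content elements."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method, so they are dropped implicitly.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str, max_chars: int = 20000) -> str:
    """Return whitespace-joined text content, truncated to max_chars."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]


sample = '<p>Hello</p><script>alert("x")</script><!-- hidden --><p>World</p>'
print(visible_text(sample))  # prints "Hello World"
```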
🚫 Actions to Strictly Prohibit
- Direct incorporation of scraped HTML markup into conversational prompts.
- Permitting end-users to define arbitrary target URLs or CSS selectors without pre-validation.
- Storing unfiltered, raw scraped information for reuse in future conversational contexts.
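The pre-validation of user-supplied URLs mentioned above could look like the sketch below, which checks a URL against a host allowlist before it is handed to any scraping or browser tool. The name `is_allowed_url` and the contents of `ALLOWED_HOSTS` are assumptions for illustration; a real deployment would supply its own list and may need additional checks such as redirect handling.

```python
from urllib.parse import urlsplit

# Example allowlist; replace with the hosts your application trusts.
ALLOWED_HOSTS = frozenset({"scrapeless.com", "www.scrapeless.com"})


def is_allowed_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Accept only https URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    # Reject embedded credentials such as https://trusted.com@evil.example/
    if parts.username or parts.password:
        return False
    host = (parts.hostname or "").lower()
    # Exact match, so "scrapeless.com.evil.example" does not pass.
    return host in allowed_hosts
```

Matching the host exactly, rather than by substring or suffix, avoids lookalike domains that merely contain a trusted name.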
Community Engagement
- Connect with the broader MCP Server ecosystem via Discord: MCP Server Discord
Contact Channels
For inquiries regarding feature requests, technical support, or partnership opportunities, please utilize these channels:
- Electronic Mail: market@scrapeless.com
- Primary Web Portal: https://www.scrapeless.com
- Discussion Forum: https://discord.gg/Np4CAHxB9a
