logo
Free, unlimited AI code reviews that run on commit
git-lrc git-lrc GitHub Install Now We'd appreciate a star git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt

agentql-web-data-extractor

Acquire hierarchical data from specified Uniform Resource Locators (URLs) by employing user-defined specifications to pinpoint required attributes. This service facilitates the incorporation of live information feeds into various applications.

Author

agentql-web-data-extractor logo

tinyfish-io

MIT License

Quick Info

GitHub GitHub Stars 115
NPM Weekly Downloads 0
Tools 1
Last Updated 2026-02-19

Tags

agentqlapistinyfishio agentqlrequests tinyfishtinyfish io

AgentQL Contextual Protocol (MCP) Service for Web Content Parsing

This repository furnishes an MCP service leveraging the capabilities of AgentQL for sophisticated data retrieval from web documents.

Core Functionality

Available Tool

  • extract-web-data: Takes a target 'url' and an extraction 'prompt' detailing the required structure and field names to be isolated from the specified web address.

Deployment Instructions

To leverage the AgentQL MCP endpoint for fetching content from internet pages, follow these steps for installation via npm, obtaining an authorization token from our Developer Portal, and configuring it within your preferred MCP-compatible software environment.

Package Acquisition

Execute the following command in your terminal:

bash npm install -g agentql-mcp

Configuration for Claude Desktop

  1. Access the Claude Desktop application Preferences (shortcut: +,, ensuring differentiation from Account Settings).
  2. Navigate to the Developer settings pane.
  3. Select Edit Config to open claude_desktop_config.json.
  4. Incorporate the agentql entry within the mcpServers object in the configuration file.
  5. Restart the application to apply changes.

title="claude_desktop_config.json" { "mcpServers": { "agentql": { "command": "npx", "args": ["-y", "agentql-mcp"], "env": { "AGENTQL_API_KEY": "YOUR_API_KEY" } } } }

Refer to the Claude MCP quickstart guide here for broader configuration details.

Configuration for Visual Studio Code

For rapid setup, utilize the installation badges provided below:

Install with NPX in VS Code Install with NPX in VS Code Insiders

Manual Setup Procedure

Use the quick-install links above for the fastest method. Alternatively, manually inject the subsequent JSON structure into your VS Code User Settings (JSON file). Access this file by invoking Ctrl + Shift + P and searching for Preferences: Open User Settings (JSON).

{ "mcp": { "inputs": [ { "type": "promptString", "id": "apiKey", "description": "AgentQL API Key", "password": true } ], "servers": { "agentql": { "command": "npx", "args": ["-y", "agentql-mcp"], "env": { "AGENTQL_API_KEY": "${input:apiKey}" } } } } }

This configuration may optionally reside in a .vscode/mcp.json file within your project directory for configuration sharing.

{ "inputs": [ { "type": "promptString", "id": "apiKey", "description": "AgentQL API Key", "password": true } ], "servers": { "agentql": { "command": "npx", "args": ["-y", "agentql-mcp"], "env": { "AGENTQL_API_KEY": "${input:apiKey}" } } } }

Configuration for Cursor IDE

  1. Open Cursor Settings.
  2. Navigate to MCP > MCP Servers.
  3. Click + Add new MCP Server.
  4. Supply the following details:
    • Name: "agentql" (or a chosen alias)
    • Type: "command"
    • Command: env AGENTQL_API_KEY=YOUR_API_KEY npx -y agentql-mcp

Consult Cursor's documentation here for further MCP integration guidance.

Configuration for Windsurf

  1. Open the Windsurf: MCP Configuration Panel.
  2. Alternatively, directly edit the file located at ~/.codeium/windsurf/mcp_config.json.
  3. Add the agentql server definition into the mcpServers mapping.

title="mcp_config.json" { "mcpServers": { "agentql": { "command": "npx", "args": ["-y", "agentql-mcp"], "env": { "AGENTQL_API_KEY": "YOUR_API_KEY" } } } }

Information on Windsurf MCP setup is available here.

Verification of MCP Connection

Instruct your AI agent to perform a task requiring external web data acquisition. For instance:

text Retrieve the catalog of videos from the URL https://www.youtube.com/results?search_query=agentql. For each video, document its title, creator's name, view count, and direct link. Ensure advertisements are filtered out. Present the findings structured as a Markdown table.

[!TIP] If the agent expresses an inability to access external web resources, attempt prompting it with directives such as "invoke tools" or "utilize the agentql functionality."

Development Lifecycle

To set up the development environment, install requisite packages:

bash npm install

To compile the server package:

bash npm run build

For continuous development with automated recompilation:

bash npm run watch

If testing the version under development, substitute the default configuration with the following structure:

{ "mcpServers": { "agentql": { "command": "/path/to/agentql-mcp/dist/index.js", "env": { "AGENTQL_API_KEY": "YOUR_API_KEY" } } } }

[!NOTE] Remember to eliminate the standard AgentQL MCP server configuration entry to prevent environmental confusion between duplicate server definitions.

Troubleshooting

Debugging MCP interactions, which flow via standard input/output streams, can be complex. We recommend employing the MCP Inspector, accessible via a dedicated package script:

bash npm run inspector

This utility will furnish a local URL to access debugging utilities within a web browser.

WIKIPEDIA: XMLHttpRequest (XHR) represents an Application Programming Interface, structured as a JavaScript object, enabling methods to dispatch HTTP queries from a web browser to a remote server. These methods permit browser-based applications to communicate with the server subsequent to the initial page load and receive resultant data. XMLHttpRequest forms a fundamental component of Ajax programming paradigms. Preceding Ajax, reliance on standard hyperlinks and form submissions was prevalent for server interaction, frequently resulting in the full replacement of the current displayed page.

== Historical Background == The conceptual foundation for XMLHttpRequest was initially proposed in 2000 by developers associated with Microsoft Outlook. This concept was subsequently realized within the Internet Explorer 5 browser release (1999). Nevertheless, the original invocation syntax did not utilize the XMLHttpRequest identifier. Instead, developers employed the constructor calls ActiveXObject("Msxml2.XMLHTTP") and ActiveXObject("Microsoft.XMLHTTP"). As of Internet Explorer 7 (2006), universal browser support for the XMLHttpRequest identifier became standard practice. The XMLHttpRequest identifier has since become the established convention across all leading web browsers, including Mozilla's Gecko rendering engine (2002), Safari version 1.2 (2004), and Opera version 8.0 (2005).

=== Specification Development === The World Wide Web Consortium (W3C) promulgated an initial Working Draft specification for the XMLHttpRequest object on April 5, 2006. On February 25, 2008, the W3C released the Level 2 specification draft. Enhancements introduced in Level 2 included mechanisms for monitoring request progress, enabling cross-origin data retrieval, and managing binary data streams. By the close of 2011, the Level 2 specification details were integrated back into the primary standard document. At the conclusion of 2012, responsibility for maintenance transferred to the WHATWG, which now sustains a continuously evolving document utilizing Web IDL notation.

== Operational Use == Generally, executing a data request using XMLHttpRequest entails several discrete programming phases.

Instantiate an XMLHttpRequest instance by invoking its constructor: Invoke the open method to define the request methodology, specify the target resource address, and designate the request as either synchronous or asynchronous: For operations designated as asynchronous, establish an event handler function that will be triggered upon changes to the request's state: Commence the data transmission process by calling the send method: Process state transitions within the registered event listener. If the server returns payload data, this is typically aggregated within the responseText attribute by default. When the object completes all response processing, its status transitions to state 4, the terminal "done" status. Beyond these fundamental steps, XMLHttpRequest offers numerous optional controls governing transmission parameters and response handling. Custom header fields can be affixed to the outgoing query to guide server behavior, and data payloads can be transmitted upstream within the argument supplied to the send function call. The received data stream can be deserialized from JSON format into a ready-to-use JavaScript object, or processed incrementally as chunks arrive rather than awaiting the full response completion. Furthermore, the request can be forcibly terminated prematurely or configured with a deadline, failing if not finished within the allotted timeframe.

== Cross-Origin Transactions ==

During the nascent period of the World Wide Web's evolution, constraints were identified that permitted the unauthorized data exfiltration or manipulation through the browser's native networking capabilities, leading to security restrictions on cross-domain data retrieval.

See Also

`