google-parallel-query-engine
Runs fast, concurrent Google searches for multiple search terms at once, automatically detects verification challenges such as CAPTCHAs, and returns structured JSON results for downstream processing.
Author

jae-jae
Google Parallel Query Engine (GPQE)
An advanced Model Context Protocol (MCP) server for high-throughput Google searches, running many distinct search phrases simultaneously.
This utility is derived from the original google-search project.
🌟 Highly Recommended Companion: OllaMan - For robust management of Ollama AI models.
Key Capabilities
- Concurrency: Runs Google searches for multiple keywords simultaneously, which is considerably faster than searching them one after another.
- Browser Efficiency: Leverages a single browser instance to manage numerous concurrent tabs for streamlined parallel fetching.
- Challenge Mitigation: Detects verification challenges (such as CAPTCHAs) and switches to a visible browser window only when the user must complete the verification manually.
- User Emulation: Mimics genuine human browsing patterns to reduce the chance of being rate-limited or blocked by the search engine.
- Standardized Output: Delivers the gathered search data in a machine-readable JSON format, simplifying subsequent analytical pipelines.
- Adaptable Settings: Supports tuning the number of results per query, the page-load timeout, and the search locale.
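The concurrency and single-browser items above can be pictured as a bounded worker pool: a fixed number of "lanes" each claim the next pending query, so at most `limit` searches (tabs) are open at once, and results come back in input order. This is a minimal sketch under assumptions: `runWithConcurrency` is a hypothetical helper, not part of g-search-mcp, and the stand-in worker replaces the real per-tab Playwright search.

```typescript
// Hypothetical helper (not part of g-search-mcp): runs `worker` over `items`
// with at most `limit` calls in flight and returns results in input order.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane keeps claiming the next item until none are left. The claim
  // (`next++`) needs no lock because JavaScript runs this on a single thread.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start min(limit, items.length) lanes and wait for all of them to drain.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Usage with a stand-in worker; in a real server each call would open a tab
// on one shared browser and scrape that query's result page.
const demoQueries = ["machine learning", "artificial intelligence", "playwright"];
runWithConcurrency(demoQueries, 2, async (q) => `results for ${q}`).then((out) => {
  console.log(out);
});
```

Capping the lanes rather than launching every query at once keeps the number of open tabs predictable, which matters when each tab is a full page load.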
Initial Deployment
Execute immediately via npx:
npx -y g-search-mcp
On first use, install the required browser binary by running:
npx playwright install chromium
Diagnostic Mode
Run with the --debug flag to show the browser window:
npx -y g-search-mcp --debug
Configuring the MCP Interface
Integrate this server within your Claude Desktop configuration:
macOS Path: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows Path: %APPDATA%/Claude/claude_desktop_config.json
Configuration Snippet:
{
  "mcpServers": {
    "g-search": {
      "command": "npx",
      "args": ["-y", "g-search-mcp"]
    }
  }
}
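If the config file already lists other servers, the `g-search` entry should be merged in rather than overwriting the file. A minimal sketch, assuming the file has already been parsed into an object; `claudeConfigPath` and `withGSearchServer` are hypothetical helpers that cover only the two paths listed above.

```typescript
// Minimal shape of claude_desktop_config.json; unknown top-level keys are kept.
interface McpServerEntry {
  command: string;
  args?: string[];
}
interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: config location for the two platforms listed above.
// `home` and `appData` are passed in instead of read from the environment.
function claudeConfigPath(platform: "darwin" | "win32", home: string, appData: string): string {
  return platform === "darwin"
    ? `${home}/Library/Application Support/Claude/claude_desktop_config.json`
    : `${appData}/Claude/claude_desktop_config.json`;
}

// Hypothetical helper: returns a new config with the g-search entry added,
// leaving other servers and settings untouched (the input is not mutated).
function withGSearchServer(config: ClaudeDesktopConfig): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "g-search": { command: "npx", args: ["-y", "g-search-mcp"] },
    },
  };
}

// Usage: "other-server" stands in for any entry already in the file.
const existing: ClaudeDesktopConfig = {
  mcpServers: { "other-server": { command: "node", args: ["server.js"] } },
};
console.log(claudeConfigPath("darwin", "/Users/me", ""));
console.log(JSON.stringify(withGSearchServer(existing), null, 2));
```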
Available Functionality
search
- Runs Google searches for an array of query strings and returns structured results.
- Uses a Playwright-controlled browser to perform the searches.
- Supports the following parameters:
  - queries: Required array of search phrases to process.
  - limit: Maximum number of results returned per query; defaults to 10.
  - timeout: Maximum page-load time in milliseconds; defaults to 60000 (1 minute).
  - noSaveState: Boolean flag that prevents browser session state from being saved; defaults to false.
  - locale: Language/region setting for search results; defaults to en-US.
  - debug: Shows the browser window for this call, overriding the command-line setting.
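The parameter list above maps naturally onto a typed argument object with the documented defaults filled in. A sketch only: `SearchArgs` and `resolveSearchArgs` are hypothetical names, rejecting an empty `queries` list is an assumption, and the server's actual schema and validation may differ.

```typescript
// Hypothetical typed view of the `search` tool arguments listed above;
// the server's real schema may differ.
interface SearchArgs {
  queries: string[]; // required
  limit?: number; // results per query
  timeout?: number; // page-load timeout in milliseconds
  noSaveState?: boolean; // true = do not persist browser session state
  locale?: string; // e.g. "en-US", "de-DE"
  debug?: boolean; // show the browser window for this call
}

type ResolvedSearchArgs = Required<Omit<SearchArgs, "debug">> & Pick<SearchArgs, "debug">;

// Defaults as documented in the parameter list.
const SEARCH_DEFAULTS = { limit: 10, timeout: 60000, noSaveState: false, locale: "en-US" };

function resolveSearchArgs(args: SearchArgs): ResolvedSearchArgs {
  // Assumption: an empty query list is treated as an error.
  if (!Array.isArray(args.queries) || args.queries.length === 0) {
    throw new TypeError("queries must be a non-empty array of strings");
  }
  // `??` (not `||`) so that explicit values such as `false` are kept.
  return {
    queries: args.queries,
    limit: args.limit ?? SEARCH_DEFAULTS.limit,
    timeout: args.timeout ?? SEARCH_DEFAULTS.timeout,
    noSaveState: args.noSaveState ?? SEARCH_DEFAULTS.noSaveState,
    locale: args.locale ?? SEARCH_DEFAULTS.locale,
    debug: args.debug,
  };
}

// Usage: only `queries` is required; everything else falls back to the defaults.
console.log(resolveSearchArgs({ queries: ["machine learning"], limit: 20 }));
```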
Usage Prompt Example:
Utilize the search utility to investigate "machine learning" and "artificial intelligence" on Google
Expected Output Structure:
{
  "searches": [
    {
      "query": "machine learning",
      "results": [
        {
          "title": "What is Machine Learning? | IBM",
          "link": "https://www.ibm.com/topics/machine-learning",
          "snippet": "Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy."
        },
        ...
      ]
    },
    {
      "query": "artificial intelligence",
      "results": [
        {
          "title": "What is Artificial Intelligence (AI)? | IBM",
          "link": "https://www.ibm.com/topics/artificial-intelligence",
          "snippet": "Artificial intelligence leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind."
        },
        ...
      ]
    }
  ]
}
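A downstream consumer can type the output shown above and post-process it. Because overlapping queries can return the same page, this hypothetical `uniqueLinks` helper flattens all searches into one list and keeps only the first occurrence of each link. The interface names are assumptions derived from the sample, and the parsed JSON is cast without runtime validation.

```typescript
// Hypothetical types for the output format shown above.
interface SearchResult {
  title: string;
  link: string;
  snippet: string;
}
interface QueryResults {
  query: string;
  results: SearchResult[];
}
interface SearchResponse {
  searches: QueryResults[];
}

// Flattens every query's results into one list, keeping only the first
// occurrence of each link. The cast is unchecked: malformed input is not validated.
function uniqueLinks(json: string): { query: string; link: string }[] {
  const parsed = JSON.parse(json) as SearchResponse;
  const seen = new Set<string>();
  const out: { query: string; link: string }[] = [];
  for (const { query, results } of parsed.searches) {
    for (const result of results) {
      if (!seen.has(result.link)) {
        seen.add(result.link);
        out.push({ query, link: result.link });
      }
    }
  }
  return out;
}

// Usage with a trimmed copy of the sample output (snippets shortened).
const sample = JSON.stringify({
  searches: [
    {
      query: "machine learning",
      results: [{ title: "What is Machine Learning? | IBM", link: "https://www.ibm.com/topics/machine-learning", snippet: "..." }],
    },
    {
      query: "artificial intelligence",
      results: [{ title: "What is Artificial Intelligence (AI)? | IBM", link: "https://www.ibm.com/topics/artificial-intelligence", snippet: "..." }],
    },
  ],
});
console.log(uniqueLinks(sample));
```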
Operational Guidance
Addressing Specific Site Complexities
Modifying Query Parameters
- Result Volume Control: To obtain a larger set of findings:
Request the top 20 search results for every input term.
This sets the limit parameter to 20.
- Timeout Extension: For scenarios involving slow network responses:
Extend the page loading timeout duration to 120 seconds.
This sets the timeout parameter to 120000 milliseconds.
Regional Search Configuration
- Geographic Context Shift: To target results from a specific area:
Execute searches using the German locale setting (de-DE).
This sets the locale: "de-DE" parameter.
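In practice the MCP client translates prompts like the ones above into a `tools/call` request for the `search` tool. The sketch below shows roughly what those JSON-RPC 2.0 messages would carry, assuming the parameter names listed earlier; `searchCall` is a hypothetical helper and the request `id` values are arbitrary. Note the unit conversion for the timeout tip: 120 seconds is 120 * 1000 = 120000 milliseconds.

```typescript
// Minimal JSON-RPC 2.0 envelope for an MCP `tools/call` request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps `search` tool arguments in a request envelope.
function searchCall(id: number, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: "search", arguments: args } };
}

const terms = ["machine learning", "artificial intelligence"];
const secondsToMs = (seconds: number): number => seconds * 1000;

// "Request the top 20 search results for every input term."
const moreResults = searchCall(1, { queries: terms, limit: 20 });
// "Extend the page loading timeout duration to 120 seconds." -> 120 * 1000 ms
const slowNetwork = searchCall(2, { queries: terms, timeout: secondsToMs(120) });
// "Execute searches using the German locale setting (de-DE)."
const german = searchCall(3, { queries: terms, locale: "de-DE" });

for (const request of [moreResults, slowNetwork, german]) {
  console.log(JSON.stringify(request));
}
```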
Troubleshooting and Visibility
Activating Debug Mode
- On-Demand Visualization: To make the browser window appear for a particular operational sequence:
Activate visual rendering mode for this specific search execution.
This forces the debug: true setting, regardless of the initial server launch configuration.
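The visibility rules described in this README (the `--debug` launch flag, a per-call `debug` argument, and automatic CAPTCHA detection) can be modeled as one decision function. This is a hypothetical sketch: `showBrowserWindow` and `looksLikeChallenge` are not the project's API, letting a per-call `debug: false` override `--debug` is an assumption, and the URL heuristic for Google's verification page is an assumption too.

```typescript
// Hypothetical inputs to the "should the browser window be visible?" decision.
interface VisibilityInputs {
  argv: readonly string[]; // arguments the server was launched with
  callDebug?: boolean; // `debug` argument of one search call, if given
  challengeDetected: boolean; // true once a verification page is recognized
}

function showBrowserWindow({ argv, callDebug, challengeDetected }: VisibilityInputs): boolean {
  // A verification challenge needs a visible window so the user can solve it.
  if (challengeDetected) return true;
  // Assumption: an explicit per-call value takes precedence over the launch flag.
  if (callDebug !== undefined) return callDebug;
  // Otherwise fall back to the `--debug` launch flag.
  return argv.includes("--debug");
}

// Assumption: Google's "unusual traffic" interstitial lives under /sorry/ and
// embedded challenges load reCAPTCHA; the project's real detector may differ.
function looksLikeChallenge(url: string): boolean {
  return /^https?:\/\/[^\/]+\/sorry\//.test(url) || url.includes("recaptcha");
}

// Usage: server started with --debug, but this call asks for headless mode.
console.log(showBrowserWindow({ argv: ["--debug"], callDebug: false, challengeDetected: false }));
console.log(looksLikeChallenge("https://www.google.com/sorry/index?continue=x"));
```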
Prerequisites for Installation
- Runtime environment: Node.js version 18 or newer
- Package manager: NPM or Yarn
Local Source Compilation
- Obtain the source code repository:
git clone https://github.com/jae-jae/g-search-mcp.git
cd g-search-mcp
- Install required libraries:
npm install
- Provision the necessary browser engine:
npm run install-browser
- Compile the executable server assets:
npm run build
Development Flow
Continuous Rebuild (Development State)
npm run watch
Debugging with MCP Inspector
npm run inspector
Associated Tools
- fetcher-mcp: An efficient MCP server for fetching web page content with Playwright. It offers intelligent content extraction, parallel processing, and resource optimization, making it well suited to automated web data acquisition.
Legal Status
Distributed under the terms of the MIT License
WIKIPEDIA: A headless browser is a web browser without a graphical user interface that is controlled programmatically instead. Such browsers are invaluable for automated tasks like testing because they process HTML, CSS, and JavaScript the same way a standard browser does, but are driven from a command line or over a network interface. Chrome and Firefox have both shipped native headless modes since 2017, making older standalone emulation tools less necessary. The main applications of headless browsing are web application testing, taking page screenshots, running automated tests for JavaScript libraries, and automating interactions with web pages.
== Core Applications ==
The principal uses for browsers without a GUI include:
- Testing modern web applications (QA testing)
- Automated rendering of full-page screenshots
- Running automated tests for front-end JavaScript libraries
- Programmatic interaction with web page elements
=== Secondary Utilities ===
Headless browsers are also frequently used for large-scale data harvesting (web scraping). Google has noted that headless browsing can help its search engine index websites whose content is loaded dynamically via Ajax. Headless browsers can also be misused, for example to generate artificial traffic or to automate unauthorized logins (credential stuffing). However, analysis of real-world traffic has not found malicious actors to favor headless browsers over standard ones.
== Implementation Landscape ==
Because current major browsers provide built-in headless APIs, several unified automation frameworks have emerged to drive them:
- Selenium WebDriver: Implements the W3C WebDriver specification.
- Playwright: A library for automating Chromium, Firefox, and WebKit from Node.js.
- Puppeteer: Focused primarily on automating Chrome/Chromium.
=== Testing Framework Integration ===
Many testing tools build headless support into their setup:
- Capybara: Uses headless Chrome or WebKit to mimic user behavior in its test scenarios.
- Jasmine: Uses Selenium by default but can be configured to run browser tests in headless WebKit or Chrome.
- Cypress: A dedicated front-end testing framework.
- QF-Test: A tool for automated GUI testing that also supports testing with headless browsers.
=== Non-Browser Alternatives ===
An alternative approach is to use libraries that provide browser APIs within a runtime environment. For instance, Deno includes browser APIs natively. In the Node.js ecosystem, jsdom is the most complete such library, supporting HTML parsing, cookies, XHR, and some JavaScript execution. These libraries are usually faster than a true headless browser, but they do not render pages and offer only limited support for DOM events.
