search-engine-semantic-service-mcp-agent
An MCP server for semantic search over Search Labs technical articles indexed in Elasticsearch. It covers both sides of the workflow: crawling the articles into an index and retrieving them by vector similarity.
Author: jedrazb
MCP Service Module: Elasticsearch Vector Search Orchestrator
Reference implementation for the architecture detailed here: https://j.blaszyk.me/tech-blog/mcp-server-elasticsearch-semantic-search/
Navigation Map
- System Introduction
- Deployment and Activation
- Integration with Client Application (Claude)
- Pre-indexing Phase: Content Harvesting
  - Phase 1: Validating the Harvesting Utility
  - Phase 2: Configuring the Data Persistence Layer
  - Phase 3: Schema Adaptation for Vectorization
  - Phase 4: Commencing Data Ingestion
  - Phase 5: Confirming Data Persistence Status
System Introduction
This repository contains a Python MCP server that runs semantic (vector similarity) search over Search Labs articles stored in Elasticsearch.
It assumes the articles have already been crawled into the search-labs-posts index with the Elastic Open Crawler. The Pre-indexing Phase section below walks through that setup.
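As a rough illustration of what a query against this index can look like, the sketch below builds a _search request body that uses Elasticsearch's semantic query type. It assumes the semantic_body field created in Phase 3; the function name and the default result size are hypothetical, and this README does not show the query the server itself issues.

```python
import json

INDEX = "search-labs-posts"       # index name used throughout this README
SEMANTIC_FIELD = "semantic_body"  # semantic_text field created in Phase 3


def build_semantic_query(text: str, size: int = 5) -> dict:
    """Return a _search body that runs a `semantic` query on SEMANTIC_FIELD.

    Hypothetical helper for illustration; not taken from the server's code.
    """
    if not text.strip():
        raise ValueError("query text must not be empty")
    return {
        "size": size,
        "query": {"semantic": {"field": SEMANTIC_FIELD, "query": text}},
    }


if __name__ == "__main__":
    # The body would be sent as: POST /search-labs-posts/_search
    print(f"POST /{INDEX}/_search")
    print(json.dumps(build_semantic_query("how does ELSER work"), indent=2))
```

The function only builds the request body, so it can be checked without a running cluster.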
Deployment and Activation
Add the Elasticsearch URL (ES_URL) and an API key (ES_API_KEY) to the .env file. See Phase 2 below for how to generate an API key with minimal permissions.
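A quick way to catch a missing setting before starting the server is to parse the .env file and check the two keys. This is a minimal sketch using only the standard library; it assumes simple KEY=value lines and assumes ES_API_KEY is the intended name of the access-token variable. The helper names are hypothetical.

```python
REQUIRED_KEYS = ("ES_URL", "ES_API_KEY")  # assumed variable names


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    from pathlib import Path

    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("missing:", missing_settings(parse_env(text)))
```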
To start the server locally with the MCP Inspector:
```sh
make dev
```
The Inspector UI, where you can test the server, is served at http://localhost:5173
Integration with Client Application (Claude)
To register the server with Claude Desktop:
```sh
make install-claude-config
```
This updates the configuration file at ~/claude_desktop_config.json. The next time Claude Desktop starts, the semantic search tool is available.
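For readers who want to see what such a registration amounts to, the sketch below merges one server entry into a Claude Desktop style configuration under the mcpServers key. What make install-claude-config actually writes is not shown in this README, so the server name, command, and arguments used here are hypothetical placeholders.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers".

    Existing entries are preserved and the input dict is not modified.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    updated["mcpServers"] = servers
    return updated


if __name__ == "__main__":
    # Hypothetical entry: name, command, and args are placeholders.
    existing = {"mcpServers": {}}
    merged = add_mcp_server(existing, "search-labs", "uv", ["run", "server.py"])
    print(json.dumps(merged, indent=2))
```

Merging into a copy keeps any servers that are already registered, which matters when the file is shared with other MCP tools.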
Pre-indexing Phase: Content Harvesting
Phase 1: Validating the Harvesting Utility
Run a test crawl with the Elastic Open Crawler to verify the setup:
```sh
docker run --rm \
  --entrypoint /bin/bash \
  -v "$(pwd)/crawler-config:/app/config" \
  --network host \
  docker.elastic.co/integrations/crawler:latest \
  -c "bin/crawler crawl config/test-crawler.yml"
```
The command should print the crawled content of a single test page.
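The test crawl here and the full crawl in Phase 4 differ only in the config file they pass. If you prefer to launch them from a script, the sketch below assembles the same docker run invocation as an argument list, which avoids shell line-continuation mistakes. The helper name is hypothetical, and the function only builds the command; it does not run it.

```python
import shlex
from pathlib import Path

CRAWLER_IMAGE = "docker.elastic.co/integrations/crawler:latest"


def build_crawl_command(config_file: str, config_dir: Path) -> list[str]:
    """Assemble the `docker run` argv for one crawl without executing it."""
    return [
        "docker", "run", "--rm",
        "--entrypoint", "/bin/bash",
        "-v", f"{config_dir}:/app/config",   # mount the local crawler configs
        "--network", "host",
        CRAWLER_IMAGE,
        "-c", f"bin/crawler crawl config/{config_file}",
    ]


if __name__ == "__main__":
    cmd = build_crawl_command("test-crawler.yml", Path.cwd() / "crawler-config")
    # Print a copy-pasteable command; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```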
Phase 2: Configuring the Data Persistence Layer
Set the Elasticsearch URL and an API key for the crawler.
Generate an API key with the minimum permissions needed for ingestion:
```sh
POST /_security/api_key
{
  "name": "crawler-search-labs",
  "role_descriptors": {
    "crawler-search-labs-role": {
      "cluster": ["monitor"],
      "indices": [
        {
          "names": ["search-labs-posts"],
          "privileges": ["all"]
        }
      ]
    }
  },
  "metadata": {
    "application": "crawler"
  }
}
```
Copy the encoded value from the response and set it as the API_KEY environment variable.
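The encoded field in the create-API-key response is the Base64 form of id:api_key, and it is the value Elasticsearch expects after ApiKey in the Authorization header. The sketch below pulls it out of a response body and builds that header. The sample response uses made-up placeholder values, and the helper name is hypothetical.

```python
import base64
import json


def auth_header(response_body: str) -> dict[str, str]:
    """Build the Authorization header from a create-API-key response."""
    data = json.loads(response_body)
    return {"Authorization": f"ApiKey {data['encoded']}"}


# Placeholder response: the id and api_key values are invented for illustration.
SAMPLE_RESPONSE = json.dumps(
    {
        "id": "example-id",
        "name": "crawler-search-labs",
        "api_key": "example-secret",
        "encoded": base64.b64encode(b"example-id:example-secret").decode(),
    }
)

if __name__ == "__main__":
    print(auth_header(SAMPLE_RESPONSE))
```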
Phase 3: Schema Adaptation for Vectorization
Make sure the search-labs-posts index exists. If it does not, create it:
```sh
PUT search-labs-posts
```
Update the mapping to enable semantic search:
```sh
PUT search-labs-posts/_mappings
{
  "properties": {
    "body": {
      "type": "text",
      "copy_to": "semantic_body"
    },
    "semantic_body": {
      "type": "semantic_text",
      "inference_id": ".elser-2-elasticsearch"
    }
  }
}
```
With this mapping, the text of the body field is copied into semantic_body at index time. Because semantic_body is a semantic_text field, Elasticsearch generates embeddings for it with the built-in ELSER model through the .elser-2-elasticsearch inference endpoint.
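A mapping like this works only if every copy_to target is itself defined in the mapping. The sketch below expresses the same mapping as a Python dict and checks that condition before the request is sent. It is an illustrative check under that assumption, with a hypothetical helper name, and is not part of the project's tooling.

```python
MAPPING = {
    "properties": {
        "body": {"type": "text", "copy_to": "semantic_body"},
        "semantic_body": {
            "type": "semantic_text",
            "inference_id": ".elser-2-elasticsearch",
        },
    }
}


def dangling_copy_to(mapping: dict) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose copy_to target is not defined."""
    props = mapping.get("properties", {})
    missing: list[tuple[str, str]] = []
    for source, spec in props.items():
        targets = spec.get("copy_to", [])
        if isinstance(targets, str):  # copy_to accepts a string or a list
            targets = [targets]
        missing.extend((source, t) for t in targets if t not in props)
    return missing


if __name__ == "__main__":
    problems = dangling_copy_to(MAPPING)
    print("mapping OK" if not problems else f"undefined targets: {problems}")
```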
Phase 4: Commencing Data Ingestion
Run the full crawl to populate the index:
```sh
docker run --rm \
  --entrypoint /bin/bash \
  -v "$(pwd)/crawler-config:/app/config" \
  --network host \
  docker.elastic.co/integrations/crawler:latest \
  -c "bin/crawler crawl config/elastic-search-labs-crawler.yml"
```
> [!NOTE]
> On a fresh deployment, make sure ELSER inference is fully up and running in Elasticsearch before you start indexing.
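One way to act on this note in a script is to poll a readiness check before starting the crawl. The sketch below is a generic polling helper. The readiness probe is deliberately left as a caller-supplied function, because how to query ELSER's state is not covered in this README, and all names and default timings are hypothetical.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_ready` until it returns True or `timeout_s` has elapsed.

    Returns True if the check succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in probe that succeeds on the third call; replace it with a real
    # check against your cluster.
    answers = iter([False, False, True])
    print(wait_until(lambda: next(answers), timeout_s=10, interval_s=0.1))
```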
Phase 5: Confirming Data Persistence Status
Check that documents were indexed:
```sh
GET search-labs-posts/_count
```
The response reports the total number of documents in the index. You can also inspect the documents in Kibana.
Done! The index is now ready for semantic search over the Search Labs articles.
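If you script this check, the _count response can be interpreted as in the sketch below: it reads the count field and the failed entry under _shards and reports whether the crawl appears to have populated the index. The sample response uses a made-up document count, and the helper name is hypothetical.

```python
def check_count(response: dict) -> tuple[bool, str]:
    """Interpret a _count response as (ok, human-readable message)."""
    count = int(response.get("count", 0))
    failed = int(response.get("_shards", {}).get("failed", 0))
    if failed:
        return False, f"{failed} shard(s) failed; count of {count} may be incomplete"
    if count == 0:
        return False, "index is empty; the crawl may not have written any documents"
    return True, f"{count} documents indexed"


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    sample = {
        "count": 120,
        "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    }
    print(check_count(sample))
```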
