MCP Service Module: Elasticsearch Vector Search Orchestrator

Reference implementation for the architecture detailed here: https://j.blaszyk.me/tech-blog/mcp-server-elasticsearch-semantic-search/

System Introduction
Deployment and Activation
Integration with Client Application (Claude)
Pre-indexing Phase: Content Harvesting
Phase 1: Validating the Harvesting Utility
Phase 2: Configuring the Data Persistence Layer
Phase 3: Schema Adaptation for Vectorization
Phase 4: Commencing Data Ingestion
Phase 5: Confirming Data Persistence Status

System Introduction

This package provides a Python MCP server that runs semantic (vector similarity) search over the Search Labs technical articles stored in Elasticsearch.

It assumes the articles have already been crawled and indexed into the search-labs-posts index with the Elastic Open Crawler (see the Content Harvesting section below).
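As a rough illustration of what a semantic query against this index can look like, the sketch below builds a request body that uses Elasticsearch's semantic query on the semantic_body field (the semantic_text field defined in Phase 3). The function name, the size default, and the _source field list (title, url, assumed to be Open Crawler output fields) are illustrative assumptions, not necessarily what this server sends.

```python
"""Sketch: build a semantic search request body for search-labs-posts."""
import json
from typing import Any

INDEX = "search-labs-posts"


def build_semantic_query(text: str, size: int = 5) -> dict[str, Any]:
    """Return a _search body using the semantic query type.

    Hypothetical helper: the _source fields (title, url) are assumed
    crawler output fields, not confirmed by this project.
    """
    if not text.strip():
        raise ValueError("query text must not be empty")
    return {
        "size": size,
        # The semantic query runs inference on the query text with the
        # endpoint configured on the semantic_text field, then matches
        # it against the embeddings stored at index time.
        "query": {"semantic": {"field": "semantic_body", "query": text}},
        "_source": ["title", "url"],
    }


if __name__ == "__main__":
    print(json.dumps(build_semantic_query("hybrid search with ELSER"), indent=2))
```

The resulting body could be sent as POST search-labs-posts/_search.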

Deployment and Activation

Populate the .env file with your Elasticsearch URL (ES_URL) and API key (ES_AP_KEY). The server only needs to search the search-labs-posts index, so an API key with read access to that index is sufficient; Phase 2 below shows how to create an API key scoped to a single index.
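If you want to sanity-check the .env file before starting the server, a minimal sketch like the following reads simple KEY=VALUE lines and reports which of the two variables are missing. The helper names are hypothetical, and the parser is deliberately simplified (no multi-line values, no export prefix, no variable expansion); the project itself may load the file differently, for example with python-dotenv.

```python
"""Sketch: sanity-check the .env file used by the MCP server."""
from pathlib import Path

REQUIRED_KEYS = ("ES_URL", "ES_AP_KEY")  # variable names as given in this README


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. ES_URL="https://..."
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not values.get(k)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(parse_env(env_path.read_text()))
        print("missing:", missing or "none")
    else:
        print(".env not found")
```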

To run the server locally with the MCP Inspector:

```sh
make dev
```

The Inspector UI is then available at http://localhost:5173.

Integration with Client Application (Claude)

To register the server with Claude Desktop:

```sh
make install-claude-config
```

This updates the configuration file at ~/claude_desktop_config.json. The next time Claude Desktop starts, the semantic search tool will be available.
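Claude Desktop reads MCP servers from an mcpServers object in its JSON config, where each entry gives a command and its args. The sketch below shows how such an entry could be merged into an existing config without overwriting other servers. The helper name, the server name, and the command used in the example are placeholders, not necessarily what make install-claude-config writes.

```python
"""Sketch: merge an MCP server entry into a Claude Desktop config dict."""
import json
from typing import Any


def add_mcp_server(config: dict[str, Any], name: str,
                   command: str, args: list[str]) -> dict[str, Any]:
    """Return a copy of config with `name` registered under mcpServers."""
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    return updated


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "node", "args": ["x.js"]}}}
    # Hypothetical name and command; the Makefile target may differ.
    merged = add_mcp_server(existing, "elastic-semantic-search",
                            "python", ["server.py"])
    print(json.dumps(merged, indent=2))
```

Working on a copy keeps the original dict untouched, so a caller can compare before and after or abort without partially modifying the file's contents.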

Pre-indexing Phase: Content Harvesting

Phase 1: Validating the Harvesting Utility

Run a test crawl to check that the Elastic Open Crawler configuration works:

```sh
docker run --rm \
  --entrypoint /bin/bash \
  -v "$(pwd)/crawler-config:/app/config" \
  --network host \
  docker.elastic.co/integrations/crawler:latest \
  -c "bin/crawler crawl config/test-crawler.yml"
```

This should print the extracted content of a single test page.
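Phases 1 and 4 run the same container and differ only in the config file passed to bin/crawler crawl. If you prefer to drive the crawl from Python (for example, in a script that also checks the index afterwards), a sketch like this builds the same argument list and hands it to subprocess.run, which avoids shell line-continuation pitfalls. The helper names are hypothetical.

```python
"""Sketch: run the Elastic Open Crawler container from Python."""
import subprocess
from pathlib import Path
from typing import Optional

IMAGE = "docker.elastic.co/integrations/crawler:latest"


def crawler_command(config_file: str, config_dir: Optional[Path] = None) -> list[str]:
    """Build the same docker run invocation used in this README."""
    config_dir = (config_dir or Path.cwd() / "crawler-config").resolve()
    return [
        "docker", "run", "--rm",
        "--entrypoint", "/bin/bash",
        "-v", f"{config_dir}:/app/config",  # mount local configs into the container
        "--network", "host",
        IMAGE,
        "-c", f"bin/crawler crawl config/{config_file}",
    ]


def run_crawl(config_file: str) -> int:
    """Run the crawler container and return its exit code."""
    return subprocess.run(crawler_command(config_file), check=False).returncode


if __name__ == "__main__":
    print(" ".join(crawler_command("test-crawler.yml")))
```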

Phase 2: Configuring the Data Persistence Layer

The crawler needs two settings to write documents: the Elasticsearch endpoint and an API key.

Create an API key for the crawler, scoped to the search-labs-posts index:

```
POST /_security/api_key
{
  "name": "crawler-search-labs",
  "role_descriptors": {
    "crawler-search-labs-role": {
      "cluster": ["monitor"],
      "indices": [
        {
          "names": ["search-labs-posts"],
          "privileges": ["all"]
        }
      ]
    }
  },
  "metadata": {
    "application": "crawler"
  }
}
```

Copy the encoded value from the response and set it as the API_KEY environment variable.
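The create API key response contains id, api_key, and encoded fields; encoded is the Base64 encoding of id:api_key, which is the form Elasticsearch expects in an "Authorization: ApiKey ..." header. A small sketch to derive or double-check that value (the function names are hypothetical):

```python
"""Sketch: derive the 'encoded' API key value and an auth header."""
import base64


def encode_api_key(key_id: str, api_key: str) -> str:
    """Base64-encode 'id:api_key', matching the 'encoded' response field."""
    return base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")


def auth_header(encoded: str) -> dict[str, str]:
    """Build the HTTP header used for API key authentication."""
    return {"Authorization": f"ApiKey {encoded}"}


if __name__ == "__main__":
    # Dummy values for illustration only.
    enc = encode_api_key("my-id", "my-secret")
    print(enc, auth_header(enc))
```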

Phase 3: Schema Adaptation for Vectorization

Check whether the search-labs-posts index exists, and create it if it does not:

```
PUT search-labs-posts
```

Update the index mapping to add a semantic_text field:

```
PUT search-labs-posts/_mappings
{
  "properties": {
    "body": {
      "type": "text",
      "copy_to": "semantic_body"
    },
    "semantic_body": {
      "type": "semantic_text",
      "inference_id": ".elser-2-elasticsearch"
    }
  }
}
```

With this mapping, the content of the body field is copied into semantic_body, where it is embedded at index time by Elasticsearch's built-in ELSER model through the .elser-2-elasticsearch inference endpoint.
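To confirm the mapping was applied as intended, you can fetch it with GET search-labs-posts/_mapping and inspect the response, which has the shape {index: {"mappings": {"properties": ...}}}. The sketch below checks that body is copied into a semantic_text field with the expected inference endpoint. The helper name is hypothetical, and treating a missing inference_id as the expected one is an assumption, in case the cluster omits default values when returning mappings.

```python
"""Sketch: verify the semantic_text mapping in a GET _mapping response."""
from typing import Any

INDEX = "search-labs-posts"
ELSER_ENDPOINT = ".elser-2-elasticsearch"


def semantic_mapping_ok(response: dict[str, Any], index: str = INDEX,
                        inference_id: str = ELSER_ENDPOINT) -> bool:
    """True if 'body' is copied into a semantic_text field using inference_id."""
    props = response.get(index, {}).get("mappings", {}).get("properties", {})
    copy_to = props.get("body", {}).get("copy_to")
    # copy_to may come back as a single field name or as a list of names.
    targets = [copy_to] if isinstance(copy_to, str) else list(copy_to or [])
    for target in targets:
        field = props.get(target, {})
        # Assumption: a missing inference_id is treated as the expected one.
        if (field.get("type") == "semantic_text"
                and field.get("inference_id", inference_id) == inference_id):
            return True
    return False
```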

Phase 4: Commencing Data Ingestion

Run the full crawl to index the Search Labs articles:

```sh
docker run --rm \
  --entrypoint /bin/bash \
  -v "$(pwd)/crawler-config:/app/config" \
  --network host \
  docker.elastic.co/integrations/crawler:latest \
  -c "bin/crawler crawl config/elastic-search-labs-crawler.yml"
```

Note: On a new deployment, make sure the ELSER model behind the .elser-2-elasticsearch inference endpoint is deployed and running before you start indexing.
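One way to follow this advice in a script is to poll until the model reports it is ready before starting the full crawl. The sketch below is a generic polling loop with an injected is_ready callable, so it runs without a cluster. How you implement is_ready (for example, by checking the deployment state returned by GET _ml/trained_models/_stats) is an assumption that depends on your Elasticsearch version and setup; the function name and default timings are hypothetical.

```python
"""Sketch: wait for a readiness check (e.g. ELSER deployment) with a timeout."""
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool], timeout_s: float = 600.0,
                     interval_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_ready until it returns True or timeout_s elapses.

    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```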

Phase 5: Confirming Data Persistence Status

Check that documents were indexed:

```
GET search-labs-posts/_count
```

The response contains the number of documents in the index. You can also inspect the index in Kibana.
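The _count API returns a JSON object with a count field and a _shards summary. For scripted checks, a hypothetical helper like this one turns that response into a pass/fail signal; the default minimum of one document is an arbitrary choice.

```python
"""Sketch: check a _count response after the crawl."""
from typing import Any


def indexed_count(response: dict[str, Any], minimum: int = 1) -> int:
    """Return the document count; raise if shards failed or count < minimum."""
    failed = response.get("_shards", {}).get("failed", 0)
    if failed:
        raise RuntimeError(f"{failed} shard(s) failed to respond")
    count = int(response["count"])
    if count < minimum:
        raise RuntimeError(f"expected at least {minimum} documents, got {count}")
    return count
```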

Done! The server is now ready to run semantic search queries over the Search Labs articles.
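Once the index is populated, the server's remaining job is to turn search hits into text the model can read. The sketch below shows one way to do that from a standard _search response (hits.hits[]._source). The function name, the title and url fields, and the output format are assumptions, not necessarily what this server returns.

```python
"""Sketch: format _search hits as a compact text reply for an MCP tool."""
from typing import Any


def format_hits(response: dict[str, Any], limit: int = 5) -> str:
    """Render up to `limit` hits as numbered 'title (url)' lines."""
    lines = []
    hits = response.get("hits", {}).get("hits", [])[:limit]
    for i, hit in enumerate(hits, 1):
        src = hit.get("_source", {})
        title = src.get("title") or "(untitled)"
        url = src.get("url", "")
        lines.append(f"{i}. {title} ({url})" if url else f"{i}. {title}")
    return "\n".join(lines) or "No results."
```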

search-engine-semantic-service-mcp-agent

Author: jedrazb
