structured-data-extractor
A Model Context Protocol (MCP) utility designed to serialize unstructured textual intelligence, typically derived from generative AI outputs, into precise JSON structures defined by user-provided schemas. This facilitates robust ingestion of heterogeneous data into software systems.
Author

omer-ayhan
Quick Info
Actions
Tags
Schema-Driven Text Serialization Tool
This Model Context Protocol (MCP) server equips users with mechanisms to impose structure onto raw text content using predefined JSON blueprints.
Core Capabilities
Textual Artifact to JSON Translation
- Organize and format prose according to specified JSON scaffolding containing variable slots.
- Systematically pull salient details from model-generated narratives into strongly typed JSON representations.
- Compatibility with arbitrarily complex, deeply nested JSON output definitions.
- Sophisticated parsing logic to deduce and isolate key-value mappings from free-form text.
- Prepares raw AI response artifacts for reliable consumption by subsequent application modules.
Initial Setup
Installation Procedure
bash npm install
Server Initiation
bash npm start
For active coding sessions with automated code reloading:
bash npm run dev:watch
Operational Usage
This service exposes two primary functional endpoints:
1. Prompt Generation for Grouping (group-text-by-json)
This function accepts a JSON schema containing interpolation markers (placeholders) and generates an optimized instructional prompt for an underlying AI system. The goal is to compel the AI to output extracted entities mapped directly against those markers.
{
"template": "{ \"entity_class\": \"
The tool parses the input schema, identifies the requisite extraction keys, and constructs a directive prompt guiding the AI toward supplying data in a key-value enumeration format.
2. Finalizing Structure (text-to-json)
This utility consumes the enumerated output text (presumably generated via the first tool) and maps its contents precisely onto the structure dictated by the initial JSON schema, yielding a finalized JSON object.
{
"template": "{ \"entity_class\": \"
It parses the explicit mappings present in the input text and serializes them according to the template's organizational hierarchy.
Illustrative Execution Sequence
- Schema Definition (Blueprint Creation):
{
"product_record": {
"identifier": "
-
Prompt Generation (
group-text-by-json): -
The system discovers the variables:
sku,cost,abstract. -
A request prompt is synthesized instructing the AI to return values for these specific identifiers.
-
AI Text Generation (Receive Structured Snippet):
sku: X99-ALPHA cost: 45.50 USD abstract: Detailed overview of the product's functionality.
- Conversion to JSON (
text-to-json): - The key-value pairs are mapped to the input schema.
- Result:
{ "product_record": { "identifier": "X99-ALPHA", "unit_cost": "45.50 USD", "summary": "Detailed overview of the product's functionality." } }
Template Syntax Rules
Templates support placeholder insertion throughout any valid JSON structure:
- Markers utilize angle brackets for demarcation:
<identifier>,<field_name>, etc. - The core definition must conform to standard JSON syntax.
- Markers are functional irrespective of nesting depth.
- Capable of handling deeply intertwined hierarchical definitions.
Example schema showcasing deep embedding:
{
"system_config": {
"core_settings": {
"processor_id": "
Internal Mechanism Overview
The server executes the following sequence:
- Syntactic inspection of provided JSON structures to catalogue all placeholder variables.
- Formulation of explicit textual directives intended to solicit data corresponding to these variables from language models.
- Parsing of model-supplied text to isolate discrete key-value associations.
- Reassembly of the final JSON object, strictly adhering to the topology of the initial schema.
Development Environment
Prerequisites
- Node.js version 18 or later
- Package manager: npm or yarn
Build and Execution
bash
Dependency installation
npm install
Compilation step
npm run build
Production server start
npm start
Live coding mode with instant feedback
npm run dev:watch
Integrated Live Reloading Logic
This project incorporates a custom mechanism for development cycle acceleration:
- nodemon: Monitors source files for changes, triggering TypeScript compilation.
- browser-sync: Automatically pushes updates to connected browsers upon successful compilation.
- Concurrency: Both watch/rebuild and sync services are executed in parallel, synchronizing output streams.
The configuration resides within nodemon.json (build control) and package.json (concurrent task runner).
To activate this developer workflow:
bash npm run dev:watch
This establishes an environment where:
- TypeScript source files are automatically transpiled upon modification.
- The MCP service instance is automatically restarted with the fresh code.
- Client browsers refresh immediately to reflect the updated state.
Debugging with MCP Inspector
For interactive debugging and visualization of request/response flows, launch the server using:
bash npm run dev
This invokes the server alongside the dedicated MCP Inspector interface.
