cryo-mcp-data-extractor
A Model Completion Protocol (MCP) endpoint providing robust access and sophisticated querying mechanisms for Ethereum ledger entries via the underlying Cryo extraction engine. Facilitates real-time operational intelligence extraction using standard MCP clients.
Author

z80dev
Quick Info
Actions
Tags
Cryo Ethereum Ledger Data Access Service 🧊 (MCP)
This service acts as an MCP gateway for the high-performance Cryo data retrieval utility. By exposing Cryo's capabilities through a standardized MCP server interface, querying complex blockchain states becomes straightforward for any compatible client.
LLM Workflow: Executing Structured Query Language (SQL) Operations
To leverage SQL capabilities against the extracted blockchain artifacts, adhere to this three-phase procedure:
-
Artifact Retrieval using
query_dataset: python retrieval_result = query_dataset( dataset="blocks", # Specify ledger component: 'transactions', 'logs', etc. blocks="15000000:15001000", # Define block span or use blocks_from_latest=100 output_format="parquet" # Essential: Parquet format required for efficient SQL processing ) artifact_paths = retrieval_result.get("files", []) # Secure the list of generated file references -
Schema Inspection via
get_sql_table_schema: python # Inspect the structure of the retrieved data files data_schema = get_sql_table_schema(artifact_paths[0]) # Review available fields, types, and sample data points -
SQL Execution using
query_sql: python # Method A: Rely on DuckDB's automatic file association via table name sql_output = query_sql( query="SELECT block_number, timestamp, gas_used FROM blocks", files=artifact_paths # Supply all relevant file locations )
# Method B: Explicitly invoke Parquet reading within the query string sql_output = query_sql( query=f"SELECT block_number, timestamp, gas_used FROM read_parquet('{artifact_paths[0]}')", files=artifact_paths )
Alternatively, utilize the monolithic function query_blockchain_sql which bundles extraction and querying:
python
Integrated approach example
integrated_result = query_blockchain_sql( sql_query="SELECT * FROM transactions WHERE value > 0", dataset="transactions", blocks_from_latest=500 )
Consult examples/sql_workflow_example.py for a comprehensive demonstration.
Core Capabilities
- Complete Cryo Coverage: Access to the full spectrum of Cryo's extracted ledger components.
- MCP Native: Full compatibility across the Model Completion Protocol ecosystem.
- Granular Control: Supports all established Cryo input parameters for filtering and range definition.
- Block Scope Definition: Utilize absolute numbers, latest block references, or relative offsets.
- Entity Scoping: Precise data isolation based on contract identifiers.
- Immediacy: Direct retrieval pointers for the most recently finalized block.
- Format Versatility: Data output supported in JSON, CSV, and optimized Parquet.
- Metadata Provision: Retrieval of detailed component schemas and representative samples.
- Direct SQL: Ability to execute arbitrary SQL commands against materialized blockchain records.
Setup Prerequisites (Optional)
If executing locally via uvx, this step is bypassed.
bash
Installation via UV package manager (recommended)
uv tool install cryo-mcp
System Dependencies
- Python Interpreter version 3.8 or newer.
- The
uvpackage manager. - A functional installation of the Cryo backend system.
- Credentials for an accessible Ethereum RPC endpoint.
- DuckDB runtime environment (mandatory for SQL operations).
Initial Operation (Cloud Context)
- Initiate interactive setup via
claude mcp add. - Specify
uvxas the execution binary. - Input arguments:
cryo-mcp --rpc-url <ETH_RPC_URL> [--data-dir <DATA_DIR>] - Configuration can alternatively be supplied via environment variables (
ETH_RPC_URL,CRYO_DATA_DIR).
This establishes a persistent connection profile for claude to interact with your configured Cryo instance.
Exposed MCP Interfaces
Cryo MCP makes the following functions available:
list_datasets()
Enumerates all discoverable data sets managed by Cryo.
Example: python client.list_datasets()
query_dataset()
Initiates the extraction of a specified ledger component based on constraints.
Parameters:
- dataset (str): The target component name ('blocks', 'transactions', 'logs').
- blocks (str, optional): String defining the block interval (e.g., '1000:1010').
- start_block (int, optional): Lower bound block index (alternative to blocks).
- end_block (int, optional): Upper bound block index (alternative to blocks).
- use_latest (bool, optional): If true, targets the current tip of the chain.
- blocks_from_latest (int, optional): Count of recent blocks to process.
- contract (str, optional): Filter results strictly to events or data associated with this address.
- output_format (str, optional): Serialization type ('json', 'csv', 'parquet').
- include_columns (list, optional): Explicitly request certain fields.
- exclude_columns (list, optional): Explicitly omit default fields.
Example: python
Retrieve transactions spanning two specific block milestones
client.query_dataset('transactions', blocks='15M:15.01M')
Fetch logs related to a specific address over the last 100 finalized blocks
client.query_dataset('logs', blocks_from_latest=100, contract='0x1234...')
Obtain metadata for the very latest block
client.query_dataset('blocks', use_latest=True)
lookup_dataset()
Retrieves metadata, including schema definition and sample data, for a component.
Parameters:
- name (str): Identifier of the data component to investigate.
- sample_start_block (int, optional): Block index to begin sample capture.
- sample_end_block (int, optional): Block index to conclude sample capture.
- use_latest_sample (bool, optional): Base sample generation on the latest block.
- sample_blocks_from_latest (int, optional): Number of recent blocks for sample extraction.
Example: python client.lookup_dataset('logs')
get_latest_ethereum_block()
Returns the current finalized block header details.
Example: python client.get_latest_ethereum_block()
SQL Querying Interfaces
Cryo MCP furnishes specialized interfaces for executing analytical queries over retrieved data:
query_sql()
Executes a user-defined SQL statement against locally provisioned data artifacts.
Parameters:
- query (str): The complete SQL expression to be evaluated.
- files (list, optional): List of file paths containing the data sources. If omitted, the tool scans the configured data directory.
- include_schema (bool, optional): Flag to embed schema details within the output response.
Example: python
Query executed against all known parquet segments
client.query_sql("SELECT * FROM read_parquet('/data/segment_001.parquet') LIMIT 10")
Targeted execution against a specific artifact file
client.query_sql( "SELECT block_number FROM read_parquet('/data/blocks.parquet') WHERE difficulty > 1000000000000000", files=['/data/blocks.parquet'] )
query_blockchain_sql()
Streamlines the process: it handles data acquisition based on parameters, then executes the supplied SQL query.
Parameters:
- sql_query (str): The analytical SQL command.
- dataset (str, optional): The source ledger component ('blocks', 'transactions', etc.).
- blocks (str, optional): Block range specification string.
- start_block (int, optional): Starting block index.
- end_block (int, optional): Ending block index.
- use_latest (bool, optional): Target the latest block.
- blocks_from_latest (int, optional): Number of trailing blocks to process.
- contract (str, optional): Filter data extraction by contract address.
- force_refresh (bool, optional): Bypass cache checks and re-download data.
- include_schema (bool, optional): Request schema inclusion in the final output.
Example: python
Automatically pulls the last 100 blocks of transaction data, then runs the analysis
client.query_blockchain_sql( sql_query="SELECT to_address, SUM(gas_used) FROM transactions GROUP BY to_address ORDER BY 2 DESC LIMIT 5", dataset="transactions", blocks_from_latest=100 )
list_available_sql_tables()
Reports all recognized entities ready for direct SQL access within the current data store.
Example: python client.list_available_sql_tables()
get_sql_table_schema()
Retrieves the structural blueprint for a specified materialized data file.
Parameters:
- file_path (str): Absolute or relative path to the target parquet file.
Example: python client.get_sql_table_schema("/persisted/logs_007.parquet")
get_sql_examples()
Provides illustrative SQL snippets tailored for common blockchain data analysis patterns.
Example: python client.get_sql_examples()
Server Configuration Parameters
When launching the Cryo MCP node, these command-line arguments modify operational settings:
--rpc-url URL: Specifies the Ethereum network access point (overridesETH_RPC_URLenv var).--data-dir PATH: Defines the persistence location for downloaded ledger artifacts (overridesCRYO_DATA_DIR, defaults to a local hidden directory).
Environment Variable Mapping
ETH_RPC_URL: Default endpoint for RPC communication, used if--rpc-urlis absent.CRYO_DATA_DIR: Default repository for data caching, used if--data-diris absent.
Advanced Operation Scenarios
Decomposed SQL Analysis on Ledger Data
This method decouples data acquisition from computation, offering granular control:
python
Phase 1: Acquisition and serialization to Parquet
download_manifest = client.query_dataset( dataset="transactions", blocks_from_latest=1000, output_format="parquet" )
Phase 2: SQL evaluation using returned file references
reference_list = download_manifest.get("files", []) client.query_sql( query=f""" SELECT to_address AS recipient_id, COUNT(*) AS transaction_count, SUM(gas_used) AS accumulated_gas, AVG(gas_used) AS mean_gas FROM read_parquet('{reference_list[0]}') WHERE to_address IS NOT NULL GROUP BY recipient_id ORDER BY accumulated_gas DESC LIMIT 20 """, files=reference_list )
Unified SQL Execution Pathway
For rapid analysis, use the combined tool to manage internal data fetching:
python
Identify high-throughput contract interactions
client.query_blockchain_sql( sql_query=""" SELECT to_address, COUNT(*) as interaction_volume, SUM(value) as total_value_transferred FROM read_parquet('/data/txns.parquet') WHERE to_address IS NOT NULL AND value > 0 GROUP BY to_address ORDER BY interaction_volume DESC LIMIT 20 """, dataset="transactions", blocks_from_latest=1000 )
Locate blocks that processed the maximum volume of transactions
client.query_blockchain_sql( sql_query=""" SELECT block_number, COUNT(*) as contained_tx_count FROM read_parquet('/data/txns.parquet') GROUP BY block_number ORDER BY contained_tx_count DESC LIMIT 10 """, dataset="transactions", blocks="15M:16M" )
Crucial Instruction: For optimal performance in SQL evaluation via DuckDB, ensure output_format="parquet" is specified during the data retrieval phase. When utilizing query_blockchain_sql, reference the persisted files within your query string explicitly using the read_parquet() function.
Specifying Block Ranges
This service fully recognizes Cryo's block addressing syntax:
python
Use explicit numerical indices
client.query_dataset('transactions', blocks='15000000:15001000')
Utilize 'M' (Million) notation for large spans
client.query_dataset('logs', blocks='15M:15.01M')
Fetch data relative to the current chain tip
client.query_dataset('blocks', blocks_from_latest=100)
Contract-Specific Data Isolation
Filter output streams to data relevant only to a specific contract address:
python
Extract event logs for a known stablecoin contract
client.query_dataset('logs', blocks='16M:16.1M', contract='0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48')
Field Projection Control
Minimize data transfer overhead by requesting only necessary data fields:
python
Secure only the block ID and timestamp fields for a range
client.query_dataset('blocks', blocks='16M:16.1M', include_columns=['number', 'timestamp'])
Development Repository Structure
cryo-mcp/ ├── cryo_mcp/ # Core Python modules │ ├── init.py # Package initialization module │ ├── server.py # Primary MCP server logic │ ├── sql.py # Module handling DuckDB interactions ├── tests/ # Unit and integration test suite │ ├── test_*.py # Individual test scripts ├── pyproject.toml # Project metadata and dependency definitions ├── README.md # Current documentation file
Executing Validation Suite
Run comprehensive tests using:
uv run pytest
Licensing
This utility is released under the MIT License.
Acknowledgment
- Service foundation built upon the exceptional Cryo data engine from Paradigm.
- Adheres to the standardized MCP protocol for interface definitions.
