Introduction

This utility, conforming to the Model Context Protocol (MCP), functions as a specialized server designed to interface cleanly with Trino and Apache Iceberg data stores. Databases, which are organized data collections managed by a Database Management System (DBMS), benefit from tools that automate or simplify complex interactions. This package focuses on enabling advanced querying and routine maintenance tasks for tabular data structures managed by Trino.

Use Cases

Facilitating interactive data examination and analysis procedures using Trino's capabilities.
Performing automated upkeep and optimization routines on tables utilizing the Iceberg format.
Developing software applications that require programmatic access to data within Trino environments.
Managing the execution of various SQL statements while ensuring results are properly structured.

Prerequisites

To successfully employ this tool, several components must be readily available in your environment.

An operational instance of the Trino server, or Docker Compose setup for local testing.
The Python interpreter, version 3.11 or a more recent release.
Docker (this is optional, primarily useful for containerized deployment scenarios).

Installation

Installing via Smithery

Installation for Claude Desktop can be automated readily using the Smithery client utility. This method handles necessary packaging for platform integration.

npx -y @smithery/cli install @alaturqua/mcp-trino-python --client claude

Running Trino Locally

The most straightforward approach for immediate setup involves using the configuration files provided within this project. Execute the following command to launch the necessary services in detached mode.

docker-compose up -d

This action initiates the Trino server, typically accessible on the address localhost:8080. Following this, you can configure the MCP server component to connect.

Usage with VS Code

Quick integration into your Visual Studio Code workflow involves updating your settings file. You can access this by invoking Ctrl + Shift + P and searching for Preferences: Open User Settings (JSON).

Alternatively, a dedicated configuration file named .vscode/mcp.json within your project root allows for shared workspace settings. Remember that the top-level mcp key is omitted when using the workspace file approach.

The nesting under the main mcp key is disregarded when the settings reside in the local .vscode/mcp.json file.

{
  "mcp": {
    "servers": {
      "trino": {
        "command": "docker",
        "args": ["run", "--rm", "ghcr.io/alaturqua/mcp-trino-python:latest"],
        "env": {
          "TRINO_HOST": "${input:trino_host}",
          "TRINO_PORT": "${input:trino_port}",
          "TRINO_USER": "${input:trino_user}",
          "TRINO_PASSWORD": "${input:trino_password}",
          "TRINO_HTTP_SCHEME": "${input:trino_http_scheme}",
          "TRINO_CATALOG": "${input:trino_catalog}",
          "TRINO_SCHEMA": "${input:trino_schema}"
        }
      }
    }
  }
}

Usage with Claude Desktop

For users operating within the Claude Desktop environment, integrate the following configuration structure into your respective application settings area.

{
  "mcpServers": {
    "trino": {
      "command": "python",
      "args": ["./src/server.py"],
      "env": {
        "TRINO_HOST": "your-trino-host",
        "TRINO_PORT": "8080",
        "TRINO_USER": "trino"
      }
    }
  }
}

Configuration

Environment Variables

Configuration parameters necessary for establishing a secure connection are managed via environment variables. The table below details these settings, their purpose, and any default values if the variable is omitted.

Variable	Description	Default
TRINO_HOST	Trino server hostname	localhost
TRINO_PORT	Trino server port	8080
TRINO_USER	Trino username	trino
TRINO_CATALOG	Default catalog	None
TRINO_SCHEMA	Default schema	None
TRINO_HTTP_SCHEME	HTTP scheme (http/https)	http
TRINO_PASSWORD	Trino password	None

Tools

This section enumerates the specific callable functions exposed by the server for data manipulation and metadata inspection.

Query and Exploration Tools

show_catalogs
Retrieves a complete listing of accessible catalogs.
No input parameters are necessary for this operation.
show_schemas
Lists all available schemas within a specified catalog.
Parameters:
- catalog: The name of the catalog (string type, mandatory).
show_tables
Obtains a roster of all tables contained within a particular schema.
Parameters:
- catalog: Identifier for the catalog (string type, required).
- schema: Identifier for the schema (string type, required).
describe_table
Provides comprehensive structural details and column definitions for a target table.
Parameters:
- table: The designation of the table (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
execute_query
Executes an arbitrary SQL command and formats the resulting output.
Parameters:
- query: The specific SQL statement intended for execution (string type, required).
show_catalog_tree
Generates a structural representation of catalogs, schemas, and tables.
Returns data organized as a hierarchical, visually indicated tree structure.
No input parameters are required.
show_create_table
Displays the necessary DDL statement required to recreate the specified table structure.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
show_create_view
Displays the DDL statement used to define a particular view.
Parameters:
- view: The view identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
show_stats
Presents performance and usage statistics associated with a designated table.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).

Iceberg Table Maintenance

These commands facilitate direct management of Iceberg table physical layout and history pruning.

optimize
Initiates a compaction process to consolidate smaller data files within an Iceberg table.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
optimize_manifests
Performs optimization procedures specifically on the manifest files associated with the table.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
expire_snapshots
Removes historical snapshot records from the table metadata based on a time criteria.
Parameters:
- table: The table identifier (string type, required).
- retention_threshold: Defines the minimum age for removal, e.g., "7d" (string, optional).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).

Iceberg Metadata Inspection

These functions allow deep exploration of the metadata layers specific to Iceberg tables.

show_table_properties
Retrieves the stored configuration properties specific to the Iceberg table.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
show_table_history
Displays the chronological ledger of changes, including lineage and ancestry tracking.
Contains information regarding snapshot timing and structural evolution.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
show_metadata_log_entries
Lists the sequence of metadata files that constitute the table's history log.
Details file locations and the sequential order of metadata updates.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
show_snapshots
Returns details about all current and previous table snapshots.
Information includes associated operation types and linked manifest files.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
show_manifests
Lists the manifest files pertaining to the active or specified historical snapshots.
Includes statistics summarizing the data files referenced by those manifests.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
- all_snapshots: A boolean indicating whether to examine all snapshots (optional).
show_partitions
Displays the partition statistics mapped across the table's data.
Contains aggregated counts regarding files grouped by partition values.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
show_files
Lists the actual data files comprising the structure of the current snapshot.
Provides granular metadata for each file, including column-level statistics.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
show_entries
Displays detailed entries from manifest files for the current or all tracked snapshots.
Includes file status and comprehensive metrics at the entry level.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).
- all_snapshots: A boolean to include data from all snapshots (optional).
show_refs
Retrieves information concerning table references such as branches and tags.
Details the configuration of these references and which snapshot they currently point to.
Parameters:
- table: The table identifier (string type, required).
- catalog: The catalog name (string, optional).
- schema: The schema name (string, optional).

Query History

show_query_history
Retrieves a record of recently executed queries processed by the system.
Parameters:
- limit: Specifies the maximum number of query records to return (number type, optional).

Database Management System (DBMS): Software managing data storage and retrieval.
Trino: A distributed SQL query engine designed for federated data access across diverse sources.
Apache Iceberg: An open table format for huge analytic datasets, enhancing reliability over traditional file systems.
SQL: The standard language used for interacting with relational database models.
Data Modeling: The process of defining how data is structured and related within a system.

Extra Details

While earlier systems relied on physical storage methods like index cards for data organization, modern database systems handle vast scales, often leveraging distributed computing architectures like those Trino manages. Efficient data representation and ensuring fault tolerance are primary concerns in designing such large-scale data handling tools. Security and privacy considerations for sensitive data remain critical aspects of any robust database system implementation.

License

This software is distributed under the terms of the Apache License, version 2.0. Full license stipulations are available within the designated LICENSE file.

Conclusion

This utility streamlines complex interactions with high-performance, distributed database systems like Trino, especially when coupled with formats like Iceberg. By abstracting maintenance and metadata inspection into standardized callable functions, it supports building reliable applications that analyze and transform organizational data effectively.

Trino-Iceberg Data Orchestration Utility

Author

alaturqua

Quick Info

Actions

Tags

Introduction

Use Cases

Prerequisites

Installation

Installing via Smithery

Running Trino Locally

Usage with VS Code

Usage with Claude Desktop

Configuration

Environment Variables

Tools

Query and Exploration Tools

Iceberg Table Maintenance

Iceberg Metadata Inspection

Query History

Extra Details

License

Conclusion

See Also

Trino-Iceberg Data Orchestration Utility

Author

alaturqua

Quick Info

Actions

Tags

Introduction

Use Cases

Prerequisites

Installation

Installing via Smithery

Running Trino Locally

Usage with VS Code

Usage with Claude Desktop

Configuration

Environment Variables

Tools

Query and Exploration Tools

Iceberg Table Maintenance

Iceberg Metadata Inspection

Query History

Related Topics

Extra Details

License

Conclusion

See Also