Trino-Iceberg Data Orchestration Utility
This Python-based utility establishes connectivity with the Trino distributed SQL query engine, often utilizing the Apache Iceberg table format for robust data organization. It provides standardized mechanisms for complex data querying, exploratory analysis, and essential table lifecycle management operations. A database, fundamentally an organized collection of data, relies on a DBMS for interaction; this tool acts as an application layer interacting with Trino, a powerful relational database access system.
Author

alaturqua
Quick Info
Actions
Tags
Introduction
This utility, conforming to the Model Context Protocol (MCP), functions as a specialized server designed to interface cleanly with Trino and Apache Iceberg data stores. Databases, which are organized data collections managed by a Database Management System (DBMS), benefit from tools that automate or simplify complex interactions. This package focuses on enabling advanced querying and routine maintenance tasks for tabular data structures managed by Trino.
Use Cases
- Facilitating interactive data examination and analysis procedures using Trino's capabilities.
- Performing automated upkeep and optimization routines on tables utilizing the Iceberg format.
- Developing software applications that require programmatic access to data within Trino environments.
- Managing the execution of various SQL statements while ensuring results are properly structured.
Prerequisites
To successfully employ this tool, several components must be readily available in your environment.
- An operational instance of the Trino server, or Docker Compose setup for local testing.
- The Python interpreter, version 3.11 or a more recent release.
- Docker (this is optional, primarily useful for containerized deployment scenarios).
Installation
Installing via Smithery
Installation for Claude Desktop can be automated readily using the Smithery client utility. This method handles necessary packaging for platform integration.
npx -y @smithery/cli install @alaturqua/mcp-trino-python --client claude
Running Trino Locally
The most straightforward approach for immediate setup involves using the configuration files provided within this project. Execute the following command to launch the necessary services in detached mode.
docker-compose up -d
This action initiates the Trino server, typically accessible on the address localhost:8080. Following this, you can configure the MCP server component to connect.
Usage with VS Code
Quick integration into your Visual Studio Code workflow involves updating your settings file. You can access this by invoking Ctrl + Shift + P and searching for Preferences: Open User Settings (JSON).
Alternatively, a dedicated configuration file named .vscode/mcp.json within your project root allows for shared workspace settings. Remember that the top-level mcp key is omitted when using the workspace file approach.
The nesting under the main
mcpkey is disregarded when the settings reside in the local.vscode/mcp.jsonfile.
{
"mcp": {
"servers": {
"trino": {
"command": "docker",
"args": ["run", "--rm", "ghcr.io/alaturqua/mcp-trino-python:latest"],
"env": {
"TRINO_HOST": "${input:trino_host}",
"TRINO_PORT": "${input:trino_port}",
"TRINO_USER": "${input:trino_user}",
"TRINO_PASSWORD": "${input:trino_password}",
"TRINO_HTTP_SCHEME": "${input:trino_http_scheme}",
"TRINO_CATALOG": "${input:trino_catalog}",
"TRINO_SCHEMA": "${input:trino_schema}"
}
}
}
}
}
Usage with Claude Desktop
For users operating within the Claude Desktop environment, integrate the following configuration structure into your respective application settings area.
{
"mcpServers": {
"trino": {
"command": "python",
"args": ["./src/server.py"],
"env": {
"TRINO_HOST": "your-trino-host",
"TRINO_PORT": "8080",
"TRINO_USER": "trino"
}
}
}
}
Configuration
Environment Variables
Configuration parameters necessary for establishing a secure connection are managed via environment variables. The table below details these settings, their purpose, and any default values if the variable is omitted.
| Variable | Description | Default |
|---|---|---|
| TRINO_HOST | Trino server hostname | localhost |
| TRINO_PORT | Trino server port | 8080 |
| TRINO_USER | Trino username | trino |
| TRINO_CATALOG | Default catalog | None |
| TRINO_SCHEMA | Default schema | None |
| TRINO_HTTP_SCHEME | HTTP scheme (http/https) | http |
| TRINO_PASSWORD | Trino password | None |
Tools
This section enumerates the specific callable functions exposed by the server for data manipulation and metadata inspection.
Query and Exploration Tools
-
show_catalogs
-
Retrieves a complete listing of accessible catalogs.
-
No input parameters are necessary for this operation.
-
show_schemas
-
Lists all available schemas within a specified catalog.
-
Parameters:
catalog: The name of the catalog (string type, mandatory).
-
show_tables
-
Obtains a roster of all tables contained within a particular schema.
-
Parameters:
catalog: Identifier for the catalog (string type, required).schema: Identifier for the schema (string type, required).
-
describe_table
-
Provides comprehensive structural details and column definitions for a target table.
-
Parameters:
table: The designation of the table (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
execute_query
-
Executes an arbitrary SQL command and formats the resulting output.
-
Parameters:
query: The specific SQL statement intended for execution (string type, required).
-
show_catalog_tree
-
Generates a structural representation of catalogs, schemas, and tables.
- Returns data organized as a hierarchical, visually indicated tree structure.
-
No input parameters are required.
-
show_create_table
-
Displays the necessary DDL statement required to recreate the specified table structure.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
show_create_view
-
Displays the DDL statement used to define a particular view.
-
Parameters:
view: The view identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
show_stats
- Presents performance and usage statistics associated with a designated table.
- Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
Iceberg Table Maintenance
These commands facilitate direct management of Iceberg table physical layout and history pruning.
-
optimize
-
Initiates a compaction process to consolidate smaller data files within an Iceberg table.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
optimize_manifests
-
Performs optimization procedures specifically on the manifest files associated with the table.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
expire_snapshots
- Removes historical snapshot records from the table metadata based on a time criteria.
- Parameters:
table: The table identifier (string type, required).retention_threshold: Defines the minimum age for removal, e.g., "7d" (string, optional).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
Iceberg Metadata Inspection
These functions allow deep exploration of the metadata layers specific to Iceberg tables.
-
show_table_properties
-
Retrieves the stored configuration properties specific to the Iceberg table.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
show_table_history
-
Displays the chronological ledger of changes, including lineage and ancestry tracking.
- Contains information regarding snapshot timing and structural evolution.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
show_metadata_log_entries
-
Lists the sequence of metadata files that constitute the table's history log.
- Details file locations and the sequential order of metadata updates.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
show_snapshots
-
Returns details about all current and previous table snapshots.
- Information includes associated operation types and linked manifest files.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
show_manifests
-
Lists the manifest files pertaining to the active or specified historical snapshots.
- Includes statistics summarizing the data files referenced by those manifests.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).all_snapshots: A boolean indicating whether to examine all snapshots (optional).
-
show_partitions
-
Displays the partition statistics mapped across the table's data.
- Contains aggregated counts regarding files grouped by partition values.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
show_files
-
Lists the actual data files comprising the structure of the current snapshot.
- Provides granular metadata for each file, including column-level statistics.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
-
show_entries
-
Displays detailed entries from manifest files for the current or all tracked snapshots.
- Includes file status and comprehensive metrics at the entry level.
-
Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).all_snapshots: A boolean to include data from all snapshots (optional).
-
show_refs
-
Retrieves information concerning table references such as branches and tags.
- Details the configuration of these references and which snapshot they currently point to.
- Parameters:
table: The table identifier (string type, required).catalog: The catalog name (string, optional).schema: The schema name (string, optional).
Query History
- show_query_history
- Retrieves a record of recently executed queries processed by the system.
- Parameters:
limit: Specifies the maximum number of query records to return (number type, optional).
Related Topics
- Database Management System (DBMS): Software managing data storage and retrieval.
- Trino: A distributed SQL query engine designed for federated data access across diverse sources.
- Apache Iceberg: An open table format for huge analytic datasets, enhancing reliability over traditional file systems.
- SQL: The standard language used for interacting with relational database models.
- Data Modeling: The process of defining how data is structured and related within a system.
Extra Details
While earlier systems relied on physical storage methods like index cards for data organization, modern database systems handle vast scales, often leveraging distributed computing architectures like those Trino manages. Efficient data representation and ensuring fault tolerance are primary concerns in designing such large-scale data handling tools. Security and privacy considerations for sensitive data remain critical aspects of any robust database system implementation.
License
This software is distributed under the terms of the Apache License, version 2.0. Full license stipulations are available within the designated LICENSE file.
Conclusion
This utility streamlines complex interactions with high-performance, distributed database systems like Trino, especially when coupled with formats like Iceberg. By abstracting maintenance and metadata inspection into standardized callable functions, it supports building reliable applications that analyze and transform organizational data effectively.
