logo
Free, unlimited AI code reviews that run on commit
git-lrc git-lrc GitHub Install Now We'd appreciate a star git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt

Trino-Iceberg Data Orchestration Utility

This Python-based utility establishes connectivity with the Trino distributed SQL query engine, often utilizing the Apache Iceberg table format for robust data organization. It provides standardized mechanisms for complex data querying, exploratory analysis, and essential table lifecycle management operations. A database, fundamentally an organized collection of data, relies on a DBMS for interaction; this tool acts as an application layer interacting with Trino, a powerful relational database access system.

Author

Trino-Iceberg Data Orchestration Utility logo

alaturqua

Apache License 2.0

Quick Info

GitHub GitHub Stars 19
NPM Weekly Downloads 0
Tools 1
Last Updated 2026-02-19

Tags

databasesdatabasetrinosecure databasedatabases securedatabase access

Introduction

This utility, conforming to the Model Context Protocol (MCP), functions as a specialized server designed to interface cleanly with Trino and Apache Iceberg data stores. Databases, which are organized data collections managed by a Database Management System (DBMS), benefit from tools that automate or simplify complex interactions. This package focuses on enabling advanced querying and routine maintenance tasks for tabular data structures managed by Trino.

smithery badge Python 3.11+ VS Code Docker License

Use Cases

  • Facilitating interactive data examination and analysis procedures using Trino's capabilities.
  • Performing automated upkeep and optimization routines on tables utilizing the Iceberg format.
  • Developing software applications that require programmatic access to data within Trino environments.
  • Managing the execution of various SQL statements while ensuring results are properly structured.

Prerequisites

To successfully employ this tool, several components must be readily available in your environment.

  1. An operational instance of the Trino server, or Docker Compose setup for local testing.
  2. The Python interpreter, version 3.11 or a more recent release.
  3. Docker (this is optional, primarily useful for containerized deployment scenarios).

Installation

Installing via Smithery

Installation for Claude Desktop can be automated readily using the Smithery client utility. This method handles necessary packaging for platform integration.

npx -y @smithery/cli install @alaturqua/mcp-trino-python --client claude

Running Trino Locally

The most straightforward approach for immediate setup involves using the configuration files provided within this project. Execute the following command to launch the necessary services in detached mode.

docker-compose up -d

This action initiates the Trino server, typically accessible on the address localhost:8080. Following this, you can configure the MCP server component to connect.

Usage with VS Code

Quick integration into your Visual Studio Code workflow involves updating your settings file. You can access this by invoking Ctrl + Shift + P and searching for Preferences: Open User Settings (JSON).

Alternatively, a dedicated configuration file named .vscode/mcp.json within your project root allows for shared workspace settings. Remember that the top-level mcp key is omitted when using the workspace file approach.

The nesting under the main mcp key is disregarded when the settings reside in the local .vscode/mcp.json file.

{
  "mcp": {
    "servers": {
      "trino": {
        "command": "docker",
        "args": ["run", "--rm", "ghcr.io/alaturqua/mcp-trino-python:latest"],
        "env": {
          "TRINO_HOST": "${input:trino_host}",
          "TRINO_PORT": "${input:trino_port}",
          "TRINO_USER": "${input:trino_user}",
          "TRINO_PASSWORD": "${input:trino_password}",
          "TRINO_HTTP_SCHEME": "${input:trino_http_scheme}",
          "TRINO_CATALOG": "${input:trino_catalog}",
          "TRINO_SCHEMA": "${input:trino_schema}"
        }
      }
    }
  }
}

Usage with Claude Desktop

For users operating within the Claude Desktop environment, integrate the following configuration structure into your respective application settings area.

{
  "mcpServers": {
    "trino": {
      "command": "python",
      "args": ["./src/server.py"],
      "env": {
        "TRINO_HOST": "your-trino-host",
        "TRINO_PORT": "8080",
        "TRINO_USER": "trino"
      }
    }
  }
}

Configuration

Environment Variables

Configuration parameters necessary for establishing a secure connection are managed via environment variables. The table below details these settings, their purpose, and any default values if the variable is omitted.

Variable Description Default
TRINO_HOST Trino server hostname localhost
TRINO_PORT Trino server port 8080
TRINO_USER Trino username trino
TRINO_CATALOG Default catalog None
TRINO_SCHEMA Default schema None
TRINO_HTTP_SCHEME HTTP scheme (http/https) http
TRINO_PASSWORD Trino password None

Tools

This section enumerates the specific callable functions exposed by the server for data manipulation and metadata inspection.

Query and Exploration Tools

  • show_catalogs

  • Retrieves a complete listing of accessible catalogs.

  • No input parameters are necessary for this operation.

  • show_schemas

  • Lists all available schemas within a specified catalog.

  • Parameters:

    • catalog: The name of the catalog (string type, mandatory).
  • show_tables

  • Obtains a roster of all tables contained within a particular schema.

  • Parameters:

    • catalog: Identifier for the catalog (string type, required).
    • schema: Identifier for the schema (string type, required).
  • describe_table

  • Provides comprehensive structural details and column definitions for a target table.

  • Parameters:

    • table: The designation of the table (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • execute_query

  • Executes an arbitrary SQL command and formats the resulting output.

  • Parameters:

    • query: The specific SQL statement intended for execution (string type, required).
  • show_catalog_tree

  • Generates a structural representation of catalogs, schemas, and tables.

  • Returns data organized as a hierarchical, visually indicated tree structure.
  • No input parameters are required.

  • show_create_table

  • Displays the necessary DDL statement required to recreate the specified table structure.

  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • show_create_view

  • Displays the DDL statement used to define a particular view.

  • Parameters:

    • view: The view identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • show_stats

  • Presents performance and usage statistics associated with a designated table.
  • Parameters:
    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).

Iceberg Table Maintenance

These commands facilitate direct management of Iceberg table physical layout and history pruning.

  • optimize

  • Initiates a compaction process to consolidate smaller data files within an Iceberg table.

  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • optimize_manifests

  • Performs optimization procedures specifically on the manifest files associated with the table.

  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • expire_snapshots

  • Removes historical snapshot records from the table metadata based on a time criteria.
  • Parameters:
    • table: The table identifier (string type, required).
    • retention_threshold: Defines the minimum age for removal, e.g., "7d" (string, optional).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).

Iceberg Metadata Inspection

These functions allow deep exploration of the metadata layers specific to Iceberg tables.

  • show_table_properties

  • Retrieves the stored configuration properties specific to the Iceberg table.

  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • show_table_history

  • Displays the chronological ledger of changes, including lineage and ancestry tracking.

  • Contains information regarding snapshot timing and structural evolution.
  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • show_metadata_log_entries

  • Lists the sequence of metadata files that constitute the table's history log.

  • Details file locations and the sequential order of metadata updates.
  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • show_snapshots

  • Returns details about all current and previous table snapshots.

  • Information includes associated operation types and linked manifest files.
  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • show_manifests

  • Lists the manifest files pertaining to the active or specified historical snapshots.

  • Includes statistics summarizing the data files referenced by those manifests.
  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
    • all_snapshots: A boolean indicating whether to examine all snapshots (optional).
  • show_partitions

  • Displays the partition statistics mapped across the table's data.

  • Contains aggregated counts regarding files grouped by partition values.
  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • show_files

  • Lists the actual data files comprising the structure of the current snapshot.

  • Provides granular metadata for each file, including column-level statistics.
  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
  • show_entries

  • Displays detailed entries from manifest files for the current or all tracked snapshots.

  • Includes file status and comprehensive metrics at the entry level.
  • Parameters:

    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).
    • all_snapshots: A boolean to include data from all snapshots (optional).
  • show_refs

  • Retrieves information concerning table references such as branches and tags.

  • Details the configuration of these references and which snapshot they currently point to.
  • Parameters:
    • table: The table identifier (string type, required).
    • catalog: The catalog name (string, optional).
    • schema: The schema name (string, optional).

Query History

  • show_query_history
  • Retrieves a record of recently executed queries processed by the system.
  • Parameters:
    • limit: Specifies the maximum number of query records to return (number type, optional).
  • Database Management System (DBMS): Software managing data storage and retrieval.
  • Trino: A distributed SQL query engine designed for federated data access across diverse sources.
  • Apache Iceberg: An open table format for huge analytic datasets, enhancing reliability over traditional file systems.
  • SQL: The standard language used for interacting with relational database models.
  • Data Modeling: The process of defining how data is structured and related within a system.

Extra Details

While earlier systems relied on physical storage methods like index cards for data organization, modern database systems handle vast scales, often leveraging distributed computing architectures like those Trino manages. Efficient data representation and ensuring fault tolerance are primary concerns in designing such large-scale data handling tools. Security and privacy considerations for sensitive data remain critical aspects of any robust database system implementation.

License

This software is distributed under the terms of the Apache License, version 2.0. Full license stipulations are available within the designated LICENSE file.

Conclusion

This utility streamlines complex interactions with high-performance, distributed database systems like Trino, especially when coupled with formats like Iceberg. By abstracting maintenance and metadata inspection into standardized callable functions, it supports building reliable applications that analyze and transform organizational data effectively.

See Also

`