StarRocks Official MCP Server

The StarRocks MCP Server is a bridge between AI assistants and StarRocks databases. It supports direct SQL execution, schema exploration, chart generation from query results, and detailed metadata overviews, without requiring complex client-side setup.

Capabilities Overview

Direct SQL Execution: Runs SELECT queries (read_query) and DDL/DML statements (write_query).
Database Exploration: Lists available schemas and tables and retrieves table definitions through starrocks:// resource URIs.
System Information: Exposes StarRocks' internal metrics and state through the proc:// resource namespace.
Detailed Overviews: Produces complete summaries of individual tables (table_overview) or entire databases (db_overview), including column definitions, row counts, and sample rows.
Data Visualization: Runs a query and generates an interactive Plotly chart directly from the result set (query_and_plotly_chart).
Result Caching: Overview results (schemas, row counts, samples) are cached in memory to speed up repeated requests. The cache can be bypassed when fresh data is required.
Flexible Configuration: Connection parameters and the transport mode are set primarily through environment variables.

Deployment & Initialization

The server is typically launched by an MCP host. The host's configuration supplies the settings, such as connection parameters, that the StarRocks MCP Server process runs with.

Streamable HTTP Transport (Preferred Method):

To launch the service utilizing Streamable HTTP:

First, verify that the connection to StarRocks works:

STARROCKS_URL=root:@localhost:9030 uv run mcp-server-starrocks --test

Server initiation:

uv run mcp-server-starrocks --mode streamable-http --port 8000

Then add the server to the MCP host configuration:

{
  "mcpServers": {
    "mcp-server-starrocks": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

Using uv with Packaged Installation (Per-Environment Variables):

{
  "mcpServers": {
    "mcp-server-starrocks": {
      "command": "uv",
      "args": ["run", "--with", "mcp-server-starrocks", "mcp-server-starrocks"],
      "env": {
        "STARROCKS_HOST": "default localhost",
        "STARROCKS_PORT": "default 9030",
        "STARROCKS_USER": "default root",
        "STARROCKS_PASSWORD": "default empty",
        "STARROCKS_DB": "default empty"
      }
    }
  }
}

Using uv with Packaged Installation (Connection URI Precedence):

{
  "mcpServers": {
    "mcp-server-starrocks": {
      "command": "uv",
      "args": ["run", "--with", "mcp-server-starrocks", "mcp-server-starrocks"],
      "env": {
        "STARROCKS_URL": "root:password@localhost:9030/my_database"
      }
    }
  }
}

Using uv with Local Source Directory (Development Workflow):

{
  "mcpServers": {
    "mcp-server-starrocks": {
      "command": "uv",
      "args": [
        "--directory",
        "path/to/mcp-server-starrocks", // <-- Update this path
        "run",
        "mcp-server-starrocks"
      ],
      "env": {
        "STARROCKS_HOST": "default localhost",
        "STARROCKS_PORT": "default 9030",
        "STARROCKS_USER": "default root",
        "STARROCKS_PASSWORD": "default empty",
        "STARROCKS_DB": "default empty"
      }
    }
  }
}

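Using uv with Local Source Directory and Connection URI:

{
  "mcpServers": {
    "mcp-server-starrocks": {
      "command": "uv",
      "args": [
        "--directory",
        "path/to/mcp-server-starrocks", // <-- Update this path
        "run",
        "mcp-server-starrocks"
      ],
      "env": {
        "STARROCKS_URL": "root:password@localhost:9030/my_database"
      }
    }
  }
}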

Available Command-Line Parameters:

The service exposes the following runtime switches:

uv run mcp-server-starrocks --help

--mode {stdio,sse,http,streamable-http}: Defines the communication protocol (Default: stdio or defined by MCP_TRANSPORT_MODE).
--host HOST: Specifies the network interface binding for HTTP modes (Default: localhost).
--port PORT: Defines the listening TCP port for HTTP modes.
--test: Executes a diagnostic check to validate basic connectivity.
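
As a rough illustration of how the flags above interact with MCP_TRANSPORT_MODE, the following sketch mirrors them with argparse. It is not the server's actual parser: the build_parser helper is hypothetical, and because no default port is documented, None is used as a stand-in for "not given".

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser that mirrors the documented flags (illustrative only)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="mcp-server-starrocks")
    parser.add_argument(
        "--mode",
        choices=["stdio", "sse", "http", "streamable-http"],
        # Documented default: stdio, unless MCP_TRANSPORT_MODE is set.
        default=env.get("MCP_TRANSPORT_MODE", "stdio"),
    )
    parser.add_argument("--host", default="localhost")
    # No default port is documented, so None stands in for "not given".
    parser.add_argument("--port", type=int, default=None)
    parser.add_argument("--test", action="store_true")
    return parser


# The environment variable supplies the mode; the flags supply host and port.
args = build_parser({"MCP_TRANSPORT_MODE": "streamable-http"}).parse_args(
    ["--host", "0.0.0.0", "--port", "8080"]
)
print(args.mode, args.host, args.port, args.test)
```

In this sketch an explicit --mode on the command line would still override the environment variable, since argparse only falls back to the default when the flag is absent.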

Execution Examples:

# Start in Streamable HTTP mode on a specific host and port
uv run mcp-server-starrocks --mode streamable-http --host 0.0.0.0 --port 8080

# Start in stdio mode (the default)
uv run mcp-server-starrocks --mode stdio

# Run the connectivity test
uv run mcp-server-starrocks --test

In the MCP host configuration, the url field must point to the server's Streamable HTTP endpoint.
In this mode, clients talk to the server through ordinary HTTP POST requests with JSON bodies. No special client library is required.
All tool inputs and outputs follow JSON schema definitions.
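
To make the JSON bodies concrete, here is a minimal sketch of the request body a client might POST to call a tool, assuming the standard MCP JSON-RPC 2.0 framing (method tools/call). The tool_call_payload helper is hypothetical. A real Streamable HTTP session also needs an initialize handshake and appropriate Accept headers, which are omitted here.

```python
import json
from itertools import count

# Simple monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def tool_call_payload(tool, arguments):
    """Return a JSON-RPC 2.0 'tools/call' request body for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Body for calling read_query; it would be POSTed to the /mcp endpoint.
body = tool_call_payload("read_query", {"query": "SHOW DATABASES"})
print(json.dumps(body, indent=2))
```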

Important: The sse (Server-Sent Events) transport mode is deprecated. Use Streamable HTTP for all new integrations.

Environmental Configuration Directives

You configure connectivity to StarRocks either granularly or via a unified URI:

Method 1: Segmented Environment Variables

STARROCKS_HOST: (Optional) FQDN or IP of the StarRocks Front-End service. Default: localhost.
STARROCKS_PORT: (Optional) MySQL wire protocol port for the FE. Default: 9030.
STARROCKS_USER: (Optional) Authenticating principal name. Default: root.
STARROCKS_PASSWORD: (Optional) Security credential. Default: empty string.
STARROCKS_DB: (Optional) Default database, used when a tool call or table name does not specify one. If set, the connection attempts USE <db>. Default: empty (no default database).

Method 2: Unified Connection URI (Overrides Segmented Variables)

STARROCKS_URL: (Optional) A complete connection string encapsulating all credentials. Format: [<protocol>://]user:credential@host:port/schema. The protocol is optional. When active, this variable supersedes all preceding individual connection environment settings.

Examples:
- root:mypass@local.svc:9030/analytics
- mysql://sysop:secure@data.corp.net:9030/prod_warehouse
- starrocks://user_x:token_y@10.0.0.5:9030/finance
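
The precedence between the two methods can be sketched as follows. This is an illustration under stated assumptions, not the server's own parsing code: parse_starrocks_url and resolve_connection are hypothetical helpers, and credentials that contain characters such as @ or / are assumed to be percent-encoded.

```python
import os
from urllib.parse import unquote, urlsplit

# Documented defaults for the segmented variables.
DEFAULTS = {"host": "localhost", "port": 9030, "user": "root",
            "password": "", "database": ""}


def parse_starrocks_url(url):
    """Parse [<protocol>://]user:credential@host:port/schema into a dict."""
    if "://" not in url:
        url = "starrocks://" + url  # the protocol prefix is optional
    parts = urlsplit(url)
    return {
        "host": parts.hostname or DEFAULTS["host"],
        "port": parts.port or DEFAULTS["port"],
        "user": unquote(parts.username or DEFAULTS["user"]),
        "password": unquote(parts.password or ""),
        "database": parts.path.lstrip("/"),
    }


def resolve_connection(env=None):
    """Return connection settings; STARROCKS_URL wins over segmented variables."""
    env = os.environ if env is None else env
    if env.get("STARROCKS_URL"):
        return parse_starrocks_url(env["STARROCKS_URL"])
    return {
        "host": env.get("STARROCKS_HOST", DEFAULTS["host"]),
        "port": int(env.get("STARROCKS_PORT", DEFAULTS["port"])),
        "user": env.get("STARROCKS_USER", DEFAULTS["user"]),
        "password": env.get("STARROCKS_PASSWORD", DEFAULTS["password"]),
        "database": env.get("STARROCKS_DB", DEFAULTS["database"]),
    }


print(resolve_connection({"STARROCKS_URL": "root:mypass@local.svc:9030/analytics",
                          "STARROCKS_HOST": "ignored.example"}))
```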

Supplementary Runtime Controls

STARROCKS_OVERVIEW_LIMIT: (Optional) A non-binding upper bound on the byte size for text generated by metadata retrieval functions (table_overview, db_overview) during cache population. Intended to mitigate excessive transient memory pressure from vast schemas. Default: 20000 bytes.
STARROCKS_MYSQL_AUTH_PLUGIN: (Optional) Specifies the client-side authentication method for the MySQL protocol handshake. Set this if StarRocks requires explicit methods like mysql_clear_password. Default behavior relies on the library's standard plugin selection.
MCP_TRANSPORT_MODE: (Optional) Defines the external interface method of the MCP Server:
stdio (Default): Communicates over standard input/output streams, suited to being launched by an MCP host.
streamable-http (Recommended): Initializes an HTTP server supporting modern RESTful interactions.
sse: (Deprecated) Legacy streaming method. Usage discouraged; prefer Streamable HTTP uniformly.

Operational Interfaces (Tools)

read_query
Purpose: Executes SELECT queries and other statements that return a result set (e.g., SHOW, DESCRIBE).
Input Payload:

{
  "query": "SQL expression",
  "db": "target schema name (optional, falls back to environment default)"
}

Return Value: Text containing the result set in a CSV-like format (including a header row), followed by a row-count summary. Errors are returned as text.
write_query
Purpose: Executes schema modifications (e.g., CREATE, DROP) or data manipulation (e.g., INSERT, UPDATE).
Input Payload:

{
  "query": "DDL/DML statement",
  "db": "target schema name (optional)"
}

Return Value: Text confirming success (e.g., "Operation successful") or describing the failure. Changes are committed immediately on success.
analyze_query
Purpose: Fetches a query's execution plan or profile to help with optimization.
Input Payload:

{
  "uuid": "Execution Identifier (32 hex chars, 8-4-4-4-12 format)",
  "sql": "The SQL statement itself",
  "db": "database context (optional)"
}

Return Value: Text report produced by ANALYZE PROFILE FROM (if a UUID is given) or EXPLAIN ANALYZE (if SQL is given).
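
The choice between the two statements can be sketched as below. This is a hypothetical helper, not the server's code. In particular, giving the UUID priority when both fields are present is an assumption, as is rejecting malformed UUIDs up front.

```python
import re

# 32 hex characters in the documented 8-4-4-4-12 layout.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def build_analyze_statement(uuid=None, sql=None):
    """Pick the StarRocks statement for the given analyze_query input."""
    if uuid:  # assumption: a UUID takes priority when both are supplied
        if not UUID_RE.match(uuid):
            raise ValueError("uuid must be 32 hex chars in 8-4-4-4-12 format")
        return f"ANALYZE PROFILE FROM '{uuid}'"
    if sql:
        return f"EXPLAIN ANALYZE {sql}"
    raise ValueError("provide either uuid or sql")


print(build_analyze_statement(sql="SELECT count(*) FROM orders"))
print(build_analyze_statement(uuid="0194e6b1-5a5c-7a3b-9c1d-2f3e4a5b6c7d"))
```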
query_and_plotly_chart
Purpose: Executes a query and generates an interactive Plotly chart from the result using a caller-supplied Python expression.
Input Payload:

{
  "query": "Data selection SQL",
  "plotly_expr": "Python expression utilizing 'px' (Plotly Express) operating on 'df' (the resulting data frame). Example: 'px.bar(df, x=\"name\", y=\"value\")'",
  "db": "database context (optional)"
}

Return Value: A two-part response:
1. TextContent: A summary of the data frame and a note about the generated chart.
2. ImageContent: The generated chart as a base64-encoded PNG (image/png). If the query returns no data or execution fails, the tool fails gracefully.
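
On the client side, the ImageContent part can be turned back into PNG bytes roughly as follows. This sketch assumes the item arrives as a dict with the standard MCP image fields (type, mimeType, data). The decode_chart helper is hypothetical, and the sample item is a stand-in rather than a real chart.

```python
import base64

# Every PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_chart(content):
    """Return raw PNG bytes from an MCP image content item."""
    if content.get("type") != "image" or content.get("mimeType") != "image/png":
        raise ValueError("not a PNG image content item")
    png = base64.b64decode(content["data"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return png


# Stand-in item: only the PNG signature plus filler, not a real chart.
sample = {
    "type": "image",
    "mimeType": "image/png",
    "data": base64.b64encode(PNG_SIGNATURE + b"filler").decode("ascii"),
}
png_bytes = decode_chart(sample)
print(len(png_bytes))
```

The returned bytes can then be written to a file opened in binary mode or passed to an image viewer.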
table_overview
Purpose: Gathers essential metadata for one table: column definitions (via DESCRIBE), total row count, and a few sample rows (LIMIT 3). Results are cached unless a refresh is requested.
Input Payload:

{
  "table": "Identifier (e.g., 'schema.tbl' or just 'tbl'). Uses STARROCKS_DB if schema omitted.",
  "refresh": false // Boolean to force cache bypass. Default is false.
}

Return Value: Formatted text summarizing structure, row count, and samples, or an error message. If the original fetch failed, the cached entry contains that error.
db_overview
Purpose: Compiles structural summaries, row counts, and samples for every table in a given schema. Uses the table-level cache for each table.
Input Payload:

{
  "db": "schema_identifier", // Optional if default schema is active.
  "refresh": false // Boolean to invalidate cache for all tables in this schema. Default is false.
}

Return Value: Concatenated text blocks, separated by clear headings, one per discovered table. Returns an error if the schema cannot be accessed or contains no tables.

System Resources (URIs)

Direct Accessors

starrocks:///databases
Synopsis: Enumerates all accessible schemas.
Underlying Operation: SHOW DATABASES
MIME Type: text/plain

Template-Based Accessors

starrocks:///{db}/{table}/schema
Synopsis: Retrieves the full SQL definition for table creation.
Underlying Operation: SHOW CREATE TABLE {db}.{table}
MIME Type: text/plain
starrocks:///{db}/tables
Synopsis: Yields a list of entities within the specified schema.
Underlying Operation: SHOW TABLES FROM {db}
MIME Type: text/plain
proc:///{+path}
Synopsis: Interface to query internal StarRocks administrative state information, mirroring Unix's /proc filesystem structure. {path} defines the specific administrative node.
Underlying Operation: SHOW PROC '/{path}'
MIME Type: text/plain
Key Paths:
- /frontends - Status of FE cluster members.
- /backends - Status of BE cluster members (traditional deployments).
- /compute_nodes - Status of CN cluster members (cloud-native deployments).
- /dbs - High-level database registry.
- /dbs/<DB_ID> - Detailed metadata for a specific database.
- /dbs/<DB_ID>/<TABLE_ID> - Detailed metadata for a specific entity.
- /dbs/<DB_ID>/<TABLE_ID>/partitions - Partition configuration for an entity.
- /transactions - Aggregated transactional status.
- /transactions/<DB_ID> - Transactional status scoped to a database ID.
- /transactions/<DB_ID>/running - Active transactions.
- /transactions/<DB_ID>/finished - Recently completed transactions.
- /jobs - Status of background tasks (e.g., Schema Change, Rollup).
- /statistic - Summary statistics per database.
- /tasks - Agent activity records.
- /cluster_balance - Current load distribution metrics.
- /routine_loads - Status of streaming ingestion jobs.
- /colocation_group - Grouping information for Colocation Joins.
- /catalog - Information on external data sources (Hive, Iceberg, etc.).
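
The mapping from resource URIs to the underlying statements listed above can be sketched like this. The resource_to_sql function is a hypothetical illustration, not the server's routing code, and it performs no identifier quoting or validation.

```python
import re


def resource_to_sql(uri):
    """Translate a documented resource URI into its underlying statement."""
    if uri == "starrocks:///databases":
        return "SHOW DATABASES"
    match = re.fullmatch(r"starrocks:///([^/]+)/tables", uri)
    if match:
        return f"SHOW TABLES FROM {match.group(1)}"
    match = re.fullmatch(r"starrocks:///([^/]+)/([^/]+)/schema", uri)
    if match:
        return f"SHOW CREATE TABLE {match.group(1)}.{match.group(2)}"
    if uri.startswith("proc:///"):
        # Everything after proc:/// is the administrative path.
        return f"SHOW PROC '/{uri[len('proc:///'):]}'"
    raise ValueError(f"unsupported resource URI: {uri}")


for uri in ("starrocks:///databases",
            "starrocks:///analytics/tables",
            "starrocks:///analytics/orders/schema",
            "proc:///frontends"):
    print(uri, "->", resource_to_sql(uri))
```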

Predefined Interaction Sequences (Prompts)

This server defines no standardized interaction sequences.

Internal Persistence Strategy

The table_overview and db_overview interfaces utilize a memory-resident key-value store for result persistence.
The cache key employs a composite identifier: (schema_name, table_name).
When table_overview is called, the cache is checked first. If an entry exists and refresh is false (the default), the cached string is returned immediately. Otherwise, the data is fetched from StarRocks, stored in the cache, and then returned.
db_overview iterates over every table in the schema and applies the same caching logic to each one. Setting refresh=true forces all tables in that schema to be fetched again.
The STARROCKS_OVERVIEW_LIMIT variable acts as a soft ceiling on the byte count for any single entry stored in the cache, aiding memory governance.
Stored entries retain the output generated during the initial fetch, including any operational errors encountered at that time, ensuring consistent responses for cache hits.
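
The caching behaviour described in this section can be sketched as a small class. It is a minimal illustration rather than the server's implementation: the fetch callable is a hypothetical stand-in for the queries against StarRocks, and simple truncation is only one assumed way of applying the soft size limit.

```python
class OverviewCache:
    """In-memory overview cache keyed by (schema_name, table_name)."""

    def __init__(self, fetch, limit=20000):
        self._fetch = fetch    # hypothetical callable: (db, table) -> str
        self._limit = limit    # stands in for STARROCKS_OVERVIEW_LIMIT
        self._entries = {}

    def table_overview(self, db, table, refresh=False):
        key = (db, table)
        if not refresh and key in self._entries:
            return self._entries[key]          # cache hit
        try:
            text = self._fetch(db, table)
        except Exception as exc:
            text = f"Error: {exc}"             # errors are cached as well
        text = text[: self._limit]             # assumption: plain truncation
        self._entries[key] = text
        return text

    def db_overview(self, db, tables, refresh=False):
        # Each table goes through the same per-table cache logic.
        sections = [
            f"--- {table} ---\n{self.table_overview(db, table, refresh)}"
            for table in tables
        ]
        return "\n\n".join(sections)


calls = []


def fake_fetch(db, table):
    calls.append((db, table))
    return f"overview of {db}.{table}"


cache = OverviewCache(fake_fetch)
cache.table_overview("sales", "orders")                 # miss: fetches
cache.table_overview("sales", "orders")                 # hit: no fetch
cache.table_overview("sales", "orders", refresh=True)   # forced refetch
print(len(calls))  # 2
```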

Diagnostic Access

Once the MCP server is running, you can debug it with the MCP Inspector:

npx @modelcontextprotocol/inspector
