keboola-mcp-gateway
Connect AI assistants and agents to Keboola Connection: manage table metadata, run SQL queries, and export data for AI-assisted analytics workflows.
Author: keboola
Keboola MCP Server
Connect your Keboola project to AI agents and clients such as Cursor, Claude, Windsurf, and VS Code. The server exposes table and bucket metadata, SQL queries, and job triggers as tools, with no custom integration code required, so assistants get the right data when they need it.
High-Level Summary
Keboola MCP Server is an open-source bridge between your Keboola project and modern AI tools. It exposes Keboola features such as storage access, in-database SQL processing, and job orchestration as callable tools for Claude, Cursor, CrewAI, LangChain, Amazon Q, and other clients.
Core Capabilities
With an AI agent connected to the MCP server, you can:
- Data Repository: Query tables directly and manage descriptions of tables and buckets.
- Configuration Artifacts: Create, list, and inspect configurations for extractors, writers, data apps, and transformations.
- SQL Execution: Create SQL transformations from natural-language instructions.
- Job Orchestration: Run components and transformations, and retrieve job execution details.
- Workflow Management: Build and manage pipelines using Conditional Flows and Orchestrator Flows.
- Data Applications: Create, deploy, and manage Keboola Streamlit Data Apps that visualize storage data via custom queries.
- Metadata Manipulation: Search, read, and update project documentation and object metadata using natural-language queries.
- Development Sandboxing: Work safely in isolated development branches; all operations are scoped to the selected branch.
🚀 Remote Access Initialization (Fastest Path)
The quickest way to get started is the hosted MCP Server. It is a managed service, so there is nothing to install, configure, or set up locally.
What is the Hosted Server?
The hosted server is available in every multi-tenant Keboola instance and uses OAuth for authentication. Any AI assistant that supports remote Server-Sent Events (SSE) connections and OAuth can connect to it.
Connection Procedure
- Get the server URL: In your Keboola project, go to Project Settings → MCP Server.
- Copy the URL: The endpoint typically has the form https://mcp.<YOUR_REGION>.keboola.com/sse.
- Configure your agent: Paste the URL into your AI assistant's MCP settings.
- Authorize: You are redirected to sign in with your Keboola credentials and select the target project.
Compatible Agents
- Cursor: Integration via the "Install In Cursor" prompt in your project's MCP Server settings or dedicated deep link.
- Claude Desktop: Integrate through Settings → Integrations menu.
- Windsurf: Setup requires inputting the remote endpoint URL.
- Make: Integration configured using the remote server URL.
- Other MCP Interfaces: Configure using the provided remote endpoint address.
For granular setup instructions and region-specific URLs, consult the Remote Server Setup documentation.
Utilizing Dev Branches
You can work in a Keboola development branch without affecting production assets. The remote MCP server honors a branch ID (the KBC_BRANCH_ID parameter) and confines all actions to that branch. You can find the branch ID in the UI URL while you are in the branch (e.g., .../admin/projects/PROJECT_ID/branch/BRANCH_ID/dashboard). Send it in the X-Branch-Id: <branchId> header with every request; without the header, the server uses the production branch. Ideally, the connecting AI client manages this header.
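The branch scoping above comes down to one extra header per request. A minimal sketch, assuming your client lets you set request headers; the helper name branch_headers is hypothetical, and only the X-Branch-Id header name comes from this README.

```python
from typing import Optional


def branch_headers(branch_id: Optional[str] = None) -> dict[str, str]:
    """Return extra HTTP headers for a remote MCP request (hypothetical helper).

    With no branch ID, no header is sent and the server falls back to
    the production branch.
    """
    headers: dict[str, str] = {}
    if branch_id:
        # Scope this request to the given development branch.
        headers["X-Branch-Id"] = str(branch_id)
    return headers


print(branch_headers("123456"))  # {'X-Branch-Id': '123456'}
print(branch_headers())          # {}
```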
Local Environment Configuration (Custom or Development Instances)
Run the MCP server on your own machine for full control and faster development cycles. Choose this option when you need customization, local debugging, or rapid iteration. You clone the repository, provide Keboola credentials via environment variables or request headers (depending on the transport), install the dependencies, and start the server. This gives you the most flexibility (custom tools, local debugging, offline iteration) but requires you to handle setup, credentials, and updates yourself.
The server supports several transport protocols, selectable via the command line argument --transport <protocol>:
- stdio: The default when no transport is specified. Uses standard input/output streams and is best suited to a local server used by a single client.
- streamable-http: Facilitates remote communication over HTTP utilizing a bidirectional streaming channel, enabling continuous message exchange between client and server. Connect via <url>/mcp (e.g., http://localhost:8000/mcp).
- sse: Deprecated. Transition to streamable-http. Relies on Server-Sent Events (SSE) for unidirectional event streaming from server to client. Connect via <url>/sse (e.g., http://localhost:8000/sse).
- http-compat: A legacy transport supporting both SSE and streamable-http. Currently deployed on remote Keboola services but scheduled for replacement by streamable-http exclusively.
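To make the transport list concrete, here is a small sketch that maps each network transport to the URL a client connects to. The function endpoint_for and the TRANSPORT_PATHS table are hypothetical; the /mcp and /sse paths come from the list above. http-compat is left out because it serves both paths.

```python
from typing import Optional

# Path each network transport is served on, per the transport list.
# stdio has no URL: the client talks to the process over stdin/stdout.
TRANSPORT_PATHS = {
    "streamable-http": "/mcp",
    "sse": "/sse",  # deprecated; prefer streamable-http
}


def endpoint_for(base_url: str, transport: str) -> Optional[str]:
    """Return the URL a client should connect to, or None for stdio."""
    if transport == "stdio":
        return None
    try:
        return base_url.rstrip("/") + TRANSPORT_PATHS[transport]
    except KeyError:
        raise ValueError(f"unsupported transport: {transport!r}") from None


print(endpoint_for("http://localhost:8000", "streamable-http"))  # http://localhost:8000/mcp
print(endpoint_for("http://localhost:8000/", "sse"))             # http://localhost:8000/sse
```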
To access your project, the server needs Keboola credentials for your Keboola region: KBC_STORAGE_TOKEN and KBC_STORAGE_API_URL, plus KBC_WORKSPACE_SCHEMA (required when you use a custom token rather than the master token) and, optionally, KBC_BRANCH_ID.
Credential Provisioning Methods:
- Personal Use (Primarily stdio): Set environment variables prior to launching the server. All subsequent requests inherit these static credentials.
- Multi-User Context: Pass the variables in the request headers, so each request carries its own credentials.
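A minimal sketch of the two provisioning methods: per-request headers (multi-user) take precedence over environment variables (personal use). It assumes the headers reuse the environment variable names, which this README only confirms for KBC_BRANCH_ID; resolve_credentials is a hypothetical helper, not part of the server.

```python
import os
from typing import Mapping, Optional

# Always needed to reach the project.
REQUIRED = ("KBC_STORAGE_TOKEN", "KBC_STORAGE_API_URL")
# KBC_WORKSPACE_SCHEMA is required only with a custom (non-master) token,
# which this sketch cannot detect, so it is treated as optional here.
OPTIONAL = ("KBC_WORKSPACE_SCHEMA", "KBC_BRANCH_ID")


def resolve_credentials(
    headers: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Merge per-request headers over environment variables (hypothetical helper).

    Assumption: headers reuse the environment variable names. A plain dict
    lookup is used here, although real HTTP header names are case-insensitive.
    """
    env = os.environ if env is None else env
    headers = headers or {}
    resolved: dict[str, str] = {}
    for name in REQUIRED + OPTIONAL:
        value = headers.get(name) or env.get(name)
        if value:
            resolved[name] = value
    missing = [name for name in REQUIRED if name not in resolved]
    if missing:
        raise ValueError("missing required credentials: " + ", ".join(missing))
    return resolved


env = {"KBC_STORAGE_TOKEN": "env-token", "KBC_STORAGE_API_URL": "https://connection.keboola.com"}
creds = resolve_credentials(headers={"KBC_STORAGE_TOKEN": "user-token"}, env=env)
print(creds["KBC_STORAGE_TOKEN"])  # user-token
```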
KBC_STORAGE_TOKEN
This token authenticates your access to Keboola services.
Refer to the official Keboola documentation for generating and managing Storage API tokens.
Guidance: For restricted operational scope, employ a custom storage token; for comprehensive project access, utilize the master token.
KBC_WORKSPACE_SCHEMA
This identifies the workspace used for SQL queries. It is required ONLY when you use a custom storage token rather than the master token:
- Master Token Usage: The workspace is automatically provisioned in the background.
- Custom Token Usage: Follow this Keboola guide to obtain the required KBC_WORKSPACE_SCHEMA.
Important: If creating the workspace manually, ensure the "Grant read-only access to all Project data" option is selected.
Note: In BigQuery workspaces, KBC_WORKSPACE_SCHEMA is called the Dataset Name; click Connect on the workspace and copy the Dataset Name.
KBC_STORAGE_API_URL (Region Specification)
Your Keboola deployment region dictates the API endpoint URL. Determine your region by observing the URL in your browser when accessing your Keboola project interface:
| Region | API Endpoint |
|---|---|
| AWS North America | https://connection.keboola.com |
| AWS Europe | https://connection.eu-central-1.keboola.com |
| Google Cloud EU | https://connection.europe-west3.gcp.keboola.com |
| Google Cloud US | https://connection.us-east4.gcp.keboola.com |
| Azure EU | https://connection.north-europe.azure.keboola.com |
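If you have the project open in a browser, its host usually tells you the API URL. This sketch assumes the UI and API share a host, as the table suggests; api_url_from_browser is a hypothetical helper that only accepts the endpoints listed in the table.

```python
from urllib.parse import urlsplit

# API endpoints from the region table.
REGION_ENDPOINTS = {
    "AWS North America": "https://connection.keboola.com",
    "AWS Europe": "https://connection.eu-central-1.keboola.com",
    "Google Cloud EU": "https://connection.europe-west3.gcp.keboola.com",
    "Google Cloud US": "https://connection.us-east4.gcp.keboola.com",
    "Azure EU": "https://connection.north-europe.azure.keboola.com",
}


def api_url_from_browser(project_url: str) -> str:
    """Derive KBC_STORAGE_API_URL from the project URL shown in the browser."""
    parts = urlsplit(project_url)
    # Assumption: the UI is served from the same host as the Storage API.
    candidate = f"https://{parts.hostname}"
    if candidate not in REGION_ENDPOINTS.values():
        raise ValueError(f"unknown Keboola region host: {parts.hostname}")
    return candidate


print(api_url_from_browser(
    "https://connection.eu-central-1.keboola.com/admin/projects/123/dashboard"
))  # https://connection.eu-central-1.keboola.com
```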
KBC_BRANCH_ID (Optional Scoping Parameter)
To target a specific Keboola development branch, set the ID via the KBC_BRANCH_ID parameter. The MCP server isolates its operations to this branch, guaranteeing changes do not propagate to the production environment.
- Default behavior: Production branch is used if this parameter is omitted.
- Development Scope: Set KBC_BRANCH_ID to the branch's numeric identifier (e.g., 123456). The ID is visible in the UI URL when you navigate to the branch (e.g., .../admin/projects/PROJECT_ID/branch/BRANCH_ID/dashboard).
- Remote Transport Override: To set the branch per request, send the HTTP header X-Branch-Id: <branchId> or KBC_BRANCH_ID: <branchId>.
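Extracting the branch ID from a UI URL is easy to script. A sketch assuming the URL layout shown above, with numeric project and branch IDs; branch_id_from_url is hypothetical.

```python
import re
from typing import Optional

# Matches ".../admin/projects/<PROJECT_ID>/branch/<BRANCH_ID>/..." in a UI URL.
_BRANCH_RE = re.compile(r"/admin/projects/(\d+)/branch/(\d+)(?:/|$)")


def branch_id_from_url(ui_url: str) -> Optional[str]:
    """Return the branch ID, or None when the URL points at the production branch."""
    match = _BRANCH_RE.search(ui_url)
    return match.group(2) if match else None


print(branch_id_from_url(
    "https://connection.keboola.com/admin/projects/42/branch/123456/dashboard"
))  # 123456
```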
Software Acquisition
Prerequisites verification:
- [ ] Python version 3.10 or newer installed
- [ ] Authorized access to a Keboola project with administrative permissions
- [ ] Installation of your chosen MCP client (e.g., Claude, Cursor)
Crucial: Ensure the uv package manager is present, as the MCP client will leverage it for automated server download and execution.
Installing uv:
macOS/Linux:
```bash
# If Homebrew is absent, install it first:
# /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install uv via Homebrew
brew install uv
```
Windows:
```powershell
# Using the official installer script
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Alternatively, via pip
pip install uv
# Or using winget
winget install --id=astral-sh.uv -e
```
Further installation methodologies are detailed in the official uv documentation.
Executing the Keboola MCP Server
There are four ways to run the server, depending on your needs:
Option A: Embedded Execution (Recommended Approach)
In this mode, the MCP server lifecycle is managed automatically by Claude or Cursor. No manual terminal command execution is necessary.
- Configure the appropriate settings within your MCP client application.
- The client transparently initiates the MCP server process upon requirement.
Claude Desktop Integration Parameters
- Open Claude (top-left menu) → Settings → Developer → Edit Config (create claude_desktop_config.json if it does not exist).
- Add the following JSON block, then restart Claude Desktop to apply the changes:
```json
{
  "mcpServers": {
    "keboola": {
      "command": "uvx",
      "args": ["keboola_mcp_server", "--transport", "stdio"],
      "env": {
        "KBC_STORAGE_API_URL": "https://connection.YOUR_REGION.keboola.com",
        "KBC_STORAGE_TOKEN": "your_keboola_storage_token",
        "KBC_WORKSPACE_SCHEMA": "your_workspace_schema",
        "KBC_BRANCH_ID": "your_branch_id_optional"
      }
    }
  }
}
```
Configuration File Locations:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
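If you prefer to script the Claude Desktop setup, the sketch below merges a keboola entry into an existing config file without touching other servers. It assumes the file locations listed above; the helper names are hypothetical, and each command-line argument is passed as its own list element.

```python
import json
import os
import platform
from pathlib import Path


def claude_config_path() -> Path:
    """Return the documented config location (macOS and Windows only)."""
    system = platform.system()
    if system == "Darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if system == "Windows":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise OSError(f"no documented Claude Desktop config location for {system}")


def add_keboola_server(config: dict, env: dict[str, str]) -> dict:
    """Insert or replace the "keboola" entry, keeping any other MCP servers."""
    servers = config.setdefault("mcpServers", {})
    servers["keboola"] = {
        "command": "uvx",
        # One list element per command-line argument.
        "args": ["keboola_mcp_server", "--transport", "stdio"],
        "env": dict(env),
    }
    return config


def update_config_file(path: Path, env: dict[str, str]) -> None:
    """Read the existing config (if any), add the server, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    add_keboola_server(config, env)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

For example, call update_config_file(claude_config_path(), {...}) with your KBC_* values, then restart Claude Desktop.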
Cursor Integration Parameters
- Navigate to Settings → MCP.
- Select the option to "+ Add new global MCP Server".
- Apply the following configuration:
```json
{
  "mcpServers": {
    "keboola": {
      "command": "uvx",
      "args": ["keboola_mcp_server", "--transport", "stdio"],
      "env": {
        "KBC_STORAGE_API_URL": "https://connection.YOUR_REGION.keboola.com",
        "KBC_STORAGE_TOKEN": "your_keboola_storage_token",
        "KBC_WORKSPACE_SCHEMA": "your_workspace_schema",
        "KBC_BRANCH_ID": "your_branch_id_optional"
      }
    }
  }
}
```
Naming Convention: MCP server identifiers should be concise. Due to a combined length constraint (tool name + server name) typically around 60 characters, overly verbose names may be truncated or omitted by the Agent interface.
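To check names against that limit before connecting, here is a quick sketch. It assumes the limit applies to the plain sum of the two lengths and defaults to 60 characters; clients may count differently, so treat the result as a rough guide. tools_over_limit is hypothetical.

```python
def tools_over_limit(server_name: str, tool_names: list[str], limit: int = 60) -> list[str]:
    """Return the tools whose server name + tool name length exceeds the limit."""
    # Assumption: the client adds the two lengths; some clients also add a
    # separator or prefix, so leave a little headroom.
    return [tool for tool in tool_names if len(server_name) + len(tool) > limit]


tools = ["get_project_info", "create_sql_transformation", "update_sql_transformation"]
print(tools_over_limit("keboola", tools))  # []
```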
Cursor Configuration for Windows WSL Environments
If you run the MCP server inside Windows Subsystem for Linux (WSL) while using Cursor on Windows, use this wrapper:
```json
{
  "mcpServers": {
    "keboola": {
      "command": "wsl.exe",
      "args": [
        "bash",
        "-c",
        "export KBC_STORAGE_API_URL=https://connection.YOUR_REGION.keboola.com && export KBC_STORAGE_TOKEN=your_keboola_storage_token && export KBC_WORKSPACE_SCHEMA=your_workspace_schema && export KBC_BRANCH_ID=your_branch_id_optional && /snap/bin/uvx keboola_mcp_server --transport stdio"
      ]
    }
  }
}
```
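Hand-writing the bash -c string is error-prone, since all exports must form a single argument. This sketch generates the entry instead, quoting each value with shlex.quote; wsl_server_entry is a hypothetical helper, and /snap/bin/uvx is the path from the WSL example, which may differ on your system.

```python
import json
import shlex


def wsl_server_entry(env: dict[str, str], uvx_path: str = "/snap/bin/uvx") -> dict:
    """Build a Cursor "keboola" entry that starts the server inside WSL."""
    # Quote each value so tokens with spaces or shell metacharacters survive bash -c.
    exports = [f"export {name}={shlex.quote(value)}" for name, value in env.items()]
    launch = f"{shlex.quote(uvx_path)} keboola_mcp_server --transport stdio"
    # bash -c expects the whole script as ONE argument.
    return {"command": "wsl.exe", "args": ["bash", "-c", " && ".join(exports + [launch])]}


entry = wsl_server_entry({
    "KBC_STORAGE_API_URL": "https://connection.YOUR_REGION.keboola.com",
    "KBC_STORAGE_TOKEN": "your_keboola_storage_token",
})
print(json.dumps({"mcpServers": {"keboola": entry}}, indent=2))
```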
Option B: Local Source Code Execution Mode
Intended for developers actively modifying the MCP server source code:
- Clone the repository and establish the local Python environment.
- Direct Claude/Cursor to utilize your local Python interpreter path:
```json
{
  "mcpServers": {
    "keboola": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": [
        "-m",
        "keboola_mcp_server",
        "--transport",
        "stdio"
      ],
      "env": {
        "KBC_STORAGE_API_URL": "https://connection.YOUR_REGION.keboola.com",
        "KBC_STORAGE_TOKEN": "your_keboola_storage_token",
        "KBC_WORKSPACE_SCHEMA": "your_workspace_schema",
        "KBC_BRANCH_ID": "your_branch_id_optional"
      }
    }
  }
}
```
Option C: Manual Command Line Interface (Testing Purposes Only)
For expedient testing or debugging validation, execution can occur directly within a terminal session:
```bash
# Set the environment variables
export KBC_STORAGE_API_URL=https://connection.YOUR_REGION.keboola.com
export KBC_STORAGE_TOKEN=your_keboola_storage_token
export KBC_WORKSPACE_SCHEMA=your_workspace_schema
export KBC_BRANCH_ID=your_branch_id_optional
uvx keboola_mcp_server --transport sse
```
Caveat: This command starts the server with the SSE transport, listening for SSE connections on localhost:8000. Use the --port and --host options to change the bind address.
Note: Manual execution is for testing and debugging only. For normal use with Claude or Cursor, use the configuration methods above.
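Once the server is running, a quick way to confirm that something is listening on localhost:8000 is a plain TCP connection attempt. This sketch only checks that the port accepts connections, not that it speaks MCP; is_listening is a hypothetical helper.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("Server reachable on localhost:8000:", is_listening())
```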
Option D: Containerized Deployment via Docker
```bash
docker pull keboola/mcp-server:latest
docker run \
  --name keboola_mcp_server \
  --rm \
  -it \
  -p 127.0.0.1:8000:8000 \
  -e KBC_STORAGE_API_URL="https://connection.YOUR_REGION.keboola.com" \
  -e KBC_STORAGE_TOKEN="YOUR_KEBOOLA_STORAGE_TOKEN" \
  -e KBC_WORKSPACE_SCHEMA="YOUR_WORKSPACE_SCHEMA" \
  -e KBC_BRANCH_ID="YOUR_BRANCH_ID_OPTIONAL" \
  keboola/mcp-server:latest \
  --transport sse \
  --host 0.0.0.0
```
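Note: Inside the container, the server listens on port 8000 using SSE (--host 0.0.0.0 makes it reachable through the port mapping). The -p 127.0.0.1:8000:8000 mapping exposes it at localhost:8000 on the host; change the first port number to use a different host port.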
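If you launch the container from a script, building the argument list programmatically avoids shell-quoting mistakes. A sketch that mirrors the docker run command in Option D; docker_run_argv is hypothetical, and you could pass its result to subprocess.run(argv, check=True) (drop -it when no terminal is attached).

```python
import shlex


def docker_run_argv(
    env: dict[str, str],
    host_port: int = 8000,
    image: str = "keboola/mcp-server:latest",
) -> list[str]:
    """Build the docker run argument list for the Keboola MCP server."""
    argv = ["docker", "run", "--name", "keboola_mcp_server", "--rm", "-it",
            # Publish container port 8000 on the host loopback interface only.
            "-p", f"127.0.0.1:{host_port}:8000"]
    for name, value in env.items():
        if value:  # skip unset optional values such as KBC_BRANCH_ID
            argv += ["-e", f"{name}={value}"]
    argv += [image, "--transport", "sse", "--host", "0.0.0.0"]
    return argv


print(shlex.join(docker_run_argv({
    "KBC_STORAGE_API_URL": "https://connection.YOUR_REGION.keboola.com",
    "KBC_STORAGE_TOKEN": "YOUR_KEBOOLA_STORAGE_TOKEN",
    "KBC_BRANCH_ID": "",
})))
```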
Manual Server Initiation Required?
| Usage Scenario | Manual Start Necessary? | Recommended Configuration Path |
|---|---|---|
| Using Claude/Cursor | No | Configure within the application settings |
| Local MCP Development | No (Client manages start) | Point configuration to local Python executable |
| Ad-hoc CLI Testing | Yes | Execute commands in a terminal session |
| Docker Container Usage | Yes | Run the specified Docker container |
Interacting with the MCP Server
Once your chosen MCP client (e.g., Claude/Cursor) is correctly configured and actively running, initiate data requests against your Keboola assets:
Validation Check
Begin with a fundamental query to confirm end-to-end connectivity:
Provide a catalog of all buckets and tables present within my Keboola workspace.
Illustrative Use Cases
Data Discovery:
- "Identify all stored datasets pertaining to customer records."
- "Execute a selection query to rank the top ten revenue generators."
Analytical Tasks:
- "Perform an analysis on quarterly sales metrics broken down by geographical segment."
- "Determine the statistical correlation between client age cohort and average transaction value."
Data Workflow Automation:
- "Generate a SQL transformation script that merges customer profiles with transaction logs."
- "Trigger the data loading job associated with my external Salesforce extractor component."
System Compatibility
Agent Platform Support Matrix
| MCP Agent | Operational Status | Communication Protocol |
|---|---|---|
| Claude (Desktop & Web) | ✅ Confirmed | stdio |
| Cursor | ✅ Confirmed | stdio |
| Windsurf, Zed, Replit | ✅ Confirmed | stdio |
| Codeium, Sourcegraph | ✅ Confirmed | HTTP+SSE |
| Custom MCP Implementations | ✅ Confirmed | HTTP+SSE or stdio |
Exposed Operational Tools
Agents automatically adapt to the available toolset.
| Domain | Tool Name | Function Description |
|---|---|---|
| Project Core | get_project_info | Outputs structural metadata about the Keboola project |
| Storage Layer | get_bucket | Fetches details for a given storage bucket |
| | get_table | Retrieves table details, including database mapping and schema definition |
| | list_buckets | Lists all storage buckets in the project |
| | list_tables | Lists all tables in a given bucket |
| | update_description | Updates descriptions of buckets, tables, or individual columns |
| SQL Engine | query_data | Executes SQL SELECT statements against the underlying data warehouse |
| Configuration Mgmt | add_config_row | Adds a new row to an existing component configuration |
| | create_config | Creates a top-level configuration for a component |
| | create_sql_transformation | Creates a new SQL transformation from the given SQL code blocks |
| | find_component_id | Finds component IDs matching a text query |
| | get_component | Fetches component details by ID |
| | get_config | Retrieves the full configuration of a component or transformation |
| | get_config_examples | Retrieves sample configurations for a component |
| | list_configs | Lists all configurations, with optional filtering |
| | list_transformations | Lists all transformation configurations in the project |
| | update_config | Updates the root of a component configuration |
| | update_config_row | Updates a specific configuration row |
| | update_sql_transformation | Updates an existing SQL transformation |
| Flow Orchestration | create_conditional_flow | Creates a flow using the keboola.flow definition |
| | create_flow | Creates a flow using the legacy keboola.orchestrator definition |
| | get_flow | Retrieves the configuration of a specific flow |
| | get_flow_examples | Retrieves sample definitions of valid flows |
| | get_flow_schema | Returns the JSON schema for the requested flow type |
| | list_flows | Lists all flows in the project |
| | update_flow | Updates an existing flow |
| Job Execution | get_job | Fetches detailed status information for a job ID |
| | list_jobs | Lists recent jobs, with filtering, sorting, and pagination |
| | run_job | Starts an asynchronous job for a component or transformation |
| Data Applications | get_data_apps | Retrieves details of a specific Data App or lists all Data Apps in the project |
| | modify_data_app | Creates a new Data App or updates an existing one |
| | deploy_data_app | Manages the deployment state (active/suspended) of Streamlit Data Apps |
| Documentation Access | docs_query | Answers questions using only the Keboola platform documentation |
| Utility | create_oauth_url | Generates an OAuth authorization URL for component setup |
| | search | Searches project objects by name substring |
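Clients normally call these tools for you, but it can help to see the wire format. MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method with the tool name and its arguments. In this sketch, the argument name sql_query for query_data is hypothetical; send a tools/list request to the server to get each tool's actual input schema.

```python
import itertools
import json

# JSON-RPC request IDs must be unique per session.
_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# "sql_query" is a hypothetical argument name; check the tool's input schema.
print(tool_call("query_data", {"sql_query": 'SELECT COUNT(*) FROM "customers"'}))
```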
Troubleshooting Guide
Frequent Error Resolution
| Symptom | Remediation Strategy |
|---|---|
| Authorization Failure | Validate the integrity and permissions associated with KBC_STORAGE_TOKEN |
| Workspace Reference Error | Confirm the accuracy of the KBC_WORKSPACE_SCHEMA setting |
| Connection Interruption | Inspect local network connectivity and firewall rules |
Development Environment Setup
Initial Dependency Installation
Standard setup:
```bash
uv sync --extra dev
```
With this setup, run uv run tox to execute the automated tests and code-style checks.
Full setup (recommended for ongoing development):
```bash
uv sync --extra dev --extra tests --extra integtests --extra codestyle
```
This installs the packages needed for testing and code-style checks, so IDEs such as VS Code or Cursor can lint the code and run tests locally.
Integration Testing Execution
To run the integration tests locally, use uv run tox -e integtests. NOTE: This requires the following environment variables:
- INTEGTEST_STORAGE_API_URL
- INTEGTEST_STORAGE_TOKEN
- INTEGTEST_WORKSPACE_SCHEMA
These required values must be sourced from a dedicated Keboola project reserved exclusively for integration testing purposes.
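A small pre-flight check can catch missing variables before tox starts. The script is a hypothetical helper, not part of the repository; it only checks that the variables are set and non-empty.

```python
import os
from typing import Mapping, Optional

INTEGTEST_VARS = (
    "INTEGTEST_STORAGE_API_URL",
    "INTEGTEST_STORAGE_TOKEN",
    "INTEGTEST_WORKSPACE_SCHEMA",
)


def missing_integtest_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the integration-test variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in INTEGTEST_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_integtest_vars()
    print("missing: " + ", ".join(missing) if missing else "ready for: uv run tox -e integtests")
```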
Lock File Management
When dependencies are added or removed, the uv.lock manifest must be regenerated. For release preparation, consider updating existing package versions using uv lock --upgrade.
Documentation Synchronization
If modifications are made to any tool descriptions (i.e., docstrings within the tool functions), the TOOLS.md artifact must be regenerated to mirror these functional updates:
```bash
uv run python -m src.keboola_mcp_server.generate_tool_docs
```
Support Channels and Feedback Submission
For reporting defects, proposing enhancements, or seeking assistance, the designated primary pathway is by submitting a new issue on GitHub.
The maintainers actively monitor the issue tracker and aim to respond promptly. For general questions about the Keboola platform, see the resources below.
Reference Materials
- End-User Documentation Portal
- API & Developer Documentation
- Keboola Platform Homepage
- Issue Submission Portal ← Preferred Channel for MCP Server Communication