U-M-C-P-Interface
Interface for controlling the Unstructured platform's data orchestration layer, encompassing source ingestion, destination delivery, pipeline configuration, and job execution management. Facilitates enumeration, construction, modification, and removal of data connectors and workflow definitions for streamlined data handling.
Author

liuchongchong1995
Unstructured Management Control Plane Interface
This is an implementation of the MCP server designed specifically for interacting with the Unstructured API ecosystem. It furnishes utilities for querying data sources and workflow definitions.
Exposed Functionalities
| Function Name | Purpose |
|---|---|
| `enumerate_sources` | Retrieves a manifest of accessible data ingestion endpoints. |
| `fetch_source_details` | Obtains granular metadata for a nominated source connector. |
| `provision_source_connector` | Establishes a new data source connector instance. |
| `modify_source_connector` | Applies parameter adjustments to an extant source connector. |
| `retire_source_connector` | Decommissions a source connector, identified by its unique ID. |
| `enumerate_destinations` | Lists all configured data delivery targets. |
| `fetch_destination_details` | Fetches comprehensive details regarding a specific destination. |
| `provision_destination` | Initializes a new data output connector based on provided inputs. |
| `modify_destination_config` | Updates the settings of an existing destination connector. |
| `retire_destination_alias` | Removes a destination connector using its identifier. |
| `enumerate_workflows` | Fetches the catalog of defined processing sequences. |
| `fetch_workflow_blueprint` | Retrieves the complete schema for a designated workflow. |
| `create_workflow_pipeline` | Defines a novel workflow incorporating source, destination, etc. |
| `execute_workflow` | Triggers the processing sequence for a specified workflow ID. |
| `update_workflow_spec` | Modifies the structure of an existing workflow definition. |
| `deprecate_workflow` | Deactivates a workflow instance based on its identifier. |
| `enumerate_execution_logs` | Lists operational records for jobs linked to a workflow. |
| `fetch_job_execution_log` | Retrieves detailed telemetry for an individual job instance. |
| `terminate_job` | Stops the execution of a specified job instance. |
Refer to the following tables for the current set of supported data conduits within the U-M-C-P-Interface. Comprehensive documentation for all available ingestion conduits can be found here, and for delivery targets, consult here. Expansion of this list is forthcoming.
| Input Conduit | Output Conduit |
|---|---|
| S3 | S3 |
| Azure | Weaviate |
| Google Drive | Pinecone |
| OneDrive | AstraDB |
| Salesforce | MongoDB |
| Sharepoint | Neo4j |
| | Databricks Volumes |
| | Databricks Volumes Delta Table |
To utilize functions involved in the lifecycle management (creation/modification/deletion) of a data conduit, the requisite access credentials must be declared within your environment variables file (.env). The following table outlines the necessary environmental keys supported by this interface:
| Variable Name | Prerequisite Description |
|---|---|
| `ANTHROPIC_API_KEY` | Mandatory for invoking the minimal_client to communicate with our backend server. |
| `AWS_KEY`, `AWS_SECRET` | Required for configuring the S3 ingestion conduit via the U-M-C-P-Interface server. Refer to operational guides here and here for implementation details. |
| `WEAVIATE_CLOUD_API_KEY` | Essential for instantiating the Weaviate vector database output module. Detailed setup instructions are available here. |
| `FIRECRAWL_API_KEY` | Necessary for leveraging Firecrawl functionalities located in `external/firecrawl.py`. Obtain credentials by registering at Firecrawl. |
| `ASTRA_DB_APPLICATION_TOKEN`, `ASTRA_DB_API_ENDPOINT` | Required for creating the AstraDB output module via the U-M-C-P-Interface server. See configuration steps documented here. |
| `AZURE_CONNECTION_STRING` | Primary authentication method (Option 1) for provisioning the Azure blob storage input conduit via the U-M-C-P-Interface server. Consult documentation for guidance. |
| `AZURE_ACCOUNT_NAME` + `AZURE_ACCOUNT_KEY` | Alternative authentication method (Option 2) for provisioning the Azure input conduit. Refer to instructions in the documentation. |
| `AZURE_ACCOUNT_NAME` + `AZURE_SAS_TOKEN` | Tertiary authentication method (Option 3) for provisioning the Azure input conduit. See the specified documentation for details. |
| `NEO4J_PASSWORD` | Required to establish the Neo4j output conduit via the U-M-C-P-Interface server. Configuration steps are provided here. |
| `MONGO_DB_CONNECTION_STRING` | Necessary for instantiating the MongoDB output connector via the U-M-C-P-Interface server. Setup guide available here. |
| `GOOGLEDRIVE_SERVICE_ACCOUNT_KEY` | A base64-encoded string. The original configuration key (JSON file) obtained from the documentation must be converted using the command `base64 < /path/to/google_service_account_key.json` in your terminal. |
| `DATABRICKS_CLIENT_ID`, `DATABRICKS_CLIENT_SECRET` | Required to configure Databricks volume or delta table output modules through the U-M-C-P-Interface server. See specifications here and here. |
| `ONEDRIVE_CLIENT_ID`, `ONEDRIVE_CLIENT_CRED`, `ONEDRIVE_TENANT_ID` | Necessary for provisioning the OneDrive synchronization module via the U-M-C-P-Interface server. Instructions are documented here. |
| `PINECONE_API_KEY` | Required to provision the Pinecone vector database output module via the U-M-C-P-Interface server. Configuration details are present here. |
| `SALESFORCE_CONSUMER_KEY`, `SALESFORCE_PRIVATE_KEY` | Required for establishing the Salesforce input conduit using the U-M-C-P-Interface server. Consult the guide https://docs.unstructured.io/ingestion/source-connectors/salesforce. |
| `SHAREPOINT_CLIENT_ID`, `SHAREPOINT_CLIENT_CRED`, `SHAREPOINT_TENANT_ID` | Necessary for provisioning the SharePoint input synchronization module via the U-M-C-P-Interface server. Refer to the configuration instructions here. |
| `LOG_LEVEL` | Controls the verbosity of logging output from our minimal_client; set to 'ERROR' to minimize extraneous console messages. |
| `CONFIRM_TOOL_USE` | Setting this to 'true' mandates that the minimal_client seeks explicit confirmation prior to executing any tool invocation. |
| `DEBUG_API_REQUESTS` | Setting this to 'true' instructs `uns_mcp/server.py` to emit the parameters of outbound requests for enhanced diagnostic tracing. |
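Before calling a create or update tool, it can help to check which connector credentials are actually present. The following is a minimal sketch, not part of the server: the variable names come from the table above, while the `REQUIRED_ENV` mapping, the connector labels, and the `configured_connectors` helper are hypothetical names introduced for illustration. Only a subset of connectors is listed.

```python
# Required variables per connector, taken from the table above.
# Azure lists three alternative credential sets; any one complete set is enough.
REQUIRED_ENV = {
    "s3": [("AWS_KEY", "AWS_SECRET")],
    "weaviate": [("WEAVIATE_CLOUD_API_KEY",)],
    "astradb": [("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT")],
    "azure": [
        ("AZURE_CONNECTION_STRING",),
        ("AZURE_ACCOUNT_NAME", "AZURE_ACCOUNT_KEY"),
        ("AZURE_ACCOUNT_NAME", "AZURE_SAS_TOKEN"),
    ],
    "neo4j": [("NEO4J_PASSWORD",)],
    "mongodb": [("MONGO_DB_CONNECTION_STRING",)],
    "pinecone": [("PINECONE_API_KEY",)],
}


def configured_connectors(env):
    """Return the sorted connector names whose credentials are fully set in env."""
    ready = []
    for name, alternatives in REQUIRED_ENV.items():
        # A connector is ready if every variable in at least one group is non-empty.
        if any(all(env.get(var) for var in group) for group in alternatives):
            ready.append(name)
    return sorted(ready)
```

Calling `configured_connectors(os.environ)` after loading your `.env` file lists the connectors that should be safe to provision.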
Firecrawl Integration
Firecrawl functions as an external web traversal API offering two primary capabilities within our MCP framework:
- HTML Content Acquisition: Employing `invoke_firecrawl_crawlhtml` to initiate traversal tasks and `check_crawlhtml_status` to monitor their progression.
- LLM-Optimized Text Generation: Utilizing `invoke_firecrawl_llmtxt` to generate model-ready text and `check_llmtxt_status` to retrieve the resulting data.
Traversal Sequence Outline (Web Crawling):
- Initiation begins with a specified starting URL, followed by link identification.
- Prioritizes sitemap discovery; otherwise, it follows internal links found on the site.
- Systematically explores linked pages to map the entire site structure.
- Accumulates content from every visited page, managing dynamic content rendering and server request throttling.
- Traversal jobs can be halted mid-process using cancel_crawlhtml_job.
- Recommended for scenarios demanding raw HTML output, which Unstructured's subsequent processing stages refine effectively :smile:.
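The traversal steps above amount to a breadth-first walk over a site's internal links. The sketch below illustrates that idea only; it is not Firecrawl's implementation, and it skips sitemap discovery, rendering, and throttling. The `links_by_page` mapping is a hypothetical in-memory stand-in for fetching a page and extracting its links.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_order(start_url, links_by_page):
    """Breadth-first traversal that stays on the starting site's domain.

    links_by_page maps a URL to the hrefs found on that page (an assumed,
    in-memory replacement for real HTTP fetching and HTML parsing).
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for href in links_by_page.get(url, []):
            target = urljoin(url, href)  # resolve relative links against the page
            if urlparse(target).netloc == domain and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

External links are dropped by the domain check, and the `seen` set keeps pages that link to each other from being visited twice.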
Text Optimization Sequence (LLM Text Generation):
- Post-traversal, the system extracts semantically rich, clean text.
- Formats this text into structures highly amenable to large language models.
- Output is automatically persisted to the designated S3 location.
- Crucial Note: Text generation operations are irreversible once started; cancel_llmtxt_job is present for API conformity but lacks backend support from Firecrawl currently.
Requirement: The environment variable FIRECRAWL_API_KEY must be established to access these functions.
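Both Firecrawl capabilities follow an invoke-then-check pattern, which a client typically wraps in a polling loop. The following is a rough sketch under stated assumptions: `check_status` is a caller-supplied callable standing in for a status tool such as `check_crawlhtml_status`, and the `"status"` key and its values are assumed for illustration rather than taken from the actual tool responses.

```python
import time


def wait_for_job(check_status, job_id, interval=5.0, max_checks=60, sleep=time.sleep):
    """Poll check_status(job_id) until the job reaches a terminal state.

    check_status is a hypothetical callable returning a dict with a "status"
    key; the terminal status strings below are assumptions.
    """
    for _ in range(max_checks):
        result = check_status(job_id)
        if result.get("status") in ("completed", "failed", "cancelled"):
            return result
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {job_id} still running after {max_checks} checks")
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; the bounded `max_checks` avoids polling forever on a job that cannot be cancelled.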
Setup and Configuration Guide
This section details the necessary procedures for initializing and configuring the U-M-C-P-Interface server, assuming a base environment of Python 3.12 and the uv package manager.
Prerequisites
- Python version 3.12 or newer.
- The `uv` environment management tool.
- An active API key from the Unstructured platform, obtainable by registering at https://platform.unstructured.io/app/account/api-keys.
Method 1: Utilizing uv (Preferred)
No manual installation steps are typically needed when using uvx as it manages execution context. If direct package installation is preferred:
```bash
uv pip install uns_mcp
```
Configuring for Claude Desktop
Inject the following JSON structure into your claude_desktop_config.json file:
File Location Note: This file is commonly found in the ~/Library/Application Support/Claude/ directory.
Using the uvx Executor:
```json
{
  "mcpServers": {
    "U-M-C-P-Interface": {
      "command": "uvx",
      "args": ["uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": "YOUR_KEY"
      }
    }
  }
}
```
Alternatively, Using the Python Package Invocation:
```json
{
  "mcpServers": {
    "U-M-C-P-Interface": {
      "command": "python",
      "args": ["-m", "uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": "YOUR_KEY"
      }
    }
  }
}
```
Method 2: Source Code Checkout
1. Obtain a copy of the source code repository.
2. Install the required dependencies:
```bash
uv sync
```
3. Define your Unstructured API key via an environment variable. Create a `.env` file in the project root containing:
```bash
UNSTRUCTURED_API_KEY="YOUR_KEY"
```
Consult `.env.template` for a complete list of adjustable environmental parameters.
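To make the `.env` format concrete, here is a minimal sketch of how `KEY=VALUE` lines like the one above can be read into a dictionary using only the standard library. This is an illustration, not how the server loads its configuration; the `parse_env_text` helper is a hypothetical name, and it handles only simple single-line entries.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments, stripping quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

A quick check such as `"UNSTRUCTURED_API_KEY" in parse_env_text(open(".env").read())` confirms the key is defined before the server is started.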
You can initiate the server using one of these deployment strategies:
Via Editable Package Installation
Install the package in editable mode:
```bash
uvx pip install -e .
```
Update your Claude Desktop configuration:
```json
{
  "mcpServers": {
    "U-M-C-P-Interface": {
      "command": "uvx",
      "args": ["uns_mcp"]
    }
  }
}
```
**Important:** Ensure that the configuration points to the correct `uvx` executable in the environment where the package resides.

Using the Server-Sent Events (SSE) Protocol
**Note: This protocol is incompatible with the Claude Desktop environment.**
For easier debugging, the client and server components can be run independently using the SSE protocol:
1. Launch the server process in one terminal instance:
```bash
uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080
# Alternatively: make sse-server
```
2. Test connectivity using a local client script in a separate terminal:
```bash
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"
# Alternatively: make sse-client
```
**Shutdown Sequence:** Terminate the client process first using `Ctrl+C`, followed by the server process.

Using the Standard Input/Output (Stdio) Protocol
Configure the Claude Desktop environment to use the Stdio mechanism:
```json
{
  "mcpServers": {
    "U-M-C-P-Interface": {
      "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
      "args": [
        "--directory",
        "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
        "run",
        "server.py"
      ]
    }
  }
}
```
Alternatively, execute the local client directly:
```bash
uv run python minimal_client/client.py uns_mcp/server.py
```

Supplemental Local Client Environmental Configuration
Environment variables can customize the minimal client's behavior:
- LOG_LEVEL="ERROR": Suppresses informational logging from the LLM, showing only critical user messages.
- CONFIRM_TOOL_USE='false': Disables the pre-execution confirmation prompt for tool usage. Exercise extreme caution with this setting, especially during development, as it permits the LLM to execute potentially costly operations or irreversible data modifications.
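As a rough illustration of how a client might interpret these two variables, the sketch below maps them to a logging level and a confirmation flag. It is not the minimal client's actual code: the `client_settings` helper and the defaults (INFO logging, confirmation enabled) are assumptions made for this example.

```python
import logging

# Level names a user might put in LOG_LEVEL, mapped to logging constants.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def client_settings(env):
    """Derive (log_level, confirm_tool_use) from an environment mapping.

    Unknown or missing values fall back to INFO and to confirmation being
    required, which is the safer default assumed here.
    """
    level = LEVELS.get(env.get("LOG_LEVEL", "INFO").strip().upper(), logging.INFO)
    confirm = env.get("CONFIRM_TOOL_USE", "true").strip().lower() != "false"
    return level, confirm
```

Treating anything other than an explicit `false` as "confirm" means a typo in the variable cannot silently disable the prompt.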
Diagnostic Utilities
Anthropic supplies the MCP Inspector utility to facilitate debugging and testing of your MCP server. Execute the following command to launch a debugging interface. Within this UI, you can populate environment variables (pointing to your local setup) in the left panel. Once configured, navigate to the tools section to test the functionalities exposed by this MCP interface.
```bash
mcp dev uns_mcp/server.py
```
To enable logging of the parameter sets sent to UnstructuredClient functions, set the environment variable DEBUG_API_REQUESTS=true.
Logs are systematically archived with filenames following the pattern unstructured-client-{date}.log, providing a record for inspecting outbound request parameters.
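The sketch below shows one way an environment flag and a dated log filename of this shape could be derived. It is illustrative only: the helper names are hypothetical, and the ISO date format used for `{date}` is an assumption, since the exact format is not stated here.

```python
import datetime


def debug_requests_enabled(env):
    """Return True when DEBUG_API_REQUESTS is set to 'true' (case-insensitive)."""
    return env.get("DEBUG_API_REQUESTS", "false").strip().lower() == "true"


def debug_log_filename(today=None):
    """Build a name following unstructured-client-{date}.log.

    The date is rendered as YYYY-MM-DD, which is an assumed format.
    """
    today = today or datetime.date.today()
    return f"unstructured-client-{today.isoformat()}.log"
```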
Enabling Terminal Access via Minimal Client
We integrate @wonderwhy-er/desktop-commander to inject terminal command execution capabilities into the minimal client. This relies on the MCP Filesystem Server architecture. Warning: This grants the client (and thus the LLM) full access to private filesystem resources.
Execute this command to install the necessary package:
```bash
npx @wonderwhy-er/desktop-commander setup
```
Then, initiate the client process with the specialized argument:
```bash
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander"
```
Or, using the shortcut:
```bash
make sse-client-terminal
```
Constraint: Using a Limited Tool Subset
Should your client environment only support a subset of the listed functions, be aware of the following dependency:
- The update_workflow_spec function must always be present in the operational context alongside create_workflow_pipeline, as the former relies on the latter's detailed configuration description for context on custom node setup.
Known Operational Constraints
- `update_workflow_spec`: This function operates via full configuration replacement, not incremental patching. Therefore, the complete current configuration of the target workflow must be supplied, either explicitly by the user or implicitly via a preceding call to `fetch_workflow_blueprint`.
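Because the update replaces the whole definition, a safe pattern is to fetch the current configuration, apply changes to a copy, and send the complete result. The sketch below shows only the merge step on plain dictionaries; `full_update_payload` is a hypothetical helper, and the field names in the usage example are invented for illustration rather than taken from the workflow schema.

```python
import copy


def full_update_payload(current_config, changes):
    """Merge top-level changes into a deep copy of the current configuration.

    The result is a complete configuration, suitable for an update that
    replaces the whole definition instead of patching individual fields.
    """
    payload = copy.deepcopy(current_config)
    payload.update(changes)
    return payload


# Hypothetical example: only the destination changes, everything else is kept.
current = {"name": "nightly", "source_id": "src-1", "destination_id": "dst-1"}
payload = full_update_payload(current, {"destination_id": "dst-2"})
```

Sending only `{"destination_id": "dst-2"}` to a full-replacement update would drop the other fields, which is the failure mode this pattern avoids.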
Version History Log (CHANGELOG.md)
All novel features, fixes, and enhancements will be documented sequentially in CHANGELOG.md. Pre-stable releases should adhere to the 0.x.x-dev format before major version increments.
Troubleshooting
- If an `Error: spawn <command> ENOENT` error materializes, it signals that `<command>` is either not installed or not discoverable within the system's PATH environment variable:
  - Verify the installation status and PATH configuration.
  - Alternatively, specify the full, absolute path to the executable within the `command` field of your configuration structure. For instance, replace a generic `python` entry with `/opt/miniconda3/bin/python`.
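To find the path to put in the `command` field, the standard library's `shutil.which` performs the same PATH lookup that fails in the ENOENT case. A minimal sketch, with `resolve_command` as a hypothetical wrapper name:

```python
import shutil


def resolve_command(command):
    """Return the path of command as found on PATH, or None if it is missing."""
    return shutil.which(command)
```

For example, `resolve_command("uvx")` returns the location of the `uvx` executable if the current shell can find it, and `None` otherwise. A `None` result in your terminal means the tool is not installed or not on PATH; a valid path there combined with ENOENT in the desktop app suggests the app is launched with a different PATH.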
INFORMATIVE NOTE: XMLHttpRequest (XHR) is an Application Programming Interface implemented as a JavaScript object designed to ferry HTTP requests from a web browser to a server. Its methods enable browser-based applications to communicate with the server subsequent to initial page load and retrieve asynchronous responses. XHR forms a core element of Ajax development practices. Preceding Ajax, server interaction relied primarily on standard hyperlink navigations and form submissions, which typically resulted in a full page reload.
== Historical Context ==
The foundational concept for XMLHttpRequest was first conceived in 2000 by the engineering team responsible for Microsoft Outlook. This concept was subsequently materialized in Internet Explorer 5 (1999). However, the initial implementation utilized different identifiers; developers employed ActiveXObject("Msxml2.XMLHTTP") and ActiveXObject("Microsoft.XMLHTTP"). As of Internet Explorer 7 (2006), universal browser support for the XMLHttpRequest identifier was achieved.
The XMLHttpRequest identifier has since become the universally accepted convention across all major browser rendering engines, including Mozilla's Gecko (2002), Safari 1.2 (2004), and Opera 8.0 (2005).
=== Standardization Process ===
The World Wide Web Consortium (W3C) released the initial Working Draft specification for the XMLHttpRequest object on April 5, 2006. A subsequent Level 2 Working Draft was published by the W3C on February 25, 2008. Level 2 introduced enhancements such as progress event monitoring, support for cross-site requests, and binary byte stream handling. By the close of 2011, the Level 2 feature set was merged back into the primary specification. Development transitioned to the WHATWG initiative at the end of 2012, which now maintains the living document using Web IDL specifications.
== Operational Procedure ==
Generally, executing a server request using XMLHttpRequest involves a sequence of defined programming steps.
1. Instantiate an XMLHttpRequest object by invoking its constructor.
2. Invoke the "open" method to define the request method (e.g., GET, POST), specify the target resource URI, and select between synchronous or asynchronous execution flow.
3. For asynchronous operations, establish an event handler function that will be triggered upon state transitions.
4. Initiate the transfer by calling the "send" method, optionally including request body data.
5. Process state changes within the registered event listener. Upon successful server response, the data resides in the "responseText" property when the object reaches state 4 (the "done" state).

Beyond these core steps, XMLHttpRequest offers extensive configuration options for request transmission and response handling. Custom request headers can be appended to guide server processing, and data can be uploaded via the argument passed to the "send" call. Responses can be deserialized from JSON into native JavaScript objects or streamed incrementally rather than waiting for full reception. Operations can be halted prematurely or configured with a timeout threshold.
== Inter-Domain Communication ==
During the nascent stages of the World Wide Web, limitations were observed regarding requests originating from one domain accessing resources on another domain, leading to security restrictions.
