Unstructured Management Control Plane Interface

This is an implementation of the MCP server designed specifically for interacting with the Unstructured API ecosystem. It furnishes utilities for querying data sources and workflow definitions.

Exposed Functionalities

Function Name	Purpose
`enumerate_sources`	Retrieves a manifest of accessible data ingestion endpoints.
`fetch_source_details`	Obtains granular metadata for a nominated source connector.
`provision_source_connector`	Establishes a new data source connector instance.
`modify_source_connector`	Applies parameter adjustments to an extant source connector.
`retire_source_connector`	Decommissions a source connector, identified by its unique ID.
`enumerate_destinations`	Lists all configured data delivery targets.
`fetch_destination_details`	Fetches comprehensive details regarding a specific destination.
`provision_destination`	Initializes a new data output connector based on provided inputs.
`modify_destination_config`	Updates the settings of an existing destination connector.
`retire_destination_alias`	Removes a destination connector using its identifier.
`enumerate_workflows`	Fetches the catalog of defined processing sequences.
`fetch_workflow_blueprint`	Retrieves the complete schema for a designated workflow.
`create_workflow_pipeline`	Defines a novel workflow incorporating source, destination, etc.
`execute_workflow`	Triggers the processing sequence for a specified workflow ID.
`update_workflow_spec`	Modifies the structure of an existing workflow definition.
`deprecate_workflow`	Deactivates a workflow instance based on its identifier.
`enumerate_execution_logs`	Lists operational records for jobs linked to a workflow.
`fetch_job_execution_log`	Retrieves detailed telemetry for an individual job instance.
`terminate_job`	Stops the execution of a specified job instance.

The table below lists the data connectors currently supported by the U-M-C-P-Interface. The full set of source connectors and destination connectors is described in the Unstructured documentation. More connectors will be added to this list over time.

Input Conduit	Output Conduit
S3	S3
Azure	Weaviate
Google Drive	Pinecone
OneDrive	AstraDB
Salesforce	MongoDB
Sharepoint	Neo4j
	Databricks Volumes
	Databricks Volumes Delta Table

To utilize functions involved in the lifecycle management (creation/modification/deletion) of a data conduit, the requisite access credentials must be declared within your environment variables file (.env). The following table outlines the necessary environmental keys supported by this interface:

Variable Name	Prerequisite Description
`ANTHROPIC_API_KEY`	Mandatory for invoking the `minimal_client` to communicate with our backend server.
`AWS_KEY`, `AWS_SECRET`	Required for configuring the S3 ingestion conduit via the `U-M-C-P-Interface` server. Refer to operational guides here and here for implementation details.
`WEAVIATE_CLOUD_API_KEY`	Essential for instantiating the Weaviate vector database output module. Detailed setup instructions are available here.
`FIRECRAWL_API_KEY`	Necessary for leveraging Firecrawl functionalities located in `external/firecrawl.py`. Obtain credentials by registering at Firecrawl.
`ASTRA_DB_APPLICATION_TOKEN`, `ASTRA_DB_API_ENDPOINT`	Required for creating the AstraDB output module via the `U-M-C-P-Interface` server. See configuration steps documented here.
`AZURE_CONNECTION_STRING`	Primary authentication method (Option 1) for provisioning the Azure blob storage input conduit via the `U-M-C-P-Interface` server. Consult documentation for guidance.
`AZURE_ACCOUNT_NAME`+`AZURE_ACCOUNT_KEY`	Alternative authentication method (Option 2) for provisioning the Azure input conduit. Refer to instructions in the documentation.
`AZURE_ACCOUNT_NAME`+`AZURE_SAS_TOKEN`	Tertiary authentication method (Option 3) for provisioning the Azure input conduit. See the specified documentation for details.
`NEO4J_PASSWORD`	Required to establish the Neo4j output conduit via the `U-M-C-P-Interface` server. Configuration steps are provided here.
`MONGO_DB_CONNECTION_STRING`	Necessary for instantiating the MongoDB output connector via the `U-M-C-P-Interface` server. Setup guide available here.
`GOOGLEDRIVE_SERVICE_ACCOUNT_KEY`	A base64-encoded string. The original configuration key (JSON file) obtained from the documentation must be converted using the command `base64 < /path/to/google_service_account_key.json` in your terminal.
`DATABRICKS_CLIENT_ID`,`DATABRICKS_CLIENT_SECRET`	Required to configure Databricks volume or delta table output modules through the `U-M-C-P-Interface` server. See specifications here and here.
`ONEDRIVE_CLIENT_ID`, `ONEDRIVE_CLIENT_CRED`,`ONEDRIVE_TENANT_ID`	Necessary for provisioning the OneDrive synchronization module via the `U-M-C-P-Interface` server. Instructions are documented here.
`PINECONE_API_KEY`	Required to provision the Pinecone vector database output module via the `U-M-C-P-Interface` server. Configuration details are present here.
`SALESFORCE_CONSUMER_KEY`,`SALESFORCE_PRIVATE_KEY`	Required for establishing the Salesforce input conduit using the `U-M-C-P-Interface` server. Consult the guide https://docs.unstructured.io/ingestion/source-connectors/salesforce.
`SHAREPOINT_CLIENT_ID`, `SHAREPOINT_CLIENT_CRED`,`SHAREPOINT_TENANT_ID`	Necessary for provisioning the SharePoint input synchronization module via the `U-M-C-P-Interface` server. Refer to the configuration instructions here.
`LOG_LEVEL`	Controls the verbosity of logging output from our `minimal_client`; set to 'ERROR' to minimize extraneous console messages.
`CONFIRM_TOOL_USE`	Setting this to 'true' mandates that the `minimal_client` seeks explicit confirmation prior to executing any tool invocation.
`DEBUG_API_REQUESTS`	Setting this to 'true' instructs `uns_mcp/server.py` to emit the parameters of outbound requests for enhanced diagnostic tracing.
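A quick pre-flight check can catch missing credentials before a connector tool is invoked. The sketch below is not part of the server; the connector labels (`"s3"`, `"azure"`, and so on) and the helper name `missing_credentials` are hypothetical, and only a subset of the variables from the table above is shown.

```python
import os

# Variable names are copied from the table above. The connector labels used as
# keys are hypothetical. Each connector maps to a list of alternatives, and any
# one complete alternative is sufficient.
REQUIRED_ENV = {
    "s3": [["AWS_KEY", "AWS_SECRET"]],
    "azure": [
        ["AZURE_CONNECTION_STRING"],                 # Option 1
        ["AZURE_ACCOUNT_NAME", "AZURE_ACCOUNT_KEY"],  # Option 2
        ["AZURE_ACCOUNT_NAME", "AZURE_SAS_TOKEN"],    # Option 3
    ],
    "weaviate": [["WEAVIATE_CLOUD_API_KEY"]],
    "astradb": [["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT"]],
    "neo4j": [["NEO4J_PASSWORD"]],
    "mongodb": [["MONGO_DB_CONNECTION_STRING"]],
    "pinecone": [["PINECONE_API_KEY"]],
}


def missing_credentials(connector, env=None):
    """Return the variables still missing for the closest-to-complete
    alternative, or an empty list if some alternative is fully set."""
    env = os.environ if env is None else env
    gaps = [
        [name for name in option if not env.get(name)]
        for option in REQUIRED_ENV[connector]
    ]
    # The alternative with the fewest gaps is the easiest one to complete.
    return min(gaps, key=len)
```

Calling `missing_credentials("azure")` against the real environment returns an empty list as soon as any one of the three Azure options is fully configured.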

Firecrawl Integration

Firecrawl functions as an external web traversal API offering two primary capabilities within our MCP framework:

HTML Content Acquisition: Employing invoke_firecrawl_crawlhtml to initiate traversal tasks and check_crawlhtml_status to monitor their progression.
LLM-Optimized Text Generation: Utilizing invoke_firecrawl_llmtxt to generate model-ready text and check_llmtxt_status to retrieve the resulting data.

Traversal Sequence Outline (Web Crawling):

- Initiation begins with a specified starting URL, followed by link identification.
- Prioritizes sitemap discovery; otherwise, it follows internal links found on the site.
- Systematically explores linked pages to map the entire site structure.
- Accumulates content from every visited page, managing dynamic content rendering and server request throttling.
- Traversal jobs can be halted mid-process using `cancel_crawlhtml_job`.
- Recommended for scenarios demanding raw HTML output, which Unstructured's subsequent processing stages refine effectively :smile:.

Text Optimization Sequence (LLM Text Generation):

- Post-traversal, the system extracts semantically rich, clean text.
- Formats this text into structures highly amenable to large language models.
- Output is automatically persisted to the designated S3 location.
- Crucial Note: Text generation jobs cannot be cancelled once started; `cancel_llmtxt_job` is present for API consistency but is not currently supported by Firecrawl.

Requirement: The environment variable FIRECRAWL_API_KEY must be established to access these functions.
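The invoke-then-check pattern described above amounts to a polling loop. The following is a minimal sketch of that loop, assuming a caller-supplied `check_status` callable that wraps a status tool such as `check_crawlhtml_status`. The `"status"` key and the terminal status strings are assumptions for illustration, not the documented Firecrawl response shape.

```python
import time

# Assumed terminal states; the real status values may differ.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(check_status, job_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll check_status(job_id) until the job reaches a terminal state.

    check_status is any callable returning a dict with a "status" key
    (hypothetical shape). Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check_status(job_id)
        if result.get("status") in TERMINAL_STATES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        # Wait between polls so the status endpoint is not hammered.
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.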

Setup and Configuration Guide

This section details the necessary procedures for initializing and configuring the U-M-C-P-Interface server, assuming a base environment of Python 3.12 and the uv package manager.

Prerequisites

Python version 3.12 or newer.
The uv environment management tool.
An active API key from the Unstructured platform, obtainable by registering at https://platform.unstructured.io/app/account/api-keys.

Method 1: Utilizing `uv` (Preferred)

No manual installation steps are typically needed when using uvx as it manages execution context. If direct package installation is preferred:

```bash
uv pip install uns_mcp
```

Configuring for Claude Desktop

Inject the following JSON structure into your claude_desktop_config.json file:

File Location Note: On macOS, this file is commonly found in the ~/Library/Application Support/Claude/ directory.

Using the uvx Executor:

```json
{
  "mcpServers": {
    "U-M-C-P-Interface": {
      "command": "uvx",
      "args": ["uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": ""
      }
    }
  }
}
```

Alternatively, Using the Python Package Invocation:

```json
{
  "mcpServers": {
    "U-M-C-P-Interface": {
      "command": "python",
      "args": ["-m", "uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": ""
      }
    }
  }
}
```
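If claude_desktop_config.json already lists other servers, hand-editing it risks overwriting them. The sketch below merges a server entry of the shape shown above into an existing configuration. The helper names `add_mcp_server` and `update_config_file` are hypothetical and not part of this package.

```python
import json
from pathlib import Path


def add_mcp_server(config, name, command, args, env=None):
    """Return a copy of config with one entry added under "mcpServers",
    leaving any other configured servers untouched."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    entry = {"command": command, "args": list(args)}
    if env:
        entry["env"] = dict(env)
    servers[name] = entry
    updated["mcpServers"] = servers
    return updated


def update_config_file(path, name, command, args, env=None):
    """Read the JSON file at path (if it exists), merge the entry, write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = add_mcp_server(config, name, command, args, env)
    path.write_text(json.dumps(merged, indent=2))
```

For example, `update_config_file(config_path, "U-M-C-P-Interface", "uvx", ["uns_mcp"], {"UNSTRUCTURED_API_KEY": key})` would write the first variant above, where `config_path` and `key` are supplied by you.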

Method 2: Source Code Checkout

1. Obtain a copy of the source code repository.
2. Install required dependencies:

   ```bash
   uv sync
   ```

3. Define your Unstructured API key via an environment variable. Create a .env file in the project root containing:

   ```bash
   UNSTRUCTURED_API_KEY="YOUR_KEY"
   ```

Consult .env.template for a complete list of adjustable environmental parameters.

You can initiate the server using one of these deployment strategies:

Via Editable Package Installation

Install the package in editable mode:

```bash
uvx pip install -e .
```

Update your Claude Desktop configuration:

```json
{
  "mcpServers": {
    "U-M-C-P-Interface": {
      "command": "uvx",
      "args": ["uns_mcp"]
    }
  }
}
```

**Important:** Ensure that the configuration points to the correct `uvx` executable in the environment where the package resides.

Using the Server-Sent Events (SSE) Protocol

**Note: This protocol is incompatible with the Claude Desktop environment.**

For easier debugging, the client and server components can be run independently using the SSE protocol:

1. Launch the server process in one terminal instance:

   ```bash
   uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080
   # Alternatively:
   make sse-server
   ```

2. Test connectivity using a local client script in a separate terminal:

   ```bash
   uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"
   # Alternatively:
   make sse-client
   ```

**Shutdown Sequence:** Terminate the client process first using `Ctrl+C`, followed by the server process.

Using the Standard Input/Output (Stdio) Protocol

Configure the Claude Desktop environment to use the Stdio mechanism:

```json
{
  "mcpServers": {
    "U-M-C-P-Interface": {
      "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
      "args": [
        "--directory",
        "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
        "run",
        "server.py"
      ]
    }
  }
}
```

Alternatively, execute the local client directly:

```bash
uv run python minimal_client/client.py uns_mcp/server.py
```

Supplemental Local Client Environmental Configuration

Environment variables can customize the minimal client's behavior:

- LOG_LEVEL="ERROR": Suppresses informational logging from the LLM, showing only critical user messages.
- CONFIRM_TOOL_USE='false': Disables the pre-execution confirmation prompt for tool usage. Exercise extreme caution with this setting, especially during development, as it permits the LLM to execute potentially costly operations or irreversible data modifications.

Diagnostic Utilities

Anthropic supplies the MCP Inspector utility to facilitate debugging and testing of your MCP server. Execute the following command to launch a debugging interface. Within this UI, you can populate environment variables (pointing to your local setup) in the left panel. Once configured, navigate to the tools section to test the functionalities exposed by this MCP interface.

mcp dev uns_mcp/server.py

To enable logging of the parameter sets sent to UnstructuredClient functions, set the environment variable DEBUG_API_REQUESTS=true. Logs are written to files named unstructured-client-{date}.log, which can be inspected to debug outbound request parameters.
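When several of these log files accumulate, a small helper can locate the newest one. This is a sketch only: it assumes the logs land in a directory you pass in (the actual location is not stated here), and it sorts by modification time rather than parsing the `{date}` portion, whose exact format is not specified. The name `latest_request_log` is hypothetical.

```python
from pathlib import Path


def latest_request_log(directory="."):
    """Return the most recently modified unstructured-client-*.log file in
    directory, or None if there is none."""
    logs = list(Path(directory).glob("unstructured-client-*.log"))
    if not logs:
        return None
    # Modification time avoids guessing the date format used in the filename.
    return max(logs, key=lambda p: p.stat().st_mtime)
```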

Enabling Terminal Access via Minimal Client

We integrate @wonderwhy-er/desktop-commander to inject terminal command execution capabilities into the minimal client. This relies on the MCP Filesystem Server architecture. Warning: This grants the client (and thus the LLM) full access to private filesystem resources.

Execute this command to install the necessary package:

```bash
npx @wonderwhy-er/desktop-commander setup
```

Then, initiate the client process with the specialized argument:

```bash
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander"
```

Or, using the shortcut:

make sse-client-terminal

Constraint: Using a Limited Tool Subset

If your client supports only a subset of the listed functions, be aware of the following dependency: `update_workflow_spec` must be loaded into the context together with `create_workflow_pipeline`, because the former relies on the latter's detailed description of how to create and configure custom nodes.

Known Operational Constraints

update_workflow_spec: This function operates via full configuration replacement, not incremental patching. Therefore, the complete current configuration of the target workflow must be supplied, either explicitly by the user or implicitly via a preceding call to fetch_workflow_blueprint.
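Because the update replaces the whole configuration, a safe pattern is to start from the fetched configuration and overlay only the fields that should change. The sketch below illustrates that merge step in isolation. The field names in the usage example and the helper name `build_full_update` are hypothetical, and the actual tool calls are left to the client.

```python
import copy


def build_full_update(current, changes):
    """Return a complete configuration: everything from current, with the
    top-level fields in changes overriding it. current is not modified."""
    full = copy.deepcopy(current)
    # Top-level overlay only; nested structures in changes replace the
    # corresponding nested structures wholesale.
    full.update(changes)
    return full
```

With a fetched configuration such as `{"name": "wf", "schedule": "daily"}` and changes `{"schedule": "weekly"}`, the result keeps `name` and carries the new `schedule`, so nothing is silently dropped when the full payload is submitted.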

Version History Log (`CHANGELOG.md`)

All novel features, fixes, and enhancements will be documented sequentially in CHANGELOG.md. Pre-stable releases should adhere to the 0.x.x-dev format before major version increments.

Troubleshooting

If you encounter an `Error: spawn <command> ENOENT` error, it means that `<command>` is either not installed or not discoverable on your PATH:

- Verify that it is installed and that its location is included in PATH.
- Alternatively, specify the full, absolute path to the executable within the command field of your configuration structure. For instance, replace a generic python entry with /opt/miniconda3/bin/python.
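The absolute path for the second option can be looked up with Python's standard `shutil.which`, which searches PATH the same way a shell would. The helper below is a sketch with a hypothetical name, `with_absolute_command`. It takes a server entry of the shape used in the configuration examples and returns a copy whose command is an absolute path.

```python
import os
import shutil


def with_absolute_command(entry):
    """Return a copy of a server entry with "command" resolved to an
    absolute path, or raise FileNotFoundError if it cannot be found."""
    resolved = shutil.which(entry["command"])
    if resolved is None:
        raise FileNotFoundError(f"{entry['command']!r} was not found on PATH")
    return {**entry, "command": os.path.abspath(resolved)}
```

Note that Claude Desktop may run with a different PATH than your terminal, so a lookup performed in the terminal where the command works is the one to trust.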

INFORMATIVE NOTE: XMLHttpRequest (XHR) is an Application Programming Interface implemented as a JavaScript object designed to ferry HTTP requests from a web browser to a server. Its methods enable browser-based applications to communicate with the server subsequent to initial page load and retrieve asynchronous responses. XHR forms a core element of Ajax development practices. Preceding Ajax, server interaction relied primarily on standard hyperlink navigations and form submissions, which typically resulted in a full page reload.

== Historical Context ==

The concept behind XMLHttpRequest originated with the engineering team responsible for Microsoft Outlook and first shipped in Internet Explorer 5 (1999). However, the initial implementation did not use the XMLHttpRequest identifier; developers instead employed ActiveXObject("Msxml2.XMLHTTP") and ActiveXObject("Microsoft.XMLHTTP"). Internet Explorer 7 (2006) added support for the XMLHttpRequest identifier, by which point it was already the accepted convention in the other major browsers, including Mozilla's Gecko layout engine (2002), Safari 1.2 (2004), and Opera 8.0 (2005).

=== Standardization Process ===

The World Wide Web Consortium (W3C) released the initial Working Draft specification for the XMLHttpRequest object on April 5, 2006. A subsequent Level 2 Working Draft was published by the W3C on February 25, 2008. Level 2 introduced enhancements such as progress event monitoring, support for cross-site requests, and binary byte stream handling. By the close of 2011, the Level 2 feature set was merged back into the primary specification. Development transitioned to the WHATWG initiative at the end of 2012, which now maintains the living document using Web IDL specifications.

== Operational Procedure ==

Generally, executing a server request using XMLHttpRequest involves a sequence of defined programming steps:

1. Instantiate an XMLHttpRequest object by invoking its constructor.
2. Invoke the "open" method to define the request method (e.g., GET, POST), specify the target resource URI, and select between synchronous or asynchronous execution flow.
3. For asynchronous operations, establish an event handler function that will be triggered upon state transitions.
4. Initiate the transfer by calling the "send" method, optionally including request body data.
5. Process state changes within the registered event listener. Upon successful server response, the data resides in the "responseText" property when the object reaches state 4 (the "done" state).

Beyond these core steps, XMLHttpRequest offers extensive configuration options for request transmission and response handling. Custom request headers can be appended to guide server processing, and data can be uploaded via the argument passed to the "send" call. Responses can be deserialized from JSON into native JavaScript objects or streamed incrementally rather than waiting for full reception. Operations can be halted prematurely or configured with a timeout threshold.

== Inter-Domain Communication ==

During the nascent stages of the World Wide Web, limitations were observed regarding requests originating from one domain accessing resources on another domain, leading to security restrictions.

Author

liuchongchong1995
