
U-M-C-P-Interface

Interface for controlling the Unstructured platform's data orchestration layer, encompassing source ingestion, destination delivery, pipeline configuration, and job execution management. Facilitates enumeration, construction, modification, and removal of data connectors and workflow definitions for streamlined data handling.

Author


liuchongchong1995

No License

Quick Info

GitHub GitHub Stars 0
NPM Weekly Downloads 0
Tools 1
Last Updated 2026-02-19

Tags

apisapirequestsunstructured apiapis httpapi manage

Unstructured Management Control Plane Interface

This is an MCP server implementation for interacting with the Unstructured API. It provides tools for listing, creating, updating, and deleting sources, destinations, and workflows, and for running and monitoring jobs.

Exposed Functionalities

| Function Name | Purpose |
| --- | --- |
| enumerate_sources | Retrieves a manifest of accessible data ingestion endpoints. |
| fetch_source_details | Obtains granular metadata for a nominated source connector. |
| provision_source_connector | Establishes a new data source connector instance. |
| modify_source_connector | Applies parameter adjustments to an extant source connector. |
| retire_source_connector | Decommissions a source connector, identified by its unique ID. |
| enumerate_destinations | Lists all configured data delivery targets. |
| fetch_destination_details | Fetches comprehensive details regarding a specific destination. |
| provision_destination | Initializes a new data output connector based on provided inputs. |
| modify_destination_config | Updates the settings of an existing destination connector. |
| retire_destination_alias | Removes a destination connector using its identifier. |
| enumerate_workflows | Fetches the catalog of defined processing sequences. |
| fetch_workflow_blueprint | Retrieves the complete schema for a designated workflow. |
| create_workflow_pipeline | Defines a novel workflow incorporating source, destination, etc. |
| execute_workflow | Triggers the processing sequence for a specified workflow ID. |
| update_workflow_spec | Modifies the structure of an existing workflow definition. |
| deprecate_workflow | Deactivates a workflow instance based on its identifier. |
| enumerate_execution_logs | Lists operational records for jobs linked to a workflow. |
| fetch_job_execution_log | Retrieves detailed telemetry for an individual job instance. |
| terminate_job | Stops the execution of a specified job instance. |

The following table lists the data conduits currently supported by the U-M-C-P-Interface. The Unstructured documentation describes all available ingestion conduits and delivery targets. More conduits will be added to this list.

| Input Conduit | Output Conduit |
| --- | --- |
| S3 | S3 |
| Azure | Weaviate |
| Google Drive | Pinecone |
| OneDrive | AstraDB |
| Salesforce | MongoDB |
| Sharepoint | Neo4j |
| | Databricks Volumes |
| | Databricks Volumes Delta Table |

To utilize functions involved in the lifecycle management (creation/modification/deletion) of a data conduit, the requisite access credentials must be declared within your environment variables file (.env). The following table outlines the necessary environmental keys supported by this interface:

| Variable Name | Prerequisite Description |
| --- | --- |
| ANTHROPIC_API_KEY | Mandatory for running the minimal_client, which communicates with our server. |
| AWS_KEY, AWS_SECRET | Required for configuring the S3 ingestion conduit via the U-M-C-P-Interface server. See the connector's setup documentation. |
| WEAVIATE_CLOUD_API_KEY | Required for creating the Weaviate vector database output module. See the connector's setup documentation. |
| FIRECRAWL_API_KEY | Required for the Firecrawl functions located in external/firecrawl.py. Obtain credentials by registering at Firecrawl. |
| ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT | Required for creating the AstraDB output module via the U-M-C-P-Interface server. See the connector's setup documentation. |
| AZURE_CONNECTION_STRING | Authentication Option 1 for provisioning the Azure blob storage input conduit via the U-M-C-P-Interface server. See the connector's setup documentation. |
| AZURE_ACCOUNT_NAME + AZURE_ACCOUNT_KEY | Authentication Option 2 for provisioning the Azure input conduit. See the connector's setup documentation. |
| AZURE_ACCOUNT_NAME + AZURE_SAS_TOKEN | Authentication Option 3 for provisioning the Azure input conduit. See the connector's setup documentation. |
| NEO4J_PASSWORD | Required to create the Neo4j output conduit via the U-M-C-P-Interface server. See the connector's setup documentation. |
| MONGO_DB_CONNECTION_STRING | Required for creating the MongoDB output connector via the U-M-C-P-Interface server. See the connector's setup documentation. |
| GOOGLEDRIVE_SERVICE_ACCOUNT_KEY | A base64-encoded string. Convert the original service account key (a JSON file) by running `base64 < /path/to/google_service_account_key.json` in your terminal. |
| DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET | Required to configure the Databricks Volumes or Delta Table output modules through the U-M-C-P-Interface server. See the connectors' setup documentation. |
| ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_CRED, ONEDRIVE_TENANT_ID | Required for provisioning the OneDrive conduit via the U-M-C-P-Interface server. See the connector's setup documentation. |
| PINECONE_API_KEY | Required to provision the Pinecone vector database output module via the U-M-C-P-Interface server. See the connector's setup documentation. |
| SALESFORCE_CONSUMER_KEY, SALESFORCE_PRIVATE_KEY | Required for creating the Salesforce input conduit via the U-M-C-P-Interface server. See https://docs.unstructured.io/ingestion/source-connectors/salesforce. |
| SHAREPOINT_CLIENT_ID, SHAREPOINT_CLIENT_CRED, SHAREPOINT_TENANT_ID | Required for provisioning the SharePoint input conduit via the U-M-C-P-Interface server. See the connector's setup documentation. |
| LOG_LEVEL | Controls the verbosity of logging output from our minimal_client; set to 'ERROR' to minimize extraneous console messages. |
| CONFIRM_TOOL_USE | Setting this to 'true' makes the minimal_client ask for explicit confirmation before executing any tool invocation. |
| DEBUG_API_REQUESTS | Setting this to 'true' makes uns_mcp/server.py emit the parameters of outbound requests for diagnostic tracing. |
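
Because each conduit needs a different set of keys, it can help to check a .env file before starting the server. The following is a minimal sketch and not part of uns_mcp: `parse_env_text`, `missing_credentials`, and the `REQUIRED_KEYS` mapping are hypothetical names, and the grouping covers only some of the variables listed in the table above.

```python
# Hypothetical helper, not part of uns_mcp: reports which keys a conduit still needs.

# Each conduit maps to a list of alternative credential sets; any one complete
# set is enough (Azure has three options in the table above).
REQUIRED_KEYS = {
    "s3": [["AWS_KEY", "AWS_SECRET"]],
    "azure": [
        ["AZURE_CONNECTION_STRING"],
        ["AZURE_ACCOUNT_NAME", "AZURE_ACCOUNT_KEY"],
        ["AZURE_ACCOUNT_NAME", "AZURE_SAS_TOKEN"],
    ],
    "weaviate": [["WEAVIATE_CLOUD_API_KEY"]],
    "astradb": [["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT"]],
    "neo4j": [["NEO4J_PASSWORD"]],
    "mongodb": [["MONGO_DB_CONNECTION_STRING"]],
    "pinecone": [["PINECONE_API_KEY"]],
}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_credentials(env: dict[str, str], conduit: str) -> list[str]:
    """Return [] if any credential option is complete, else the keys
    missing from the option that is closest to complete."""
    best = None
    for option in REQUIRED_KEYS[conduit]:
        missing = [key for key in option if not env.get(key)]
        if not missing:
            return []
        if best is None or len(missing) < len(best):
            best = missing
    return best
```

Calling `missing_credentials(parse_env_text(open(".env").read()), "azure")` would return an empty list as soon as any one of the three Azure options is fully set.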

Firecrawl Integration

Firecrawl functions as an external web traversal API offering two primary capabilities within our MCP framework:

  1. HTML Content Acquisition: Employing invoke_firecrawl_crawlhtml to initiate traversal tasks and check_crawlhtml_status to monitor their progression.
  2. LLM-Optimized Text Generation: Utilizing invoke_firecrawl_llmtxt to generate model-ready text and check_llmtxt_status to retrieve the resulting data.

Traversal Sequence Outline (Web Crawling):

  • Starts from a specified URL and identifies the links on it.
  • Uses the sitemap if one is found; otherwise, follows the internal links found on the site.
  • Systematically explores linked pages to map the entire site structure.
  • Accumulates content from every visited page, handling dynamic content rendering and request rate limits.
  • Traversal jobs can be halted mid-process using cancel_crawlhtml_job.
  • Recommended for scenarios demanding raw HTML output, which Unstructured's subsequent processing stages refine effectively.

Text Optimization Sequence (LLM Text Generation):

  • After traversal, the system extracts semantically rich, clean text.
  • Formats this text into structures well suited to large language models.
  • Output is automatically persisted to the designated S3 location.
  • Crucial Note: Text generation jobs cannot be cancelled once started; cancel_llmtxt_job is present for API consistency but is not currently supported by Firecrawl.

Requirement: The environment variable FIRECRAWL_API_KEY must be established to access these functions.
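
The invoke/check tool pairs described above follow a start-then-poll pattern. The sketch below illustrates that pattern only; `wait_for_job` is a hypothetical helper, the status checker is passed in as a plain callable rather than a real Firecrawl call, and the terminal status values "completed" and "failed" are assumptions that may not match what the status tools actually return.

```python
import time
from typing import Callable


def wait_for_job(
    check_status: Callable[[str], dict],
    job_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll check_status(job_id) until the job reports a terminal state.

    The status values "completed" and "failed" are assumptions for this sketch.
    """
    for _ in range(max_polls):
        result = check_status(job_id)
        if result.get("status") in ("completed", "failed"):
            return result
        # Wait between polls so the status endpoint is not hammered.
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Bounding the number of polls keeps a stuck job from blocking the caller forever, which matters for text generation jobs that cannot be cancelled.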

Setup and Configuration Guide

This section details the necessary procedures for initializing and configuring the U-M-C-P-Interface server, assuming a base environment of Python 3.12 and the uv package manager.

Prerequisites

Method 1: Utilizing uv (Preferred)

No manual installation steps are typically needed when using uvx, as it manages the execution context. If direct package installation is preferred, run `uv pip install uns_mcp`.

Configuring for Claude Desktop

Inject the following JSON structure into your claude_desktop_config.json file:

File Location Note: On macOS, this file is commonly found in the ~/Library/Application Support/Claude/ directory.

Using the uvx Executor:

{ "mcpServers": { "U-M-C-P-Interface": { "command": "uvx", "args": ["uns_mcp"], "env": { "UNSTRUCTURED_API_KEY": "" } } } }

Alternatively, Using the Python Package Invocation:

{ "mcpServers": { "U-M-C-P-Interface": { "command": "python", "args": ["-m", "uns_mcp"], "env": { "UNSTRUCTURED_API_KEY": "" } } } }
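
If claude_desktop_config.json already lists other servers, the new entry should be merged in rather than pasted over the file. The following sketch shows one way to do that with the standard library; `merge_mcp_server` and `UVX_ENTRY` are hypothetical names introduced here, and the entry simply mirrors the uvx variant above.

```python
import json


def merge_mcp_server(config_text: str, name: str, entry: dict) -> str:
    """Return config JSON with one server added or replaced under "mcpServers".

    Other servers and unrelated top-level keys are left untouched.
    An empty input string is treated as an empty config.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    config.setdefault("mcpServers", {})[name] = entry
    return json.dumps(config, indent=2) + "\n"


# Mirrors the uvx variant above; fill in your own API key.
UVX_ENTRY = {
    "command": "uvx",
    "args": ["uns_mcp"],
    "env": {"UNSTRUCTURED_API_KEY": ""},
}
```

To apply it, read the existing file's text, pass it through `merge_mcp_server(text, "U-M-C-P-Interface", UVX_ENTRY)`, and write the result back to the same path.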

Method 2: Source Code Checkout

  1. Obtain the source code repository copy.

  2. Install required dependencies: `uv sync`

  3. Define your Unstructured API key via an environment variable. Create a .env file in the project root containing: `UNSTRUCTURED_API_KEY="YOUR_KEY"`

    Consult .env.template for a complete list of adjustable environmental parameters.

You can initiate the server using one of these deployment strategies:

Via Editable Package Installation

Install the package in editable mode: `uvx pip install -e .`

Update your Claude Desktop configuration:

{ "mcpServers": { "U-M-C-P-Interface": { "command": "uvx", "args": ["uns_mcp"] } } }

**Important:** Ensure that the configuration points to the correct `uvx` executable in the environment where the package resides.

Using the Server-Sent Events (SSE) Protocol

**Note: This protocol is incompatible with the Claude Desktop environment.**

For easier debugging, the client and server components can be run independently using the SSE protocol:

  1. Launch the server process in one terminal: `uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080` (alternatively, `make sse-server`).
  2. Test connectivity using the local client script in a separate terminal: `uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"` (alternatively, `make sse-client`).

**Shutdown Sequence:** Terminate the client process first using `Ctrl+C`, followed by the server process.

Using the Standard Input/Output (Stdio) Protocol

Configure the Claude Desktop environment to use the Stdio mechanism:

{ "mcpServers": { "U-M-C-P-Interface": { "command": "ABSOLUTE/PATH/TO/.local/bin/uv", "args": [ "--directory", "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp", "run", "server.py" ] } } }

Alternatively, execute the local client directly: `uv run python minimal_client/client.py uns_mcp/server.py`

Supplemental Local Client Environmental Configuration

Environment variables can customize the minimal client's behavior:

  • LOG_LEVEL="ERROR": Suppresses informational logging from the LLM, showing only critical user messages.
  • CONFIRM_TOOL_USE='false': Disables the pre-execution confirmation prompt for tool usage. Exercise extreme caution with this setting, especially during development, as it permits the LLM to execute potentially costly operations or irreversible data modifications.
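
The effect of CONFIRM_TOOL_USE can be pictured as a gate in front of every tool call. This is a hedged sketch rather than the minimal client's actual code: `confirmation_required` and `maybe_run_tool` are hypothetical names, and treating an unset variable as 'true' is an assumption.

```python
import os
from typing import Callable, Mapping


def confirmation_required(env: Mapping[str, str] = os.environ) -> bool:
    """True unless CONFIRM_TOOL_USE is explicitly 'false'.

    Defaulting to True when the variable is unset is an assumption.
    """
    return env.get("CONFIRM_TOOL_USE", "true").strip().lower() != "false"


def maybe_run_tool(
    tool_name: str,
    run: Callable[[], object],
    env: Mapping[str, str] = os.environ,
    ask: Callable[[str], str] = input,
):
    """Run the tool, asking first when confirmation is required.

    Returns None without calling run() if the user declines.
    """
    if confirmation_required(env):
        answer = ask(f"Run tool {tool_name!r}? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            return None
    return run()
```

Defaulting to "ask first" keeps destructive tools such as retire_source_connector or terminate_job from running unattended unless the prompt is disabled on purpose.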

Diagnostic Utilities

Anthropic supplies the MCP Inspector utility to facilitate debugging and testing of your MCP server. Execute the following command to launch a debugging interface. Within this UI, you can populate environment variables (pointing to your local setup) in the left panel. Once configured, navigate to the tools section to test the functionalities exposed by this MCP interface.

mcp dev uns_mcp/server.py

To enable logging of the parameters sent to UnstructuredClient functions, set the environment variable DEBUG_API_REQUESTS=true. Logs are written to files named unstructured-client-{date}.log, which can be inspected to debug outbound request parameters.

Enabling Terminal Access via Minimal Client

We integrate @wonderwhy-er/desktop-commander to inject terminal command execution capabilities into the minimal client. This relies on the MCP Filesystem Server architecture. Warning: This grants the client (and thus the LLM) full access to private filesystem resources.

Execute this command to install the necessary package: `npx @wonderwhy-er/desktop-commander setup`

Then, initiate the client process with the specialized argument:

uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander"

Or, using the shortcut:

make sse-client-terminal

Constraint: Using a Limited Tool Subset

Should your client environment only support a subset of the listed functions, be aware of the following dependency:

  • The update_workflow_spec function must always be loaded together with create_workflow_pipeline, because the former relies on the latter's detailed configuration description for context on custom node setup.
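
A client that filters tools could check this dependency before starting a session. The snippet below is only an illustration; `check_tool_subset` and `TOOL_DEPENDENCIES` are hypothetical names, and the mapping encodes just the one dependency stated above.

```python
# Hypothetical check: maps a tool to the tools it needs alongside it.
TOOL_DEPENDENCIES = {
    "update_workflow_spec": ["create_workflow_pipeline"],
}


def check_tool_subset(enabled_tools: set[str]) -> list[str]:
    """Return the tools that still need to be enabled for the subset to work."""
    missing = []
    for tool, required in TOOL_DEPENDENCIES.items():
        if tool not in enabled_tools:
            continue
        for dependency in required:
            if dependency not in enabled_tools and dependency not in missing:
                missing.append(dependency)
    return missing
```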

Known Operational Constraints

  • update_workflow_spec: This function operates via full configuration replacement, not incremental patching. Therefore, the complete current configuration of the target workflow must be supplied, either explicitly by the user or implicitly via a preceding call to fetch_workflow_blueprint.
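
Full replacement means a caller should start from the workflow's current configuration and change only what is needed before submitting the whole thing. The sketch below illustrates that idea on plain dictionaries; `build_full_update` is a hypothetical helper, and the field names in the usage note are made up for illustration rather than taken from the real workflow schema.

```python
import copy


def build_full_update(current_config: dict, changes: dict) -> dict:
    """Return a complete configuration for a replace-style update.

    Starts from a deep copy of the current configuration (for example, the
    result of fetch_workflow_blueprint) and overrides only the top-level
    keys present in `changes`, so untouched settings are not lost.
    """
    updated = copy.deepcopy(current_config)
    updated.update(changes)
    return updated
```

For example, with a current configuration of `{"name": "nightly", "schedule": "daily"}`, calling `build_full_update(current, {"schedule": "weekly"})` yields a payload that still carries the name. Sending only `{"schedule": "weekly"}` to a replace-style update would drop it.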

Version History Log (CHANGELOG.md)

All novel features, fixes, and enhancements will be documented sequentially in CHANGELOG.md. Pre-stable releases should adhere to the 0.x.x-dev format before major version increments.

Troubleshooting

  • If you see an Error: spawn <command> ENOENT error, it means that <command> is either not installed or not discoverable within the system's PATH environment variable:
    • Verify the installation status and PATH configuration.
    • Alternatively, specify the full, absolute path to the executable within the command field of your configuration structure. For instance, replace a generic python entry with /opt/miniconda3/bin/python.
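
To find the path to put in the command field, the standard library can look a command up on PATH the same way a shell would. This is a small sketch; `resolve_command` is a hypothetical helper name, while `shutil.which` is the standard library call doing the lookup.

```python
import shutil


def resolve_command(command: str) -> str:
    """Return the path where `command` is found on PATH.

    Raises FileNotFoundError if the command cannot be found, which is the
    same situation that produces the ENOENT error described above.
    """
    path = shutil.which(command)
    if path is None:
        raise FileNotFoundError(
            f"{command!r} not found on PATH; install it or use an absolute path"
        )
    return path
```

Running `resolve_command("uvx")` or `resolve_command("python")` in the environment you intend to use prints the value to paste into the configuration.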
