
UNS-MCP-Interface

A gateway for orchestrating operations with the Unstructured Data Processing Platform's core API. Facilitates the configuration of input sources, output destinations, and execution pipelines. Includes utilities for inventorying available data ingress points and querying specific connector configurations.

Author

Unstructured-IO

No License

Quick Info

GitHub Stars 35
NPM Weekly Downloads 348
Tools 1
Last Updated 2026-02-19


Unstructured Platform Model Context Protocol (MCP) Agent

This agent serves as the primary interface for managing the lifecycle of data integration assets within the Unstructured ecosystem. It enables users to systematically manage data origins, delivery targets, and processing flows.

Exposed Management Utilities

Utility Name | Functionality Synopsis
list_sources | Enumerates all configured data sources accessible via the Unstructured API.
get_source_info | Retrieves detailed configuration for a designated source connector.
create_source_connector | Provisions a new source connector based on provided parameters.
update_source_connector | Modifies the settings of an existing source connector.
delete_source_connector | Deletes a source connector, referenced by its unique identifier.
list_destinations | Lists the configured destinations managed by the Unstructured API.
get_destination_info | Fetches detailed configuration for a specified destination connector.
create_destination_connector | Creates a new destination connector.
update_destination_connector | Adjusts parameters for an existing destination connector.
delete_destination_connector | Removes a destination connector, referenced by its ID.
list_workflows | Returns a list of defined workflows.
get_workflow_info | Obtains detailed information about an individual workflow definition.
create_workflow | Creates a new workflow, linking a source, a destination, and processing settings.
run_workflow | Initiates the execution of a specific workflow, identified by its ID.
update_workflow | Replaces the configuration of an existing workflow definition.
delete_workflow | Deletes a designated workflow by its identifier.
list_jobs | Lists the jobs associated with a particular workflow.
get_job_info | Retrieves detailed status and metadata for a single job ID.
cancel_job | Terminates a running or pending job.
list_workflows_with_finished_jobs | Generates a report of workflows containing at least one completed job, including related source/destination metadata.

Below is a selection of the connectors currently supported by the UNS-MCP-Interface. Comprehensive documentation for all supported source connectors can be found here, and for destination connectors here. Expansion of this catalog is an ongoing effort.

Supported Ingress Points (sources):

  • S3 Storage
  • Azure Blob Storage
  • Google Drive
  • Microsoft OneDrive
  • Salesforce
  • SharePoint

Supported Egress Points (destinations):

  • S3 Storage
  • Weaviate Vector Database
  • Pinecone Vector Database
  • AstraDB
  • MongoDB
  • Neo4j Graph Database
  • Databricks Volumes
  • Databricks Delta Table

To create, modify, or remove a connector, the credentials for that specific connector must be present in your local environment configuration file (.env). The following table lists the environment variables tied to supported connectors:

Credential Name | Configuration Requirement Summary
ANTHROPIC_API_KEY | Mandatory for the minimal_client to interface with our server infrastructure.
AWS_KEY, AWS_SECRET | Necessary for provisioning S3 source/destination connectors via the uns-mcp agent. Guidance available here and here.
WEAVIATE_CLOUD_API_KEY | Required when setting up the Weaviate vector database delivery endpoint. Instructions here.
FIRECRAWL_API_KEY | Needed to utilize Firecrawl processing routines detailed in external/firecrawl.py. Obtain a key from Firecrawl.
ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT | Required for configuring the AstraDB connector. Steps outlined in the documentation here.
AZURE_CONNECTION_STRING | Primary authentication mechanism (Option 1) for creating Azure Blob Storage ingress points. See setup details here.
AZURE_ACCOUNT_NAME + AZURE_ACCOUNT_KEY | Secondary authentication mechanism (Option 2) for Azure Blob Storage ingress configuration. Refer to documentation.
AZURE_ACCOUNT_NAME + AZURE_SAS_TOKEN | Tertiary authentication method (Option 3) for Azure Blob Storage ingress setup. Details available here.
NEO4J_PASSWORD | Essential for establishing a connection to the Neo4j graph database as an output target. See configuration guide here.
MONGO_DB_CONNECTION_STRING | Prerequisite for setting up the MongoDB egress mechanism. Configuration guide here.
GOOGLEDRIVE_SERVICE_ACCOUNT_KEY | A base64 encoded string derived from your Google Service Account JSON key file. Conversion command: base64 < /path/to/google_service_account_key.json. Follow setup instructions here.
DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET | Required for provisioning Databricks Volume or Delta Table output connectors. Refer to guides here and here.
ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_CRED, ONEDRIVE_TENANT_ID | Necessary credentials for configuring Microsoft OneDrive as a data ingress source. Instructions here.
PINECONE_API_KEY | Required to provision the Pinecone vector database output handler. Setup guide here.
SALESFORCE_CONSUMER_KEY, SALESFORCE_PRIVATE_KEY | Required for connecting to Salesforce as a data source. Details in the documentation here.
SHAREPOINT_CLIENT_ID, SHAREPOINT_CLIENT_CRED, SHAREPOINT_TENANT_ID | Necessary credentials for integrating with SharePoint as an input source. Setup instructions here.
LOG_LEVEL | Controls the verbosity of the minimal_client runtime output; e.g., setting to ERROR minimizes extraneous console messages.
CONFIRM_TOOL_USE | If set to true, the minimal_client will prompt for explicit confirmation before executing any tool call.
DEBUG_API_REQUESTS | Setting this to true enables verbose logging of request payloads within uns_mcp/server.py for enhanced debugging insight.
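
Before calling a create, update, or delete connector tool, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch under stated assumptions: the variable names come from the table, but the connector labels ("s3", "weaviate", and so on) and the function name missing_credentials are hypothetical and are not part of uns_mcp. Only a few connectors are shown.

```python
import os

# Variable names come from the table above; the dictionary keys are
# hypothetical labels chosen for this sketch, not identifiers used by uns_mcp.
REQUIRED_ENV = {
    "s3": ["AWS_KEY", "AWS_SECRET"],
    "weaviate": ["WEAVIATE_CLOUD_API_KEY"],
    "astradb": ["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT"],
    "neo4j": ["NEO4J_PASSWORD"],
    "mongodb": ["MONGO_DB_CONNECTION_STRING"],
    "pinecone": ["PINECONE_API_KEY"],
}


def missing_credentials(connector, env=None):
    """Return the required variables that are unset or empty for a connector."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[connector] if not env.get(name)]


if __name__ == "__main__":
    # Report the state of the current process environment.
    for connector in REQUIRED_ENV:
        missing = missing_credentials(connector)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{connector}: {status}")
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.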

Specialized Firecrawl Integration

Firecrawl extends the MCP's capabilities with two distinct web interaction methods:

  1. Raw HTML Retrieval: Utilize invoke_firecrawl_crawlhtml to initiate a crawl job, monitored by check_crawlhtml_status.
  2. LLM-Optimized Text Generation: Employ invoke_firecrawl_llmtxt to generate model-ready text, retrieving outcomes via check_llmtxt_status.

Firecrawl Operation Flow:

Web Scraping Sequence:

  • Commences at a specified Uniform Resource Locator (URL) and analyzes outbound links.
  • Prioritizes utilizing a sitemap if one is present; otherwise, it navigates discovered internal links.
  • Systematically explores linked pages recursively to map the entire site structure.
  • Aggregates content from every visited page, managing JavaScript rendering and network rate limits.
  • Jobs can be halted mid-execution using cancel_crawlhtml_job.
  • Recommended when the requirement is for the complete, raw HTML extraction, which Unstructured's processing engine excels at refining.

Text Optimization Sequence:

  • Post-crawling, it refines the gathered data into clean, semantically rich textual content.
  • Formats this content specifically for optimal consumption by Large Language Models (LLMs).
  • Output artifacts are automatically deposited into a designated S3 repository.
  • Critical Note: Text generation tasks cannot be halted once initiated. While cancel_llmtxt_job is provided for interface consistency, the underlying Firecrawl API currently does not support cancellation for this operation type.

Prerequisite: The FIRECRAWL_API_KEY environment variable must be configured to enable these functionalities.
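
The start-then-monitor pattern described above (invoke a crawl, then check its status) amounts to a polling loop. The sketch below illustrates that loop only. It assumes a caller-supplied check_status callable standing in for the check_crawlhtml_status tool, and the "status" key with values such as "pending", "scraping", and "completed" is an assumption for illustration, not a documented response format.

```python
import time


def wait_for_crawl(check_status, job_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll check_status(job_id) until the job leaves the assumed running states.

    check_status is a hypothetical stand-in for the check_crawlhtml_status
    tool; the "status" key and its values are assumptions of this sketch.
    """
    waited = 0.0
    while True:
        result = check_status(job_id)
        if result.get("status") not in ("pending", "scraping"):
            return result
        if waited >= timeout:
            raise TimeoutError(f"crawl job {job_id} still running after {timeout}s")
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Fake status source: one "scraping" response, then "completed".
    responses = iter([{"status": "scraping"}, {"status": "completed"}])
    print(wait_for_crawl(lambda _job: next(responses), "job-1", sleep=lambda _s: None))
```

The sleep function is injectable so the loop can be tested without real delays. The same shape would apply to check_llmtxt_status, keeping in mind that those jobs cannot be cancelled.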

Setup and Environment Provisioning

This section outlines the necessary procedures to initialize and configure the UNS_MCP agent, favoring Python 3.12 and the uv package manager.

Essential Requirements

  • Python Version 3.12 or newer.
  • The uv utility for dependency and environment handling.
  • An active API credential from the Unstructured Platform. You can secure yours by registering here.

No separate installation is needed when invoking via uvx, as it manages the execution context. For direct package installation:

uv pip install uns_mcp

Claude Desktop Integration (via uvx)

For seamless integration with the Claude Desktop application, append the following configuration structure to your claude_desktop_config.json file.

Configuration Target: On macOS, the configuration file is typically located in ~/Library/Application Support/Claude/.

Configuration using uvx Command:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "uvx",
      "args": ["uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": ""
      }
    }
  }
}

Configuration using Python Package Invocation:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "python",
      "args": ["-m", "uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": ""
      }
    }
  }
}
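
If claude_desktop_config.json already lists other servers, the UNS_MCP entry has to be merged in rather than pasted over the file. The following sketch shows one way to do that with the standard library. The helper names add_uns_mcp_server and update_config_file are hypothetical, and the file path is left to the caller rather than assumed.

```python
import json
from pathlib import Path


def add_uns_mcp_server(config, api_key, command="uvx", args=("uns_mcp",)):
    """Return a copy of a Claude Desktop config with the UNS_MCP entry added.

    Other entries under "mcpServers" and other top-level keys are preserved.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["UNS_MCP"] = {
        "command": command,
        "args": list(args),
        "env": {"UNSTRUCTURED_API_KEY": api_key},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(path, api_key):
    """Read the JSON file at path (if it exists), add the entry, write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_uns_mcp_server(config, api_key), indent=2))
```

For the Python package invocation variant, the same helper could be called with command="python" and args=("-m", "uns_mcp").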

Direct Source Code Checkout Method

  1. Obtain a local copy of the repository.

  2. Install project dependencies:

    uv sync

  3. Establish your Unstructured API key via an environment variable. Create a .env file in the repository root containing:

    UNSTRUCTURED_API_KEY="YOUR_KEY"

    Consult .env.template for a complete list of modifiable environmental settings.

You may now launch the server using one of the subsequent deployment strategies:

Editable Package Installation Method

Install the package in editable mode:

uvx pip install -e .

Update the Claude Desktop configuration as follows:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "uvx",
      "args": ["uns_mcp"]
    }
  }
}

Crucial Note: Ensure the configuration correctly points to the uvx executable within the environment where the package was installed.

Server-Sent Events (SSE) Protocol Operation

Limitation: This protocol method is incompatible with Claude Desktop.

For more straightforward debugging isolation, the client and server components can be run independently:

  1. Initiate the server process in one terminal instance:

    uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080

    Alternatively, use the convenience command: make sse-server

  2. Test connectivity using a local client instance in a separate terminal:

    uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"

    Alternatively, use the convenience command: make sse-client

Shutdown Sequence: Terminate the services by applying Ctrl+C first to the client process, followed by the server process.

Standard I/O (Stdio) Server Protocol

Configure the Claude Desktop client to utilize the Stdio protocol:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
      "args": [
        "--directory",
        "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
        "run",
        "server.py"
      ]
    }
  }
}

Alternatively, execute the local client directly referencing the server script:

uv run python minimal_client/client.py uns_mcp/server.py

Supplementary Minimal Client Configuration

Control client behavior via environment variables:

  • LOG_LEVEL="ERROR": Reduces console verbosity by suppressing detailed operational logs, focusing output on user-relevant messages.
  • CONFIRM_TOOL_USE='false': Disables the mandatory confirmation step before tool execution. Exercise extreme caution with this setting, especially during testing, as it permits the LLM to potentially initiate costly operations or data destruction actions without explicit final authorization.

Debugging Utilities

Anthropic provides the MCP Inspector utility for interactive debugging and testing of your MCP agent. Execute the following command to launch the debugging interface. Within this UI, you can map local environment variables (including your secret keys) in the left panel, and then proceed to the tools section to exercise the agent's functionalities.

mcp dev uns_mcp/server.py

If you wish to log the precise parameters transmitted to the UnstructuredClient methods, set the environment variable DEBUG_API_REQUESTS=true. Logs documenting these request parameters are saved daily in files prefixed with unstructured-client-, allowing for detailed post-mortem analysis.
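
Once request logging is enabled, finding the newest log file is a common first step. This is a small sketch that relies only on the unstructured-client- prefix mentioned above. The directory the logs are written to is not stated here, so it is a parameter, and the helper name latest_request_log is hypothetical.

```python
from pathlib import Path


def latest_request_log(directory="."):
    """Return the most recently modified 'unstructured-client-*' file, or None."""
    candidates = [p for p in Path(directory).glob("unstructured-client-*") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # The current directory is only a guess at where logs are written.
    print(latest_request_log(".") or "no request logs found")
```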

Integrating Terminal Access with the Minimal Client

We leverage @wonderwhy-er/desktop-commander, which is built atop the MCP Filesystem Server, to grant the minimal client command-line access. Be aware that this grants the client (and thus the controlling LLM) read/write access to sensitive local files.

Install the required package using:

npx @wonderwhy-er/desktop-commander setup

Then, launch the client specifying the extra communication channel parameter:

uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander@^0.2.11"

Alternative using make command:

make sse-client-terminal

Constraints on Tool Subset Utilization

If the controlling client environment is configured to invoke only a subset of the available utilities, be mindful of the following dependency:

  • The update_workflow utility must be present alongside create_workflow in the active context, as the former relies on the latter's descriptive context to detail the complete method for defining and configuring custom processing nodes.
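
A client that filters the tool list could check this dependency before starting a session. The sketch below encodes only the single dependency stated above. The TOOL_DEPENDENCIES table and the function name are hypothetical and are not part of uns_mcp.

```python
# Dependency taken from the note above: update_workflow relies on the
# description attached to create_workflow. The table itself is hypothetical.
TOOL_DEPENDENCIES = {"update_workflow": {"create_workflow"}}


def missing_tool_dependencies(enabled_tools):
    """Map each enabled tool to the companion tools it needs that are not enabled."""
    enabled = set(enabled_tools)
    return {
        tool: sorted(needed - enabled)
        for tool, needed in TOOL_DEPENDENCIES.items()
        if tool in enabled and not needed <= enabled
    }


if __name__ == "__main__":
    print(missing_tool_dependencies(["list_workflows", "update_workflow"]))
```

An empty result means the chosen subset satisfies the dependency.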

Identified Operational Anomalies

  • Workflow Modification Issue: The update_workflow function requires the full, current configuration of the workflow being modified (either supplied directly by the user or retrieved via get_workflow_info). This is because the tool implements a complete configuration replacement mechanism, not a partial patch application.
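
Because the update replaces the whole configuration, a safe pattern is to start from the current configuration and overlay only the fields that change. The sketch below shows that overlay step on plain dictionaries. The field names in the usage example are hypothetical, and the step that fetches the current configuration (for instance via get_workflow_info) is left to the caller.

```python
import copy


def build_full_update(current_config, changes):
    """Overlay changes on a deep copy of the current workflow configuration.

    Top-level keys in changes replace the corresponding keys in the copy;
    everything else is carried over unchanged, so the result can be sent
    as a complete replacement.
    """
    full = copy.deepcopy(current_config)
    full.update(copy.deepcopy(changes))
    return full


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    current = {"name": "wf", "source_id": "s1", "destination_id": "d1", "schedule": "daily"}
    print(build_full_update(current, {"schedule": "weekly"}))
```

Sending only the changed fields instead would drop every setting that is not restated.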

Version Control History Notes (CHANGELOG.md)

All novel features, bug resolutions, and performance enhancements will be formally documented in CHANGELOG.md. Pre-release versions prior to a stable release increment should adhere to the 0.x.x-dev format.

Troubleshooting Guide

Error Encountered: Error: spawn <command> ENOENT. This signifies that the specified <command> is either not installed or its location is absent from the system's PATH environment variable.

  • Verification: Confirm the software is installed and accessible via PATH.
  • Alternative Solution: Supply the absolute filesystem path to the command within the command field of your configuration. For instance, substitute python with /opt/miniconda3/bin/python.
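
Both checks above can be done from Python with the standard library: shutil.which looks a command up on PATH and returns its location, which can then be pasted into the command field. A minimal sketch follows; the helper name resolve_command is hypothetical.

```python
import os
import shutil


def resolve_command(command):
    """Return the absolute path of command if it is found on PATH, else None."""
    found = shutil.which(command)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    for name in ("uvx", "uv", "python"):
        print(f"{name}: {resolve_command(name) or 'not found on PATH'}")
```

Note that a desktop application may start with a different PATH than your terminal, so a command that resolves here can still fail there, which is why the absolute path is the more robust choice.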
