UNS-MCP-Interface
A gateway for orchestrating operations with the Unstructured Data Processing Platform's core API. Facilitates the configuration of input sources, output destinations, and execution pipelines. Includes utilities for inventorying available data ingress points and querying specific connector configurations.
Author

Unstructured-IO
Unstructured Platform Management Control Point (MCP) Agent
This agent serves as the primary interface for managing the lifecycle of data integration assets within the Unstructured ecosystem. It enables users to systematically manage data origins, delivery targets, and processing flows.
Exposed Management Utilities
| Utility Name | Functionality Synopsis |
|---|---|
| `list_sources` | Enumerates all configured data ingress endpoints accessible via the Unstructured API. |
| `get_source_info` | Retrieves granular configuration specifics for a designated input connector. |
| `create_source_connector` | Provisions a new data source integration based on provided parameters. |
| `update_source_connector` | Modifies the settings of an existing source integration. |
| `delete_source_connector` | Decommissions a source connector, referenced by its unique identifier. |
| `list_destinations` | Presents a catalog of configured data output targets managed by the Unstructured API. |
| `get_destination_info` | Fetches comprehensive details for a specified output endpoint connector. |
| `create_destination_connector` | Establishes a new data delivery target definition. |
| `update_destination_connector` | Adjusts parameters for an existing output destination integration. |
| `delete_destination_connector` | Removes a defined destination connector, using its ID for reference. |
| `list_workflows` | Returns a ledger of defined automated processing sequences (workflows). |
| `get_workflow_info` | Obtains exhaustive structural details for an individual workflow definition. |
| `create_workflow` | Constructs a novel processing pipeline, linking sources, destinations, etc. |
| `run_workflow` | Initiates the execution of a specific workflow, identified by its ID. |
| `update_workflow` | Replaces the configuration of an extant workflow definition. |
| `delete_workflow` | Eradicates a designated workflow pipeline by its identifier. |
| `list_jobs` | Lists all processing instances associated with a particular workflow execution history. |
| `get_job_info` | Retrieves detailed status and metadata for a singular processing job ID. |
| `cancel_job` | Terminates an actively running or pending job instance. |
| `list_workflows_with_finished_jobs` | Generates a report of workflows containing at least one completed job, including related source/destination metadata. |
Below is a selection of the connectors currently supported by the UNS-MCP-Interface. Comprehensive documentation for all supported source connectors can be found here, and for destination connectors here. Expansion of this catalog is an ongoing effort.
| Supported Ingress Points | Supported Egress Points |
|---|---|
| S3 Storage | S3 Storage |
| Azure Blob Storage | Weaviate Vector Database |
| Google Drive | Pinecone Vector Database |
| Microsoft OneDrive | AstraDB |
| Salesforce | MongoDB |
| SharePoint | Neo4j Graph Database |
| | Databricks Volumes |
| | Databricks Delta Table |
To create, modify, or remove a connector, the credentials for that specific connector must be present in your local environment configuration file (`.env`). The following table lists the environment variables tied to the supported connectors:
| Credential Name | Configuration Requirement Summary |
|---|---|
| `ANTHROPIC_API_KEY` | Mandatory for the minimal_client to interface with our server infrastructure. |
| `AWS_KEY`, `AWS_SECRET` | Necessary for provisioning S3 source/destination connectors via the uns-mcp agent. Guidance available here and here. |
| `WEAVIATE_CLOUD_API_KEY` | Required when setting up the Weaviate vector database delivery endpoint. Instructions here. |
| `FIRECRAWL_API_KEY` | Needed to utilize Firecrawl processing routines detailed in `external/firecrawl.py`. Obtain a key from Firecrawl. |
| `ASTRA_DB_APPLICATION_TOKEN`, `ASTRA_DB_API_ENDPOINT` | Required for configuring the AstraDB connector. Steps outlined in the documentation here. |
| `AZURE_CONNECTION_STRING` | Primary authentication mechanism (Option 1) for creating Azure Blob Storage ingress points. See setup details here. |
| `AZURE_ACCOUNT_NAME` + `AZURE_ACCOUNT_KEY` | Secondary authentication mechanism (Option 2) for Azure Blob Storage ingress configuration. Refer to documentation. |
| `AZURE_ACCOUNT_NAME` + `AZURE_SAS_TOKEN` | Tertiary authentication method (Option 3) for Azure Blob Storage ingress setup. Details available here. |
| `NEO4J_PASSWORD` | Essential for establishing a connection to the Neo4j graph database as an output target. See configuration guide here. |
| `MONGO_DB_CONNECTION_STRING` | Prerequisite for setting up the MongoDB egress mechanism. Configuration guide here. |
| `GOOGLEDRIVE_SERVICE_ACCOUNT_KEY` | A base64 encoded string derived from your Google Service Account JSON key file. Conversion command: `base64 < /path/to/google_service_account_key.json`. Follow setup instructions here. |
| `DATABRICKS_CLIENT_ID`, `DATABRICKS_CLIENT_SECRET` | Required for provisioning Databricks Volume or Delta Table output connectors. Refer to guides here and here. |
| `ONEDRIVE_CLIENT_ID`, `ONEDRIVE_CLIENT_CRED`, `ONEDRIVE_TENANT_ID` | Necessary credentials for configuring Microsoft OneDrive as a data ingress source. Instructions here. |
| `PINECONE_API_KEY` | Required to provision the Pinecone vector database output handler. Setup guide here. |
| `SALESFORCE_CONSUMER_KEY`, `SALESFORCE_PRIVATE_KEY` | Required for connecting to Salesforce as a data source. Details in the documentation here. |
| `SHAREPOINT_CLIENT_ID`, `SHAREPOINT_CLIENT_CRED`, `SHAREPOINT_TENANT_ID` | Necessary credentials for integrating with SharePoint as an input source. Setup instructions here. |
| `LOG_LEVEL` | Controls the verbosity of the minimal_client runtime output; e.g., setting to `ERROR` minimizes extraneous console messages. |
| `CONFIRM_TOOL_USE` | If set to `true`, the minimal_client will prompt for explicit confirmation before executing any tool call. |
| `DEBUG_API_REQUESTS` | Setting this to `true` enables verbose logging of request payloads within `uns_mcp/server.py` for enhanced debugging insight. |
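As a pre-flight check, the credential table above can be encoded as data and compared against the environment before asking the agent to create a connector. The following is a minimal sketch and not part of uns_mcp: the connector labels (`s3`, `azure`, and so on) and the `missing_credentials` helper are hypothetical names chosen for illustration; only the variable names come from the table.

```python
import os

# Each connector maps to a list of alternative credential sets; any one
# complete set is enough. The connector labels are illustrative only.
REQUIRED_CREDENTIALS = {
    "s3": [["AWS_KEY", "AWS_SECRET"]],
    "weaviate": [["WEAVIATE_CLOUD_API_KEY"]],
    "astradb": [["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT"]],
    "azure": [
        ["AZURE_CONNECTION_STRING"],
        ["AZURE_ACCOUNT_NAME", "AZURE_ACCOUNT_KEY"],
        ["AZURE_ACCOUNT_NAME", "AZURE_SAS_TOKEN"],
    ],
    "neo4j": [["NEO4J_PASSWORD"]],
    "mongodb": [["MONGO_DB_CONNECTION_STRING"]],
    "google_drive": [["GOOGLEDRIVE_SERVICE_ACCOUNT_KEY"]],
    "databricks": [["DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET"]],
    "onedrive": [["ONEDRIVE_CLIENT_ID", "ONEDRIVE_CLIENT_CRED", "ONEDRIVE_TENANT_ID"]],
    "pinecone": [["PINECONE_API_KEY"]],
    "salesforce": [["SALESFORCE_CONSUMER_KEY", "SALESFORCE_PRIVATE_KEY"]],
    "sharepoint": [["SHAREPOINT_CLIENT_ID", "SHAREPOINT_CLIENT_CRED", "SHAREPOINT_TENANT_ID"]],
}


def missing_credentials(connector, env=None):
    """Return the variables missing from the closest-to-complete option.

    An empty list means at least one credential option is fully set.
    """
    env = os.environ if env is None else env
    options = REQUIRED_CREDENTIALS[connector]
    gaps = [[name for name in option if not env.get(name)] for option in options]
    return min(gaps, key=len)
```

Calling `missing_credentials("azure")` against the real environment reports what is still needed for whichever of the three Azure options is nearest to complete.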
Specialized Firecrawl Integration
Firecrawl extends the MCP's capabilities with two distinct web interaction methods:
- Raw HTML Retrieval: Utilize `invoke_firecrawl_crawlhtml` to initiate a crawl job, monitored by `check_crawlhtml_status`.
- LLM-Optimized Text Generation: Employ `invoke_firecrawl_llmtxt` to generate model-ready text, retrieving outcomes via `check_llmtxt_status`.
Firecrawl Operation Flow:
Web Scraping Sequence:
- Commences at a specified Uniform Resource Locator (URL) and analyzes outbound links.
- Prioritizes utilizing a sitemap if one is present; otherwise, it navigates discovered internal links.
- Systematically explores linked pages recursively to map the entire site structure.
- Aggregates content from every visited page, managing JavaScript rendering and network rate limits.
- Jobs can be halted mid-execution using cancel_crawlhtml_job.
- Recommended when the requirement is for the complete, raw HTML extraction, which Unstructured's processing engine excels at refining. :smile:
Text Optimization Sequence:
- Post-crawling, it refines the gathered data into clean, semantically rich textual content.
- Formats this content specifically for optimal consumption by Large Language Models (LLMs).
- Output artifacts are automatically deposited into a designated S3 repository.
- Critical Note: Text generation tasks cannot be halted once initiated. While cancel_llmtxt_job is provided for interface consistency, the underlying Firecrawl API currently does not support cancellation for this operation type.
Prerequisite: The FIRECRAWL_API_KEY environment variable must be configured to enable these functionalities.
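Both Firecrawl flows follow a start-then-poll pattern: one tool starts the job and a second tool is called repeatedly until the job finishes. A minimal sketch of the polling half is shown here. The `check_status` callable is a hypothetical stand-in for a tool call such as `check_crawlhtml_status`, and both the `"status"` key and the terminal state names are assumptions, not confirmed Firecrawl values.

```python
import time

# Assumed status names; the values actually returned may differ.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(check_status, job_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll check_status(job_id) until a terminal state is reported.

    check_status is a hypothetical callable assumed to return a dict
    containing a "status" key. Raises TimeoutError once the accumulated
    wait reaches the timeout.
    """
    waited = 0.0
    while True:
        result = check_status(job_id)
        if result.get("status") in TERMINAL_STATES:
            return result
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still running after {timeout} seconds")
        sleep(interval)
        waited += interval
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.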
Setup and Environment Provisioning
This section outlines the necessary procedures to initialize and configure the UNS_MCP agent, favoring Python 3.12 and the uv package manager.
Essential Requirements
- Python Version 3.12 or newer.
- The `uv` utility for dependency and environment handling.
- An active API credential from the Unstructured Platform. You can secure yours by registering here.
Utilizing uv (Recommended Path)
No separate installation is needed when invoking via uvx as it manages execution context. For direct package installation:
```bash
uv pip install uns_mcp
```
Claude Desktop Integration (via uvx)
For seamless integration with the Claude Desktop application, append the following configuration structure to your claude_desktop_config.json file.
Configuration Target: On macOS, the configuration file is typically located in `~/Library/Application Support/Claude/`.
Configuration using uvx Command:
```json
{
  "mcpServers": {
    "UNS_MCP": {
      "command": "uvx",
      "args": ["uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": "YOUR_KEY"
      }
    }
  }
}
```
Configuration using Python Package Invocation:
```json
{
  "mcpServers": {
    "UNS_MCP": {
      "command": "python",
      "args": ["-m", "uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": "YOUR_KEY"
      }
    }
  }
}
```
Direct Source Code Checkout Method
1. Obtain a local copy of the repository.
2. Install project dependencies:
   ```bash
   uv sync
   ```
3. Establish your Unstructured API key via an environment variable. Create a `.env` file in the repository root containing:
   ```bash
   UNSTRUCTURED_API_KEY="YOUR_KEY"
   ```
   Consult `.env.template` for a complete list of configurable environment variables.
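Before launching the server it can help to confirm that the `.env` file actually defines `UNSTRUCTURED_API_KEY`. The following is a simplified, standard-library sketch of such a check; `read_env_file` is a hypothetical helper that only understands plain `KEY=VALUE` lines, so it is not a substitute for a full dotenv parser.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


if __name__ == "__main__":
    if Path(".env").exists() and read_env_file().get("UNSTRUCTURED_API_KEY"):
        print("UNSTRUCTURED_API_KEY is set")
    else:
        print("UNSTRUCTURED_API_KEY is missing from .env")
```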
You may now launch the server using one of the subsequent deployment strategies:
Editable Package Installation Method
Install the package in editable mode:
```bash
uvx pip install -e .
```
Update the Claude Desktop configuration as follows:
```json
{
  "mcpServers": {
    "UNS_MCP": {
      "command": "uvx",
      "args": ["uns_mcp"]
    }
  }
}
```
**Crucial Note**: Ensure the configuration correctly points to the `uvx` executable within the environment where the package was installed.
Server-Sent Events (SSE) Protocol Operation
**Limitation:** This protocol method is incompatible with Claude Desktop.
For more straightforward debugging isolation, the client and server components can be run independently:
1. Initiate the server process in one terminal instance:
   ```bash
   uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080
   # Alternatively, use the convenience command:
   make sse-server
   ```
2. Test connectivity using a local client instance in a separate terminal:
   ```bash
   uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"
   # Alternatively, use the convenience command:
   make sse-client
   ```
**Shutdown Sequence:** Terminate the services by applying `Ctrl+C` first to the client process, followed by the server process.
Standard I/O (Stdio) Server Protocol
Configure the Claude Desktop client to utilize the Stdio protocol:
```json
{
  "mcpServers": {
    "UNS_MCP": {
      "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
      "args": [
        "--directory",
        "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
        "run",
        "server.py"
      ]
    }
  }
}
```
Alternatively, execute the local client directly referencing the server script:
```bash
uv run python minimal_client/client.py uns_mcp/server.py
```
Supplementary Minimal Client Configuration
Control client behavior via environment variables:
- LOG_LEVEL="ERROR": Reduces console verbosity by suppressing detailed operational logs, focusing output on user-relevant messages.
- CONFIRM_TOOL_USE='false': Disables the mandatory confirmation step before tool execution. Exercise extreme caution with this setting, especially during testing, as it permits the LLM to potentially initiate costly operations or data destruction actions without explicit final authorization.
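To illustrate what a confirmation gate of this kind does, here is a minimal sketch. It is not the minimal_client's actual code: `confirmation_required` and `approve_tool_call` are hypothetical names, and treating an unset variable as "confirmation required" is an assumption made for safety in this example.

```python
import os


def confirmation_required(env=None):
    """Require confirmation unless CONFIRM_TOOL_USE is explicitly 'false'."""
    env = os.environ if env is None else env
    return env.get("CONFIRM_TOOL_USE", "true").strip().lower() != "false"


def approve_tool_call(tool_name, arguments, env=None, ask=input):
    """Return True if the tool call may proceed."""
    if not confirmation_required(env):
        return True  # gate disabled: every call is allowed through
    answer = ask(f"Run {tool_name} with {arguments}? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

With the gate disabled, a destructive call such as `delete_workflow` would proceed without any prompt, which is the risk the warning above describes.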
Debugging Utilities
Anthropic provides the MCP Inspector utility for interactive debugging and testing of your MCP agent. Execute the following command to launch the debugging interface. Within this UI, you can map local environment variables (including your secret keys) in the left panel, and then proceed to the tools section to exercise the agent's functionalities.
```bash
mcp dev uns_mcp/server.py
```
To log the precise parameters sent to the UnstructuredClient methods, set the environment variable `DEBUG_API_REQUESTS=true`.
Logs documenting these request parameters are saved daily, prefixed with unstructured-client-, allowing for detailed post-mortem analysis.
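When reviewing these logs, a small helper can pick out the most recent file. This sketch assumes the log files are written to a directory you pass in (the current directory by default) and relies only on the `unstructured-client-` prefix mentioned above; the rest of the file name is not assumed. `latest_request_log` is a hypothetical helper name.

```python
from pathlib import Path


def latest_request_log(log_dir="."):
    """Return the most recently modified unstructured-client-* file, or None."""
    candidates = [p for p in Path(log_dir).glob("unstructured-client-*") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    newest = latest_request_log()
    print(newest if newest else "no request logs found")
```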
Integrating Terminal Access with the Minimal Client
We leverage @wonderwhy-er/desktop-commander, which is built atop the MCP Filesystem Server, to grant the minimal client command-line access. Be aware that this grants the client (and thus the controlling LLM) read/write access to sensitive local files.
Install the required package using:
```bash
npx @wonderwhy-er/desktop-commander setup
```
Then, launch the client specifying the extra communication channel parameter:
```bash
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander@^0.2.11"
```
Alternative using make command:
```bash
make sse-client-terminal
```
Constraints on Tool Subset Utilization
If the controlling client environment is configured to invoke only a subset of the available utilities, be mindful of the following dependency:
- The update_workflow utility must be present alongside create_workflow in the active context, as the former relies on the latter's descriptive context to detail the complete method for defining and configuring custom processing nodes.
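A client that exposes only some tools could verify this dependency before starting. The sketch below is illustrative: `TOOL_DEPENDENCIES` and `missing_tool_dependencies` are hypothetical names, and the single entry reflects only the dependency described above.

```python
# Maps a tool to the tools that must be exposed together with it.
TOOL_DEPENDENCIES = {"update_workflow": {"create_workflow"}}


def missing_tool_dependencies(enabled_tools):
    """Return {tool: [missing companions]} for an enabled subset of tools."""
    enabled = set(enabled_tools)
    problems = {}
    for tool, required in TOOL_DEPENDENCIES.items():
        if tool in enabled and not required <= enabled:
            problems[tool] = sorted(required - enabled)
    return problems
```

An empty result means the chosen subset satisfies the known dependency.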
Identified Operational Anomalies
- Workflow Modification Issue: The `update_workflow` function requires the full, current configuration of the workflow being modified (either supplied directly by the user or retrieved via `get_workflow_info`). This is because the tool implements a complete configuration replacement mechanism, not a partial patch application.
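The practical consequence of replacement semantics is that the update payload must start from the full current configuration and then apply the changes on top. A minimal sketch of that step follows; `build_update_payload` is a hypothetical helper and the field names in the usage example are invented for illustration, not the real workflow schema.

```python
import copy


def build_update_payload(current_config, changes):
    """Overlay changes on a copy of the full current configuration.

    Because the update replaces the whole definition, every field that
    should survive must be present in the payload, not only the changed
    ones. Top-level keys in changes replace the existing values wholesale.
    """
    payload = copy.deepcopy(current_config)
    payload.update(changes)
    return payload
```

For example, with a current configuration of `{"name": "nightly", "source_id": "src-1", "schedule": "daily"}` and changes of `{"schedule": "weekly"}`, the payload keeps `name` and `source_id` and carries the new `schedule`. Sending only `{"schedule": "weekly"}` would drop the other fields.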
Version Control History Notes (CHANGELOG.md)
All novel features, bug resolutions, and performance enhancements will be formally documented in CHANGELOG.md. Pre-release versions prior to a stable release increment should adhere to the 0.x.x-dev format.
Troubleshooting Guide
- Error Encountered: `Error: spawn <command> ENOENT`. This signifies that the specified `<command>` is either not installed or its location is absent from the system's PATH environment variable:
  - Verification: Confirm the software is installed and accessible via PATH.
  - Alternative Solution: Supply the absolute filesystem path to the command within the `command` field of your configuration. For instance, substitute `python` with `/opt/miniconda3/bin/python`.
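One way to find the absolute path to put in the `command` field is Python's standard `shutil.which`, which searches PATH the same way a shell would. A short sketch, where `resolve_command` is just an illustrative wrapper name:

```python
import shutil


def resolve_command(command):
    """Return the full path of command if it is found on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    for name in ("uv", "uvx", "python"):
        print(f"{name}: {resolve_command(name) or 'not found on PATH'}")
```

Note that a desktop application may be launched with a different PATH than your terminal, so a command that resolves here can still fail there; using the printed absolute path in the configuration avoids that mismatch.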
