UNS-MCP-Gateway

Interface layer for interacting with the Unstructured Platform's core data orchestration services. Facilitates comprehensive lifecycle management for data sources, processing destinations, execution workflows, and job monitoring via dedicated API calls.

Author

liuchongchong1995

No License

Quick Info

GitHub Stars 0
NPM Weekly Downloads 348
Tools 1
Last Updated 2026-02-19

Tags

unstructured, tools, mcp, unstructured api, business tools, uns mcp

Unstructured MCP Server Implementation

This repository furnishes an MCP server tailored for seamless communication with the Unstructured API. It exposes functionalities crucial for orchestrating data pipelines, specifically for listing existing assets and managing operational workflows.

Supported Operations (Tooling Catalog)

| Operational Command | Core Functionality Description |
| --- | --- |
| list_sources | Retrieves a manifest of all configured source connectors from the API. |
| get_source_info | Fetches granular details for a specified source connector. |
| create_source_connector | Provisions a new source connector instance based on provided parameters. |
| update_source_connector | Modifies parameters of an existing source connector. |
| delete_source_connector | Decommissions a source connector identified by its unique source ID. |
| list_destinations | Enumerates all registered destination connectors known to the API. |
| get_destination_info | Obtains in-depth specifications for a particular destination connector. |
| create_destination_connector | Establishes a new destination connector using the specified configuration. |
| update_destination_connector | Applies parameter changes to an existing destination connector, identified by its ID. |
| delete_destination_connector | Removes a destination connector based on its unique destination ID. |
| list_workflows | Returns a comprehensive list of defined processing workflows. |
| get_workflow_info | Acquires detailed configuration data for a selected workflow. |
| create_workflow | Deploys a new workflow, defining its source, destination linkage, etc. |
| run_workflow | Initiates the execution sequence for a specified workflow ID. |
| update_workflow | Replaces the configuration of an existing workflow via new parameters. |
| delete_workflow | Deletes a specific workflow identified by its ID. |
| list_jobs | Returns a history of execution jobs linked to a particular workflow. |
| get_job_info | Retrieves detailed status and output metrics for an individual job ID. |
| cancel_job | Terminates an actively running job instance identified by its ID. |

Consult the Unstructured documentation for the complete catalog of supported source and destination connectors. Support for additional connectors is being added.

| Currently Supported Sources | Currently Supported Destinations |
| --- | --- |
| S3 | S3 |
| Azure Blob Storage | Weaviate Vector Database |
| Google Drive | Pinecone Vector Store |
| OneDrive | AstraDB |
| Salesforce | MongoDB |
| SharePoint | Neo4j Graph Database |
| | Databricks Volumes |
| | Databricks Volumes Delta Table |

Prerequisite for Connector Manipulation: To employ tools for creating, altering, or removing connection assets, the corresponding access credentials must be present and correctly defined within your environment configuration file (e.g., .env). The required credentials are itemized below:

| Credential Identifier | Purpose and Context |
| --- | --- |
| ANTHROPIC_API_KEY | Required for the minimal_client to interface with this server. |
| AWS_KEY, AWS_SECRET | Required for configuring S3 source and destination connectors via this server. See the S3 source and destination setup guides in the Unstructured documentation. |
| WEAVIATE_CLOUD_API_KEY | Required for creating the Weaviate vector database destination connector. See the Weaviate setup guide in the Unstructured documentation. |
| FIRECRAWL_API_KEY | Required to use the Firecrawl tools in external/firecrawl.py. Obtain a key from Firecrawl. |
| ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT | Required for creating the AstraDB destination connector. See the AstraDB setup guide in the Unstructured documentation. |
| AZURE_CONNECTION_STRING | Authentication option 1 for creating the Azure Blob Storage source connector. See the Azure setup guide in the Unstructured documentation. |
| AZURE_ACCOUNT_NAME + AZURE_ACCOUNT_KEY | Authentication option 2 for creating the Azure Blob Storage source connector. |
| AZURE_ACCOUNT_NAME + AZURE_SAS_TOKEN | Authentication option 3 for creating the Azure Blob Storage source connector. |
| NEO4J_PASSWORD | Required for creating the Neo4j destination connector. See the Neo4j setup guide in the Unstructured documentation. |
| MONGO_DB_CONNECTION_STRING | Required for creating the MongoDB destination connector. See the MongoDB setup guide in the Unstructured documentation. |
| GOOGLEDRIVE_SERVICE_ACCOUNT_KEY | A base64-encoded string of the service account JSON key file, as described in the Google Drive source setup guide. Use base64 < /path/to/keyfile.json to generate the string. |
| DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET | Required for configuring Databricks Volumes or Delta Table destinations. See the corresponding setup guides in the Unstructured documentation. |
| ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_CRED, ONEDRIVE_TENANT_ID | Required for creating the Microsoft OneDrive source connector. See the OneDrive setup guide in the Unstructured documentation. |
| PINECONE_API_KEY | Required for creating the Pinecone vector database destination connector. See the Pinecone setup guide in the Unstructured documentation. |
| SALESFORCE_CONSUMER_KEY, SALESFORCE_PRIVATE_KEY | Required for creating the Salesforce source connector. See the Salesforce setup steps in the ingestion documentation. |
| SHAREPOINT_CLIENT_ID, SHAREPOINT_CLIENT_CRED, SHAREPOINT_TENANT_ID | Required for creating the SharePoint source connector. See the SharePoint setup guide in the Unstructured documentation. |
| LOG_LEVEL | Controls the verbosity of minimal_client logging (e.g., ERROR suppresses informational output). |
| CONFIRM_TOOL_USE | When set to 'true', the minimal_client prompts for explicit confirmation before executing any tool call. |
| DEBUG_API_REQUESTS | When set to 'true', uns_mcp/server.py logs detailed request parameters for calls made to the Unstructured API. |
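
Before calling a create, update, or delete connector tool, it can help to check that the matching variables are actually set. The sketch below is one way to do that, assuming the variable names from the table above. The connector labels used as dictionary keys (`"s3"`, `"azure"`, and so on) and the function name are hypothetical and not part of the server.

```python
import os
from collections.abc import Mapping

# Each connector maps to a list of alternatives; one complete alternative is enough.
# The connector labels are illustrative; only the variable names come from the table.
REQUIRED_CREDENTIALS: dict[str, list[tuple[str, ...]]] = {
    "s3": [("AWS_KEY", "AWS_SECRET")],
    "weaviate": [("WEAVIATE_CLOUD_API_KEY",)],
    "astradb": [("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT")],
    "azure": [
        ("AZURE_CONNECTION_STRING",),
        ("AZURE_ACCOUNT_NAME", "AZURE_ACCOUNT_KEY"),
        ("AZURE_ACCOUNT_NAME", "AZURE_SAS_TOKEN"),
    ],
    "neo4j": [("NEO4J_PASSWORD",)],
    "mongodb": [("MONGO_DB_CONNECTION_STRING",)],
    "pinecone": [("PINECONE_API_KEY",)],
}


def missing_credentials(connector: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables still needed for `connector`, or [] if one option is complete."""
    options = REQUIRED_CREDENTIALS[connector]
    # For every alternative, collect the names that are unset or empty.
    gaps = [[name for name in option if not env.get(name)] for option in options]
    # Report the alternative that is closest to complete.
    return min(gaps, key=len)
```

For example, `missing_credentials("azure")` returns an empty list as soon as any one of the three Azure authentication options is fully configured.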

Integrated Firecrawl Capabilities

Firecrawl (https://www.firecrawl.dev/) adds web resource acquisition capabilities to the MCP via two primary toolsets, both requiring a valid FIRECRAWL_API_KEY:

  1. Raw HTML Acquisition: Utilizes invoke_firecrawl_crawlhtml to initiate scraping jobs and check_crawlhtml_status for progress tracking.
  2. LLM-Ready Text Extraction: Employs invoke_firecrawl_llmtxt to generate cleaned, optimized text directly suitable for LLM consumption, monitored via check_llmtxt_status.

Web Acquisition Workflow Details:

  • Process: Begins at a specified URL, maps out linked content (preferring sitemaps), recursively visits all reachable pages, and captures their content, handling JavaScript-rendered pages and rate limits.
  • Cancellation: Jobs initiated by HTML crawling can be halted using cancel_crawlhtml_job.
  • LLM Text Generation: Post-crawl, content is refined into clean text formatted for LLMs and automatically deposited into the designated S3 location.

Note: LLM text generation jobs cannot be cancelled once started; cancel_llmtxt_job is present for API symmetry but is presently non-functional against the Firecrawl backend.
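
Because both Firecrawl toolsets follow a start-then-poll pattern, a client typically starts a job and then checks its status until it finishes. The sketch below shows a generic polling loop under stated assumptions: `check_status` stands in for whatever function wraps check_crawlhtml_status or check_llmtxt_status, and the status strings `"completed"`, `"failed"`, and `"cancelled"` are hypothetical, not confirmed Firecrawl values.

```python
import time
from collections.abc import Callable


def wait_for_job(
    check_status: Callable[[str], str],
    job_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    terminal: frozenset[str] = frozenset({"completed", "failed", "cancelled"}),
) -> str:
    """Poll `check_status(job_id)` until it reports a terminal state or polls run out."""
    for attempt in range(max_polls):
        status = check_status(job_id)
        if status in terminal:
            return status
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)  # be gentle with the remote API between polls
    raise TimeoutError(f"job {job_id} still not finished after {max_polls} polls")
```

Bounding the number of polls keeps a stuck crawl from blocking the caller indefinitely, which matters most for LLM text jobs that cannot be cancelled.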

Deployment and Environment Setup

This section details the procedure for deploying and configuring the UNS_MCP server, utilizing Python 3.12 and the uv dependency manager.

System Prerequisites

With uvx, no separate installation step is needed, since uvx fetches and runs the package on demand. To install the package into your environment instead:

uv pip install uns_mcp

Claude Desktop Configuration Integration

To integrate with Claude Desktop, add the following JSON object to your claude_desktop_config.json file (on macOS, typically located in ~/Library/Application Support/Claude/).

Option A: Utilizing uvx (Preferred Command Line Invocation)

{
   "mcpServers": {
      "UNS_MCP": {
         "command": "uvx",
         "args": ["uns_mcp"],
         "env": {
           "UNSTRUCTURED_API_KEY": "<your-key>"
         }
      }
   }
}

Option B: Using Direct Python Package Execution

{
   "mcpServers": {
      "UNS_MCP": {
         "command": "python",
         "args": ["-m", "uns_mcp"],
         "env": {
           "UNSTRUCTURED_API_KEY": "<your-key>"
         }
      }
   }
}

From Source Checkout

  1. Obtain the repository source code.

  2. Resolve dependencies:

     uv sync

  3. Establish your Unstructured API key via a .env file in the repository root:

     UNSTRUCTURED_API_KEY="YOUR_KEY"

     Consult .env.template for all configurable environment variables.
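
To confirm that the .env file from the last step is readable before starting the server, a quick check like the one below can help. It is a minimal sketch that handles only plain KEY=VALUE lines with optional quotes; it is not the loader the server itself uses, and the function name is hypothetical.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values
```

For example, `"UNSTRUCTURED_API_KEY" in read_env_file(Path(".env"))` tells you whether the key is defined at all.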

Server execution methods:

Editable Package Installation Method

Install in editable mode:
uvx pip install -e .
Update Claude Desktop configuration to reflect the editable installation:
{
  "mcpServers": {
    "UNS_MCP": {
      "command": "uvx",
      "args": ["uns_mcp"]
    }
  }
}
**Crucial**: Ensure the configuration points to the correct `uvx` executable path.

Server-Sent Events (SSE) Protocol Configuration

**Warning**: This protocol is explicitly **not supported** by Claude Desktop.

For simplified debugging, SSE allows decoupling client and server operation:

1. Launch the server process in one terminal:

```bash
uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080
# OR use the provided target
make sse-server
```

2. Test server connectivity using a local client in a separate terminal:

```bash
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"
# OR use the provided target
make sse-client
```

**Shutdown Sequence**: Terminate the client process first (`Ctrl+C`), followed by the server process.

Standard I/O (Stdio) Server Protocol

Configure Claude Desktop to utilize the stdio transport:
{
  "mcpServers": {
    "UNS_MCP": {
      "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
      "args": [
        "--directory",
        "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
        "run",
        "server.py"
      ]
    }
  }
}
Alternatively, execute the local client directly:
uv run python minimal_client/client.py uns_mcp/server.py

Client-Side Environmental Control

Additional configuration for the minimal client can be managed via environment variables:

  • LOG_LEVEL="ERROR": Filters client output to show only error messages, suppressing informational output.
  • CONFIRM_TOOL_USE='false': Bypasses the execution confirmation prompt. Exercise extreme caution with this setting, particularly in production or during initial testing, as it permits unverified execution of costly or destructive operations.

Diagnostic Utilities

The MCP Inspector tool, provided by Anthropic, offers a dedicated UI for server inspection and testing. Initiate the debugging environment with:

mcp dev uns_mcp/server.py

Within the UI, you can supply environment variables (including your private API key) in the left panel and test tool functionality under the 'tools' section.

To log the specific parameters transmitted to the UnstructuredClient functions, set DEBUG_API_REQUESTS=true. Logs are timestamped and written to files named unstructured-client-{date}.log.

Enabling Terminal Access for the Client

By integrating @wonderwhy-er/desktop-commander, which relies on the MCP Filesystem Server architecture, the minimal client gains direct shell access. Be advised: This grants the LLM client elevated access to local file systems.

Install the necessary commander utility:

npx @wonderwhy-er/desktop-commander setup

Then initiate the client with the commander argument:

uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander"
# OR use the simplified target
make sse-client-terminal

Tool Subset Considerations

If your client environment requires loading only a partial set of the available tools, note the following interdependency:

  • The update_workflow tool must be loaded alongside create_workflow. This is because the update mechanism requires the full context description of node configuration provided in the creation tool's documentation, as it performs a full replacement, not a partial patch.

Identified Operational Limitations (Known Issues)

  • update_workflow: To successfully execute an update, the tool requires the complete existing configuration of the workflow being modified, either supplied directly by the user or retrieved via a preceding get_workflow_info call. The API endpoint treats updates as a complete configuration overwrite.
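
Because an update overwrites the whole configuration, a safe pattern is read, modify, write: fetch the current configuration, overlay only the fields you want to change, and send the complete result. The sketch below illustrates this with injected callables standing in for the get_workflow_info and update_workflow tools. The function name and the field names in the usage example are hypothetical.

```python
import copy
from collections.abc import Callable


def update_workflow_safely(
    get_workflow_info: Callable[[str], dict],
    update_workflow: Callable[[str, dict], dict],
    workflow_id: str,
    changes: dict,
) -> dict:
    """Fetch the current configuration, overlay `changes`, and send the complete result."""
    current = copy.deepcopy(get_workflow_info(workflow_id))
    # Top-level overlay: fields absent from `changes` keep their current values.
    full_config = {**current, **changes}
    return update_workflow(workflow_id, full_config)
```

For example, changing only a hypothetical `name` field this way leaves the existing source and destination linkage intact, whereas sending `{"name": ...}` alone would discard it. Note that the overlay is shallow: a nested structure supplied in `changes` replaces the existing one wholesale.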

Version Control Notes (CHANGELOG.md)

All feature additions, bug resolutions, and enhancements will be formally documented in CHANGELOG.md. Pre-release versions will adhere to the 0.x.x-dev format until a stable version increment is warranted.

Debugging Guide for Common Startup Errors

  • Error Type: Error: spawn <command> ENOENT suggests the operating system cannot locate <command> in the system's PATH or it is not installed.
  • Resolution: Verify the software installation and PATH visibility. Alternatively, provide the fully qualified absolute path to the executable within the configuration file's command field (e.g., replacing python with /usr/bin/python3).
