Core Component: privateGPT MCP Interface Server

This repository details an implementation of an MCP server designed specifically to interface with privateGPT, allowing its functionality to be seamlessly integrated as an agent within any MCP-compatible client application environment.

Understanding the Model Context Protocol (MCP)

MCP serves as an open, standardized communication layer intended to streamline how various software entities deliver contextual information to Large Language Models (LLMs). Functionally, it acts as a universal adapter—analogous to a standardized port like USB-C—ensuring consistent connectivity between AI models and diverse data/tool sources.

Rationale for Adopting MCP

MCP is crucial for constructing sophisticated, agent-driven workflows reliant on LLMs. It addresses the necessity for LLMs to reliably interact with external data and utilities by offering:

* A growing catalog of pre-configured integration modules for immediate LLM utilization.
* Facilitation for vendor agnosticism, allowing easy swapping of LLM providers.
* Enforcement of established best practices for securing sensitive organizational data within existing infrastructure.

Operational Schema: Client-Server Interactions

The architecture follows a classic client-server paradigm where a primary host application can communicate with numerous specialized servers:

MCP Consumers (Hosts): Applications (e.g., IDEs, desktop tools, external AI utilities) seeking data access via the MCP framework.
Protocol Connectors (Clients): Components maintaining direct, dedicated links to specific servers.
MCP Servers: Lightweight services that expose discrete capabilities via the standardized MCP.
Local Data Stores: On-premise data assets (filesystems, databases) securely accessible by MCP servers.
External APIs: Remote systems accessible over the internet.
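
MCP messages are exchanged as JSON-RPC 2.0. As a rough illustration of what a client sends to a server when it invokes a capability, the sketch below builds a `tools/call` request. The tool name `chat` and its `question` argument are hypothetical placeholders, not names confirmed by this repository.

```javascript
// Sketch: shape of an MCP tool invocation as a JSON-RPC 2.0 request.
// The tool name and arguments used below are hypothetical examples.
let nextId = 1;

function buildToolCall(toolName, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++, // lets the client match the response to this request
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const request = buildToolCall("chat", { question: "What is MCP?" });
console.log(JSON.stringify(request, null, 2));
```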

System Overview

This specific server instance functions as a proxy, translating MCP requests into interactions with the privateGPT API. Key capabilities include:

* Facilitating conversations with privateGPT, drawing context from both public datasets and secure, internal knowledge bases.
* Mechanisms for constructing and managing proprietary knowledge artifacts (Sources).
* Organizational structuring of knowledge via distinct user/system Groups.
* Granular access control enforced through group membership assignments.

The Role of Autonomous Agents

An Agent, in the context of LLMs and MCP, is a sophisticated software module situated between the language model and the consuming application. It manages request lifecycle, executes MCP transactions, orchestrates workflows, and enforces security and performance parameters across the integrated AI ecosystem. Agents enable the development of robust, secure, and scalable AI applications.

Agent-LLM-MCP Interaction Flow (Illustrative Scenario)

Initiation: A user submits a query via an interface hosted by the MCP server.
Agent Mediation: The server's agent validates the input and prepares the structured request for the LLM.
LLM Communication: The agent forwards the prepared query to the underlying Large Language Model.
Result Formatting: The agent receives the LLM's output, performs any necessary post-processing (e.g., data restructuring), and transmits the final response back to the user.
Oversight: Throughout the cycle, the agent maintains rigorous monitoring, enforces security mandates, and logs transactional data.
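
The five steps above can be sketched as a single mediation function. This is an illustration only: `handleQuery`, the `callLlm` callback, and the shape of the audit entries are assumptions made for the example, not part of the server's actual code.

```javascript
// Sketch of the mediation steps: validate, forward, post-process, log.
// callLlm is a stand-in for the real model call (an assumption).
async function handleQuery(query, callLlm, auditLog) {
  // Agent mediation: reject input that is not a usable query.
  if (typeof query !== "string" || query.trim() === "") {
    auditLog.push({ event: "rejected", reason: "empty query" });
    throw new Error("Query must be a non-empty string");
  }
  const prepared = { prompt: query.trim() };

  // LLM communication: forward the prepared request.
  const raw = await callLlm(prepared);

  // Result formatting: normalize the model output.
  const response = { answer: String(raw).trim() };

  // Oversight: record what happened without storing the prompt itself.
  auditLog.push({ event: "answered", promptLength: prepared.prompt.length });
  return response;
}

// Usage with a stubbed model that echoes the prompt.
const auditLog = [];
const fakeLlm = async ({ prompt }) => `  echo: ${prompt}  `;
const pending = handleQuery(" hello ", fakeLlm, auditLog);
pending.then((result) => console.log(result.answer));
```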

Agent Benefits in this Architecture

Decoupling: Promotes clear separation of concerns, simplifying system maintenance and scaling.
Security Posture: Centralized access oversight and activity auditing minimize security exposure.
Throughput: Automated task execution leads to higher consistency and speed compared to manual operations.
Adaptability: Agents are easily modified or extended to accommodate new functionality or evolving business rules.

Criticality of Credential Obfuscation

Security is paramount when handling sensitive system credentials. This server manages two primary sets of secrets:

1. Proxy Authorization Headers: Credentials used by intermediaries like HAProxy for traffic validation.
2. LLM Access Secrets: Passwords securing interfaces to the foundational LLMs.

Storing these credentials in cleartext introduces severe risks (e.g., unauthorized system access, catastrophic data exposure). Employing encryption is mandatory; only the ciphertext representations should persist in configuration files.

Advantages of Ciphertext Reliance

Superior Protection: Even if configuration stores are compromised, the encrypted values are unusable without the corresponding private keys.
Regulatory Adherence: Encryption aids compliance with mandates governing sensitive data protection (e.g., GDPR, SOC 2).
Data Integrity: Protects credentials against unauthorized modification.

Security Infrastructure

The system incorporates a comprehensive suite of security controls governing data transmission, key handling, and operational access.

1. Transport Layer Security (TLS)

TLS (minimum 1.2) must be engaged to encrypt all data exchanged between the client and the server.

Mandatory Rationale for TLS Activation:

* Confidentiality: Encrypts all payloads (passwords, proprietary data) against interception (MitM attacks).
* Data Fidelity: Guarantees that received data has not been tampered with during transit.
* Identity Verification: Uses digital certificates to confirm server identity (and optionally client identity), thwarting imposters.
* Attack Mitigation: Directly defends against connection hijacking and replay maneuvers.
* Regulatory Mandate: Fulfills baseline requirements for secure data transit stipulated by numerous compliance frameworks.
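
In Node.js, the TLS 1.2 floor can be expressed through the `minVersion` option accepted by `https.createServer`. The helper below is a minimal sketch; `buildTlsOptions` is a hypothetical name, and how this server actually wires up its TLS options is not shown in this document.

```javascript
// Sketch: HTTPS server options that refuse anything older than TLS 1.2.
function buildTlsOptions(keyPem, certPem) {
  return {
    key: keyPem,
    cert: certPem,
    minVersion: "TLSv1.2", // handshakes using TLS 1.0 or 1.1 are rejected
  };
}

// Usage (paths and handler are placeholders):
// const fs = require("node:fs");
// const https = require("node:https");
// const options = buildTlsOptions(
//   fs.readFileSync("server.key"),
//   fs.readFileSync("server.crt")
// );
// https.createServer(options, handler).listen(port);
```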

2. Credential Obfuscation (RSA Cryptography)

Passwords are encrypted using RSA asymmetric cryptography with a 2048-bit key and RSA_PKCS1_PADDING (PKCS#1 v1.5 padding).

Workflow:

1. The administrator uses the server's public key (id_rsa_public.pem) via the provided utility (node security/generate_encrypted_password.js ...) to create the ciphertext password.
2. This ciphertext is provided to the client configuration.
3. The client transmits the encrypted value to the server, which employs its corresponding private key for decryption.
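
The encrypt-then-decrypt round trip in this workflow can be sketched with Node's built-in `crypto` module. Note one deliberate substitution: the sketch uses OAEP padding rather than RSA_PKCS1_PADDING, because some recent Node.js releases refuse PKCS#1 v1.5 padding for private-key decryption. The function names are hypothetical and this is not the code of the bundled utilities.

```javascript
const crypto = require("node:crypto");

// Assumption: OAEP stands in for RSA_PKCS1_PADDING so the sketch runs on
// current Node.js releases; the server itself is documented as using PKCS#1.
const PADDING = crypto.constants.RSA_PKCS1_OAEP_PADDING;

// Step 1: encrypt with the public key and emit Base64 ciphertext.
function encryptPassword(publicKeyPem, password) {
  const ciphertext = crypto.publicEncrypt(
    { key: publicKeyPem, padding: PADDING },
    Buffer.from(password, "utf8")
  );
  return ciphertext.toString("base64");
}

// Step 3: the server decrypts with its private key.
function decryptPassword(privateKeyPem, ciphertextB64) {
  const plaintext = crypto.privateDecrypt(
    { key: privateKeyPem, padding: PADDING },
    Buffer.from(ciphertextB64, "base64")
  );
  return plaintext.toString("utf8");
}

// Throwaway 2048-bit key pair, for demonstration only.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});

const encrypted = encryptPassword(publicKey, "s3cret");
console.log(decryptPassword(privateKey, encrypted) === "s3cret");
```

Because the padding is randomized, encrypting the same password twice yields different ciphertexts, both of which decrypt to the same value.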

3. Key Hierarchy Management

Public Key (id_rsa.pub / .pem): Stored safely on the server; used only for encrypting data input. Exposure is non-critical.
Private Key (id_rsa): Stored securely with strict file permissions (chmod 600); used only for decryption operations.
Rotation Policy: Keys should be periodically refreshed. Reissuance invalidates prior keys, immediately revoking client/agent access until new encrypted credentials are provided.
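
A startup check can confirm that the private key file really is restricted to its owner. The sketch below assumes a POSIX filesystem; `isPrivateKeyLockedDown` is a hypothetical helper, not a function from this repository.

```javascript
const fs = require("node:fs");

// Returns true when neither group nor others hold any permission bits
// (for example mode 600). Only meaningful on POSIX systems.
function isPrivateKeyLockedDown(path) {
  const mode = fs.statSync(path).mode & 0o777; // keep only permission bits
  return (mode & 0o077) === 0;
}

// Usage (path is a placeholder):
// if (!isPrivateKeyLockedDown("/path/to/id_rsa")) {
//   throw new Error("Private key is readable by group or others");
// }
```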

4. Server-Side Decryption Mandate

Decryption is exclusively a server-side operation:

* The private key is used to transform the received ciphertext back into plaintext.
* The plaintext credential is never persisted; it resides only transiently in memory during the authentication check.
* Mutual TLS validation (if configured) further authenticates endpoints.

5. Session Authentication Tokens

Authorization is managed via short-lived tokens generated upon successful user authentication. These tokens are cryptographically signed (HMAC/RSA) to ensure immutability against unauthorized alteration.
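
To make the idea of a signed, short-lived token concrete, here is a minimal HMAC-SHA256 sketch. The token layout (`payload.signature`), the field names `sub` and `exp`, and the function names are all assumptions for illustration; the server's real token format is not documented here.

```javascript
const crypto = require("node:crypto");

function sign(payloadB64, secret) {
  return crypto.createHmac("sha256", secret).update(payloadB64).digest("base64url");
}

// Issue a token that expires ttlSeconds after `now` (milliseconds).
function issueToken(user, secret, ttlSeconds, now = Date.now()) {
  const payload = { sub: user, exp: now + ttlSeconds * 1000 };
  const payloadB64 = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${payloadB64}.${sign(payloadB64, secret)}`;
}

// Return the payload if the signature matches and the token is unexpired,
// otherwise null.
function verifyToken(token, secret, now = Date.now()) {
  const [payloadB64, signature] = token.split(".");
  if (!payloadB64 || !signature) return null;
  const expected = Buffer.from(sign(payloadB64, secret));
  const received = Buffer.from(signature);
  if (expected.length !== received.length) return null;
  // Constant-time comparison avoids leaking how many characters matched.
  if (!crypto.timingSafeEqual(expected, received)) return null;
  const payload = JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8"));
  return payload.exp > now ? payload : null;
}
```

Any change to the payload invalidates the signature, and an expired token is rejected even when the signature is intact.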

6. Key Generation (`keygen`) Access Restriction

The functionality to generate new keys is controlled by the ALLOW_KEYGEN configuration flag. All attempts to use this feature, whether successful or blocked, are recorded for auditing.
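
The gate-and-audit behavior described above might look roughly like this. Only the `ALLOW_KEYGEN` flag name comes from this document; `requestKeygen`, the `generate` callback, and the audit entry fields are hypothetical.

```javascript
// Sketch: gate key generation behind ALLOW_KEYGEN and audit every attempt,
// whether it is allowed or blocked.
function requestKeygen(config, user, auditLog, generate) {
  const allowed = config.ALLOW_KEYGEN === true;
  auditLog.push({
    event: "keygen",
    user,
    allowed,
    at: new Date().toISOString(),
  });
  if (!allowed) {
    throw new Error("Key generation is disabled (ALLOW_KEYGEN is not true)");
  }
  return generate();
}
```

Writing the audit entry before the permission check ensures that blocked attempts are recorded as well.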

7. Certificate-Based Access Control (CBAC)

When certificate authentication is active, an agent's access is tied to a specific server's key material. A login attempt against a different server, whose private key does not match, is rejected.

8. Configuration Hardening

Security settings like SSL_VALIDATE (for certificate trustworthiness) and PW_ENCRYPTION (to enforce ciphertext-only password acceptance) are configurable via environment variables.

9. Surveillance and Auditing

All security-pertinent activities—including failed logins, key generation calls, and unauthorized access attempts—are meticulously logged for ongoing monitoring and post-incident analysis.

Server Feature Blueprint

The privateGPT MCP Interface provides extensive functional control via its API endpoints, governed by granular configuration toggles.

Core Functionalities Summary

Identity Services: Secure user login (token issuance) and explicit logout (token invalidation).
Conversational Management: Ability to initiate, continue, retrieve metadata for, and purge chat sessions.
Organizational Structuring: Tools to list, define, and retire user/system Groups.
Knowledge Artifact Management (Sources): CRUD operations for proprietary data sources, including assignment to specific organizational Groups.
User Lifecycle Management: Registration, modification, deactivation, and reactivation of system users.
System Adaptability: Fine-grained control over which endpoint functions are active via the configuration file, alongside locale selection.

Deployment & Configuration

Refer to the Installation section for dependency resolution and build procedures. Configuration is primarily managed through the privateGPT.env.json manifest, controlling aspects like proxy routing, TLS enablement, key paths, and feature activation.

Installation Guide

To prepare the environment:

1. Clone the repository.
2. Resolve dependencies via npm install.
3. Build the compiled assets using npm run build.

(Alternative automated setup is available via InstallMPCServer.sh on Linux systems.)

Manifest Specification (`privateGPT.env.json`)

Proxy Configuration Details

This governs routing through intermediary proxies (e.g., HAProxy):

* USE_PROXY: Boolean flag to activate proxy traversal.
* HEADER_ENCRYPTED: Dictates if the access credential header is sent as plaintext or ciphertext (if true, use the security utility to pre-encrypt the value for ACCESS_HEADER).
* ACCESS_HEADER: The specific token or credential required by the proxy for forwarding traffic.

Server Runtime Settings

Network: Defines the operational PORT.
Localization: Selectable LANGUAGE for system messaging.
Trust: SSL_VALIDATE setting controls client validation of server certificates.
Cryptography: Configuration of PW_ENCRYPTION, and file paths pointing to the RSA keys (PUBLIC_KEY, PRIVATE_KEY).
Transport Security: ENABLE_TLS toggle, along with file paths for the server's server.key and server.crt.
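
A small validation pass over these settings can catch misconfiguration before the server starts. The sketch below assumes the settings arrive as a flat object with boolean flags and a numeric port; the real manifest may nest or type these keys differently, and `validateSettings` is a hypothetical helper.

```javascript
// Sketch: sanity-check runtime settings before the server starts.
// Assumes a flat object; the real manifest layout may nest these keys.
function validateSettings(settings) {
  const problems = [];
  const port = settings.PORT;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    problems.push("PORT must be an integer between 1 and 65535");
  }
  for (const flag of ["SSL_VALIDATE", "PW_ENCRYPTION", "ENABLE_TLS"]) {
    if (typeof settings[flag] !== "boolean") {
      problems.push(`${flag} must be true or false`);
    }
  }
  // Key paths only matter when encrypted passwords are enforced.
  if (settings.PW_ENCRYPTION === true) {
    for (const key of ["PUBLIC_KEY", "PRIVATE_KEY"]) {
      if (typeof settings[key] !== "string" || settings[key] === "") {
        problems.push(`${key} path is required when PW_ENCRYPTION is true`);
      }
    }
  }
  return problems; // an empty array means no problems were found
}
```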

Operational Restrictions

RESTRICTED_GROUPS: Limits client visibility into assignable organizational structures.
ENABLE_OPEN_AI_COMP_API: Activates compatibility layer for OpenAI API standards.

Auditing & Telemetry

Logging controls dictate persistence (WRITTEN_LOGFILE), IP address recording (LOG_IPs), and complete data suppression (ANONYMOUS_MODE).
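
One possible reading of how these flags interact is sketched below: ANONYMOUS_MODE suppresses the entry entirely, and LOG_IPs decides whether the client address is included. That interpretation, the entry fields, and the `buildLogEntry` name are assumptions; file persistence via WRITTEN_LOGFILE is left out for brevity.

```javascript
// Sketch: decide what, if anything, gets logged for an event.
// Interpretation of the flags is an assumption, not confirmed behavior.
function buildLogEntry(logging, event, ip) {
  if (logging.ANONYMOUS_MODE === true) {
    return null; // nothing is recorded in anonymous mode
  }
  const entry = { time: new Date().toISOString(), event };
  if (logging.LOG_IPs === true) {
    entry.ip = ip; // include the client address only when explicitly enabled
  }
  return entry;
}
```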

Endpoint Feature Toggles

A comprehensive set of boolean flags (ENABLE_LOGIN, ENABLE_CREATE_SOURCE, etc.) allows system administrators to dynamically enable or disable every major API function without code modification.
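
A toggle-driven guard could be implemented along these lines. The flag name ENABLE_LOGIN comes from this document; the `guard` wrapper, the flat `functions` object, and the error shape are hypothetical.

```javascript
// Sketch: wrap an endpoint handler so it only runs when its flag is true.
function isEndpointEnabled(functions, flag) {
  return functions[flag] === true; // missing flags count as disabled
}

function guard(functions, flag, handler) {
  return (...args) => {
    if (!isEndpointEnabled(functions, flag)) {
      return { error: `${flag} is disabled in the configuration` };
    }
    return handler(...args);
  };
}

// Usage: the handler body here is a placeholder.
const login = guard({ ENABLE_LOGIN: true }, "ENABLE_LOGIN", (user) => ({ ok: true, user }));
console.log(login("alice"));
```

Treating a missing flag as disabled means a new endpoint stays off until an administrator enables it explicitly.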

Ancillary Security Utilities

Password Obfuscation Utility

Executes RSA encryption using the specified public key to produce a secure, Base64-encoded ciphertext.

```bash
node security/generate_encrypted_password.js /path/to/public/key.pem
```

Password Verification Utility

Executes RSA decryption using the specified private key to validate the integrity of an encrypted password against its plaintext original.

```bash
node security/generate_decrypted_password.js /path/to/private/key
```

Project Organization

Below is the map of the repository structure:

```text
MCP-Server-for-MAS-Developments/
├── clients   (Client samples across C#, C++, Go, Java, JS, PHP, Python)
├── security  (Scripts for RSA encryption/decryption)
├── src       (Core server logic, including privateGPT service integration)
├── docs      (Architectural diagrams and documentation assets)
├── logs      (Runtime log file storage)
├── tests     (Automated test suites)
└── ver       (Version specific test scripts)
```

Licensing and Context

Licensed under the MIT License. For context on the broader domain, cloud computing provides on-demand access to pooled, scalable computing resources over a network. NIST defines it by five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Cloud concepts trace their origins to the time-sharing models of the 1960s.

privateGPT-Interconnect-for-MCP-Ecosystems

Author

Fujitsu-AI
