mcp-llm-service-deployer
Creates custom LLM server endpoints configured through YAML files, providing controlled access to resources and structured prompts. Functionality can be extended by calling external tools, all served over the Model Context Protocol (MCP).
Author: phil65
mcp-server-llmling
LLMling Server Manual
Overview
mcp-server-llmling is a server for the Model Context Protocol (MCP) that provides a YAML-based configuration system for Large Language Model applications.
LLMLing, the underlying backend, provides the YAML-based configuration for LLM workflows. It lets you set up custom MCP servers that serve content defined in YAML files.
- Declarative Setup: Define the operational landscape for the LLM using YAML; coding is optional for basic definitions.
- MCP Compliance: Built on the Model Context Protocol (MCP) for standardized interaction with LLM clients.
- Component Taxonomy:
  - Artifacts (Resources): Sources of consumable data (e.g., documents, literal text blocks, external command outputs, etc.).
  - Templates (Prompts): Predefined message structures incorporating variable placeholders.
  - Capabilities (Tools): Python functions made callable by the language model agent.
The resulting YAML configuration defines a complete environment that provides the LLM with:
- Mechanisms for accessing defined data artifacts.
- Standardized input phrasing via prompt templates.
- Mechanisms to augment functionality through callable utilities.
Key Features
1. Artifact Management
- Ingestion and administration of diverse artifact modalities:
  - Local file system entries (PathResource)
  - Embedded textual content (TextResource)
  - Shell command execution outputs (CLIResource)
  - Source code files (SourceResource)
  - Results derived from Python invocation (CallableResource)
  - Visual data representations (ImageResource)
- Provisions for monitoring artifact states and enabling live hot-reloading.
- Support for artifact preprocessing workflows.
- Access routing based on Uniform Resource Identifiers (URIs).
2. Utility System
- Registration and execution of Python functions as tools callable by the LLM.
- Support for tools defined by OpenAPI specifications.
- Tool discovery through entry points.
- Validation of tool signatures and parameters.
- Structured, machine-readable tool responses.
3. Phrasing System
- Static prompts with template support.
- Dynamic prompts generated from Python functions.
- Prompts stored in external files.
- Validation of prompt arguments.
- Completion suggestions for prompt arguments.
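The idea of a static prompt with placeholders and argument validation can be illustrated with the standard library alone. This is a minimal sketch, not LLMLing's implementation: the helper names `template_fields` and `render_prompt` and the sample template are hypothetical, and it assumes `str.format`-style placeholders.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Collect the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_prompt(template: str, **arguments: str) -> str:
    """Fill the template, rejecting missing or unexpected arguments."""
    required = template_fields(template)
    supplied = set(arguments)
    missing = required - supplied
    unexpected = supplied - required
    if missing or unexpected:
        raise ValueError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return template.format(**arguments)


# Hypothetical template; the placeholder names are illustrative only.
TEMPLATE = "Review the following {language} code and focus on {aspect}."
message = render_prompt(TEMPLATE, language="Python", aspect="error handling")
print(message)
```

Validating before formatting means a misspelled argument fails with a clear error instead of a bare `KeyError` or a silently ignored value.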
4. Multiple Communication Pathways
- Standard input/output stream communication (the default mechanism).
- Server-Sent Events (SSE) or streaming HTTP interfaces for asynchronous web-based consumers.
- Extensibility hooks for integrating novel transport layer implementations.
Operation Instructions
Integration with Zed Editor
Integrate LLMLing as a context-providing service within your settings.json configuration:
{
  "context_servers": {
    "llmling": {
      "command": {
        "env": {},
        "label": "llmling",
        "path": "uvx",
        "args": [
          "mcp-server-llmling",
          "start",
          "path/to/your/config.yml"
        ]
      },
      "settings": {}
    }
  }
}
Configuration for Claude Desktop
Configure the LLMLing service within your claude_desktop_config.json manifest:
{
  "mcpServers": {
    "llmling": {
      "command": "uvx",
      "args": [
        "mcp-server-llmling",
        "start",
        "path/to/your/config.yml"
      ],
      "env": {}
    }
  }
}
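Hand-edited JSON is easy to break with a stray comma, so the entry above can also be generated. The following is a small sketch and not part of mcp-server-llmling; the helper name `build_claude_entry` is hypothetical, and it only reproduces the structure shown above.

```python
import json


def build_claude_entry(config_path: str, name: str = "llmling") -> dict:
    """Return an mcpServers entry mirroring the JSON structure shown above."""
    return {
        "mcpServers": {
            name: {
                "command": "uvx",
                "args": ["mcp-server-llmling", "start", config_path],
                "env": {},
            }
        }
    }


entry = build_claude_entry("path/to/your/config.yml")
# Print valid JSON that can be merged into claude_desktop_config.json.
print(json.dumps(entry, indent=2))
```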
Direct Server Initiation
Run the server directly from the command line:
# Run the latest version
uvx mcp-server-llmling@latest
1. Programmatic Invocation
import asyncio
from llmling import RuntimeConfig
from mcp_server_llmling import LLMLingServer

# Placeholder path to your YAML configuration (assumed; adjust as needed)
config = "path/to/your/config.yml"

async def main() -> None:
    async with RuntimeConfig.open(config) as runtime:
        service_instance = LLMLingServer(runtime, enable_injection=True)
        await service_instance.start()

asyncio.run(main())
2. Utilizing a Custom Transport Mechanism
import asyncio
from llmling import RuntimeConfig
from mcp_server_llmling import LLMLingServer

# Placeholder path to your YAML configuration (assumed; adjust as needed)
config = "path/to/your/config.yml"

async def main() -> None:
    async with RuntimeConfig.open(config) as runtime:
        # Pass the opened runtime, as in the previous example
        service_instance = LLMLingServer(
            runtime,
            transport="sse",
            transport_options={
                "host": "localhost",
                "port": 3001,
                "cors_origins": ["http://localhost:3000"],
            },
        )
        await service_instance.start()

asyncio.run(main())
3. Artifact Definition in Configuration
resources:
  python_code:
    type: path
    path: "./src/**/*.py"
    watch:
      enabled: true
      patterns:
        - "*.py"
        - "!**/__pycache__/**"
  api_docs:
    type: text
    content: |
      API Documentation
      ================
      ...
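The `patterns` list under `watch` mixes an include pattern with a `!`-prefixed exclusion. As a rough illustration of how such a list could be interpreted, the sketch below assumes gitignore-style negation and uses the standard `fnmatch` module; the function `is_watched` is hypothetical, and the watcher LLMLing actually uses may apply different matching rules.

```python
from fnmatch import fnmatchcase

# Patterns copied from the configuration above.
PATTERNS = ["*.py", "!**/__pycache__/**"]


def is_watched(path: str, patterns: list[str]) -> bool:
    """Return True if path matches an include pattern and no '!' exclusion.

    Assumption: a leading '!' negates a pattern, and exclusions take
    precedence over includes. Note that fnmatch's '*' also matches '/'.
    """
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    if any(fnmatchcase(path, pattern) for pattern in excludes):
        return False
    return any(fnmatchcase(path, pattern) for pattern in includes)


for candidate in ("src/app/main.py", "src/__pycache__/main.py", "README.md"):
    print(candidate, is_watched(candidate, PATTERNS))
```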
4. Utility Specification in Configuration
tools:
  analyze_code:
    import_path: "mymodule.tools.analyze_code"
    description: "Assess the structural composition of Python source code"
toolsets:
  api:
    type: openapi
    spec: "https://api.example.com/openapi.json"
Tip: For OpenAPI specifications, consider installing the Redocly CLI to bundle and resolve schema references before using them with LLMLing. This helps ensure that references are resolved and the specification is correctly formatted. If Redocly is found on the system path, it is used automatically.
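The `import_path` in the tool definition above points at an ordinary Python function. The configuration does not show what `mymodule.tools.analyze_code` contains, so the body below is purely a hypothetical sketch of what such a function might look like, using the standard `ast` module to summarize source code.

```python
import ast


def analyze_code(code: str) -> dict[str, int]:
    """Count classes, functions and import statements in Python source code.

    Hypothetical tool body; the docstring and type hints are what a tool
    registry would typically expose to the LLM as the tool description.
    """
    tree = ast.parse(code)
    counts = {"classes": 0, "functions": 0, "imports": 0}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            counts["classes"] += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["functions"] += 1
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            counts["imports"] += 1
    return counts


SAMPLE = "import os\n\nclass A:\n    def f(self):\n        return os.getcwd()\n"
result = analyze_code(SAMPLE)
print(result)
```

Returning a plain dictionary keeps the result easy to serialize as a structured tool response.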
Service Manifest Structure
The server is configured through a YAML file with the following top-level sections:
global_settings:
  timeout: 30
  max_retries: 3
  log_level: "INFO"
  requirements: []
  pip_index_url: null
  extra_paths: []
resources:
  # Definitions for data artifacts...
tools:
  # Definitions for callable utilities...
toolsets:
  # Definitions for utility groupings...
prompts:
  # Definitions for standardized phrasing...
MCP Interface Specification
The server implements the Model Context Protocol (MCP) and supports the following functional areas:
- Artifact Operations
  - Enumeration of accessible artifacts.
  - Retrieval of artifact content payloads.
  - Subscription to artifact state evolution notifications.
- Utility Operations
  - Listing of registered executable instruments.
  - Execution of instruments with specified arguments.
  - Fetching instrument definition schemas.
- Phrasing Operations
  - Cataloging of available prompt templates.
  - Obtaining fully resolved prompt messages.
  - Acquiring input value suggestions for prompt parameters.
- System Feedback
  - Alerts concerning artifact alterations.
  - Updates regarding instrument or prompt catalog changes.
  - Operational progress indicators.
  - Transmitted diagnostic logs.
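On the wire, MCP operations like those listed above are JSON-RPC 2.0 messages. The sketch below builds a few client requests using method names from the MCP specification (`resources/list`, `tools/call`, `prompts/get`); the helper `make_request`, the tool name `analyze_code`, and the prompt name `code_review` are illustrative assumptions, not names guaranteed to exist on a given server.

```python
import itertools
import json
from typing import Optional

# Monotonically increasing request ids, as JSON-RPC requires unique ids.
_request_ids = itertools.count(1)


def make_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request for the given MCP method."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Enumerate the resources the server exposes.
list_resources = make_request("resources/list")
# Call a tool by name with arguments (hypothetical tool name).
call_tool = make_request(
    "tools/call", {"name": "analyze_code", "arguments": {"code": "x = 1"}}
)
# Fetch a resolved prompt (hypothetical prompt name).
get_prompt = make_request(
    "prompts/get", {"name": "code_review", "arguments": {"language": "Python"}}
)

for line in (list_resources, call_tool, get_prompt):
    print(line)
```

With the stdio transport, each such message is written as a single line of JSON to the server's standard input, and responses arrive the same way on standard output.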
WIKIPEDIA: XMLHttpRequest (XHR) is an API in the form of a JavaScript object whose methods transmit HTTP requests from a web browser to a web server. These methods allow a browser-based application to send requests to the server after the page has loaded and to process the data it receives. XMLHttpRequest is a foundational component of Ajax programming. Before Ajax, hyperlinks and form submissions were the primary mechanisms for interacting with the server, and they often replaced the current page with a new one.
== Genesis ==
The concept behind XMLHttpRequest was conceived by the developers of Outlook Web Access for Microsoft Exchange Server 2000 and first shipped in the Internet Explorer 5 browser (1999). However, the original syntax did not use the XMLHttpRequest identifier. Instead, developers used the COM identifiers ActiveXObject("Msxml2.XMLHTTP") and ActiveXObject("Microsoft.XMLHTTP"). By the time Internet Explorer 7 was released (2006), all major browsers supported the XMLHttpRequest identifier, including Mozilla's Gecko layout engine (2002), Safari 1.2 (2004), and Opera 8.0 (2005).
=== Standardization Efforts ===
The World Wide Web Consortium (W3C) published its initial Working Draft specification for the XMLHttpRequest object on April 5, 2006. A subsequent Level 2 Working Draft was released on February 25, 2008, introducing capabilities for monitoring request progress events, enabling cross-origin communication, and handling raw byte streams. Towards the close of 2011, the Level 2 features were formally incorporated into the primary specification document. As of the end of 2012, stewardship of development transitioned to the WHATWG, which currently maintains an active, evolving specification utilizing Web IDL notation.
== Utilization Protocol ==
Generally, issuing a request with XMLHttpRequest involves several programming steps:
- Create an XMLHttpRequest object by calling its constructor.
- Call the open method to specify the request type, identify the target resource, and select synchronous or asynchronous operation.
- For an asynchronous request, set a listener that is notified when the request's state changes.
- Initiate the request by calling the send method, optionally passing payload data.
- Respond to state changes in the event listener. Response data sent by the server is captured by default in the responseText property. When the object finishes processing the response, its state changes to 4, the "done" state.
Beyond these general steps, XMLHttpRequest offers many options to control how the request is sent and how the response is processed. Custom header fields can be added to the request to indicate how the server should fulfill it, and data can be uploaded to the server by passing it to the send call. The response can be parsed from JSON into a native JavaScript object, or processed incrementally as it arrives rather than waiting for the entire payload. The request can also be aborted prematurely or set to fail if it does not complete within a specified time.
== Inter-Domain Communication ==
During the initial evolution phase of the World Wide Web, mechanisms permitting cross-origin data exchange were found to potentially introduce security vulnerabilities, leading to restrictions on this functionality.
