ollama-context-protocol-gateway
A specialized communication bridge facilitating interactions with local Ollama large language model deployments via the Model Context Protocol (MCP). This component centralizes task orchestration, outcome validation, and systemic workflow automation, featuring granular URI routing, robust failure mitigation, and performance enhancements like connection pooling and response caching.
Author

NewAITees
Ollama-MCP-Gateway: Local LLM Integration Hub
This MCP Gateway establishes a secure, standardized conduit between MCP-compliant clients and locally hosted Ollama instances, offering advanced capabilities for task decomposition, result validation, and operational pipeline management.
Core Capabilities:
- Task Segmentation: Breaking down monolithic computational requests into manageable sub-units.
- Verification & Scoring: Assessing generated artifacts against predefined success metrics.
- Model Lifecycle Control: Managing and invoking accessible Ollama models.
- Protocol Adherence: Ensuring strict communication via the Model Context Protocol.
- Resilience: Implementing sophisticated exception handling with explicit diagnostic reporting.
- Efficiency Tuning: Employing resource optimization techniques (e.g., persistent connection management, Least Recently Used caching).
Resource Manifestation
The gateway exposes the following logical resource endpoints via distinct URI schemes:
- task://: Access point for individual processing assignments.
- result://: Access point for evaluated computational outputs.
- model://: Access point for querying currently available Ollama architectures.
Each resource path is associated with requisite metadata and content type indicators crucial for optimal LLM interaction.
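As an illustration of how the three schemes could be told apart, the sketch below splits a resource URI into its scheme and identifier using only the standard library. The names `parse_resource_uri` and `ResourceRef` are hypothetical and are not part of the gateway's documented API.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

# Schemes listed in this README; how the gateway routes them is an assumption.
KNOWN_SCHEMES = {"task", "result", "model"}


class ResourceRef(NamedTuple):
    scheme: str
    identifier: str


def parse_resource_uri(uri: str) -> ResourceRef:
    """Split e.g. 'task://123' into ResourceRef('task', '123')."""
    parts = urlsplit(uri)
    if parts.scheme not in KNOWN_SCHEMES:
        raise ValueError(f"Unsupported resource scheme: {uri!r}")
    # For 'task://123' the identifier lands in netloc; any path is appended.
    identifier = parts.netloc + parts.path
    if not identifier:
        raise ValueError(f"Missing resource identifier: {uri!r}")
    return ResourceRef(parts.scheme, identifier)
```

A caller could then dispatch on `ref.scheme` to choose the matching handler.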
Conceptual Mapping: Prompts vs. Tools
Within this gateway's architecture, prompts and tools maintain distinct, yet interdependent, functions:
- Prompt (Schema Role): Dictates the required cognitive structure or input format for the LLM's inference process.
- Tool (Handler Role): Represents the executable function or system capability invoked by the protocol.
Every operative tool mandates an associated schema (prompt) to effectively bridge the LLM's reasoning capacity with tangible system actions.
Predefined Prompts (Schemas)
- decompose-task
  - Purpose: To segment complex objectives into granular, executable steps.
  - Input: Task description and level of detail (granularity).
  - Output: A structured breakdown detailing dependencies and estimated complexity.
- evaluate-result
  - Purpose: To measure an artifact's quality against specified validation criteria.
  - Input: The output artifact and evaluation parameters.
  - Output: A graded assessment accompanied by actionable refinement suggestions.
Implemented Tools (Handlers)
- add-task
  - Parameters: name (string, required), description (string, required); optional priority (number), deadline (string), tags (array).
  - Action: Registers a new assignment in the internal system, returning its unique ID.
- decompose-task
  - Parameters: task_id (string, required), granularity (enum: "high"|"medium"|"low", required); optional max_subtasks (number).
  - Action: Leverages Ollama to segment the referenced complex task.
- evaluate-result
  - Parameters: result_id (string, required), criteria (object, required); optional detailed (boolean).
  - Action: Executes the evaluation protocol on the specified output.
- run-model
  - Parameters: model (string, required), prompt (string, required); optional temperature (number), max_tokens (number).
  - Action: Executes a direct inference call on the specified Ollama backend.
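To make the parameter contract concrete, here is a minimal sketch of how the decompose-task arguments could be checked before any model call. The function name `validate_decompose_args` is hypothetical, and treating max_subtasks as a positive integer is an assumption (the list above only says "number").

```python
from typing import Any, Dict

# Allowed values taken from the decompose-task parameter list above.
GRANULARITIES = ("high", "medium", "low")


def validate_decompose_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a cleaned copy of the arguments or raise ValueError."""
    task_id = args.get("task_id")
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("task_id is required and must be a non-empty string")

    granularity = args.get("granularity")
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {GRANULARITIES}")

    max_subtasks = args.get("max_subtasks")
    if max_subtasks is not None:
        # bool is a subclass of int, so it is rejected explicitly.
        is_int = isinstance(max_subtasks, int) and not isinstance(max_subtasks, bool)
        if not is_int or max_subtasks < 1:
            raise ValueError("max_subtasks must be a positive integer (assumed)")

    return {
        "task_id": task_id,
        "granularity": granularity,
        "max_subtasks": max_subtasks,
    }
```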
Operational Enhancements
Advanced Error Reporting
The gateway yields enriched, structured error payloads, enabling client applications to implement precise recovery logic. Example structure for failure notification:
    {
      "error": {
        "message": "Assignment identifier not located: task-123",
        "status_code": 404,
        "details": {
          "provided_id": "task-123"
        }
      }
    }
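The following sketch shows one way a payload of this shape could be built and then inspected on the client side. Both helper names are hypothetical, and the retry rule (retry only on 5xx-style codes) is an illustrative client policy, not behavior documented for the gateway.

```python
from typing import Any, Dict, Optional


def make_error(
    message: str,
    status_code: int,
    details: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a payload matching the error structure shown above."""
    return {
        "error": {
            "message": message,
            "status_code": status_code,
            "details": details or {},
        }
    }


def should_retry(payload: Dict[str, Any]) -> bool:
    """Assumed client policy: retry only when the status code is 5xx."""
    status = payload.get("error", {}).get("status_code")
    return isinstance(status, int) and 500 <= status < 600
```

With this policy, a 404 such as the missing task above would be surfaced to the user rather than retried.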
Performance Tuning Parameters
Configuration via config.py allows fine-grained control over throughput:
    # Performance Configuration Settings
    cache_size: int = 100  # Maximum entries retained in response cache
    max_connections: int = 10  # Global limit for concurrent outbound HTTP sessions
    max_connections_per_host: int = 10  # Per-origin limit for concurrent sessions
    request_timeout: int = 60  # Session time-out threshold in seconds
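To show what a bounded cache_size means in practice, the sketch below implements a small Least Recently Used cache on top of `collections.OrderedDict`. The class name `LRUCache` and its methods are hypothetical; the gateway's actual cache implementation is not described in this README.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Keep at most cache_size entries, evicting the least recently used."""

    def __init__(self, cache_size: int = 100) -> None:
        if cache_size < 1:
            raise ValueError("cache_size must be at least 1")
        self._max = cache_size
        self._data = OrderedDict()  # maps key -> cached response text

    def get(self, key: Hashable) -> Optional[str]:
        if key not in self._data:
            return None
        # Reading an entry marks it as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: Hashable, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            # Drop the entry at the front: the least recently used one.
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)
```

A natural cache key would be the (model, prompt, temperature) tuple, since the same prompt can yield different output under different settings.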
Model Specification Flexibility
Resolution Hierarchy
The active LLM architecture is determined by checking the following sources in order, highest priority first:
- Explicit parameter within a Tool invocation (model argument).
- Configuration entry within the MCP configuration file's env segment.
- System Environment Variable (OLLAMA_DEFAULT_MODEL).
- Hardcoded fallback (llama3).
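The priority order above can be sketched as a single function that returns the first non-empty candidate. The function name `resolve_model` and its parameters are hypothetical, and the sketch assumes the configuration's env segment has already been read into a plain mapping with a "model" key.

```python
import os
from typing import Mapping, Optional

FALLBACK_MODEL = "llama3"  # hardcoded fallback named in the list above


def resolve_model(
    tool_model: Optional[str] = None,
    config_env: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the model name from the highest-priority source that sets one."""
    environ = os.environ if environ is None else environ
    candidates = (
        tool_model,                           # 1. tool argument
        (config_env or {}).get("model"),      # 2. MCP config env segment (assumed key)
        environ.get("OLLAMA_DEFAULT_MODEL"),  # 3. system environment variable
        FALLBACK_MODEL,                       # 4. fallback
    )
    # Empty strings and None are skipped; the fallback is always non-empty.
    return next(name for name in candidates if name)
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.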
Configuration File Injection
For integrated environments (e.g., Claude Desktop), model specification can be injected via the client's MCP configuration JSON:
    {
      "mcpServers": {
        "ollama-MCP-server": {
          "command": "python",
          "args": [
            "-m",
            "ollama_mcp_server"
          ],
          "env": [
            {"model": "llama3:latest"}
          ]
        }
      }
    }
Validation and Discovery
Upon startup, the gateway validates the existence of configured models, logging warnings if any are absent. The run-model tool can be queried to return a current manifest of discoverable models, aiding users in selecting valid targets.
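A startup check of this kind might look like the sketch below, which compares configured names against a list of available models and logs a warning for each one that is absent. The function names and the logger name are hypothetical, and the list of available models is assumed to have been obtained elsewhere (for example from `ollama list`). The only Ollama behavior relied on is that an untagged name refers to the `:latest` tag.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger("ollama_mcp_gateway_sketch")  # hypothetical logger name


def normalize_model_name(name: str) -> str:
    """Treat an untagged name such as 'llama3' as 'llama3:latest'."""
    return name if ":" in name else f"{name}:latest"


def find_missing_models(
    configured: Iterable[str],
    available: Iterable[str],
) -> List[str]:
    """Return configured models not found in 'available', warning about each."""
    available_set = {normalize_model_name(name) for name in available}
    missing = [
        model
        for model in configured
        if normalize_model_name(model) not in available_set
    ]
    for model in missing:
        logger.warning("Configured model %r is not available in Ollama", model)
    return missing
```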
Validation Protocol
The repository includes a comprehensive testing suite covering:
- Unit Tests: Verification of isolated functional units.
- Integration Tests: End-to-end workflow simulation.
Execution commands:
    # Full Test Execution
    python -m unittest discover

    # Targeted Integration Test Execution
    python -m unittest tests.test_integration
Configuration Directives
Environment Variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | http://localhost:11434 | Base URL for the Ollama API service. |
| DEFAULT_MODEL | llama3 | Fallback model name for direct calls. |
| LOG_LEVEL | info | Verbosity setting for operational logging. |
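As a rough sketch, the variables in the table could be read into a single settings object with the listed defaults. The `Settings` dataclass and `load_settings` function are hypothetical names. Stripping a trailing slash from the host and lowercasing the log level are added conveniences, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_host: str
    default_model: str
    log_level: str


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the table's defaults."""
    env = os.environ if environ is None else environ
    return Settings(
        # A trailing slash is removed so URL paths can be appended safely.
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/"),
        default_model=env.get("DEFAULT_MODEL", "llama3"),
        log_level=env.get("LOG_LEVEL", "info").lower(),
    )
```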
Ollama Prerequisites
Ensure Ollama runtime is installed and necessary models are pre-fetched:
    # Install Ollama if missing
    curl -fsSL https://ollama.com/install.sh | sh

    # Download primary models
    ollama pull llama3
    ollama pull mistral
Deployment Guide (Quick Start)
Installation
    pip install ollama-mcp-server
Client Configuration Paths (e.g., Claude Desktop)
- macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%/Claude/claude_desktop_config.json
Non-Released/Development Server Registration

    "mcpServers": {
      "ollama-MCP-server": {
        "command": "uv",
        "args": [
          "--directory",
          "/path/to/ollama-MCP-server",
          "run",
          "ollama-MCP-server"
        ],
        "env": [{"model": "deepseek:r14B"}]
      }
    }

Production Server Registration

    "mcpServers": {
      "ollama-MCP-server": {
        "command": "uvx",
        "args": [
          "ollama-MCP-server"
        ]
      }
    }

Usage Examples
Task Decomposition Invocation
To partition a complex requirement:
    result = await mcp.use_mcp_tool({
        "server_name": "ollama-MCP-server",
        "tool_name": "decompose-task",
        "arguments": {
            "task_id": "task://123",
            "granularity": "medium",
            "max_subtasks": 5
        }
    })
Result Evaluation Call
To subject an output to quality assessment:
    evaluation = await mcp.use_mcp_tool({
        "server_name": "ollama-MCP-server",
        "tool_name": "evaluate-result",
        "arguments": {
            "result_id": "result://456",
            "criteria": {
                "accuracy": 0.4,
                "completeness": 0.3,
                "clarity": 0.3
            },
            "detailed": True  # Python boolean literal, not JSON's lowercase true
        }
    })
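The criteria values in the call above sum to 1.0, which suggests they act as relative weights. Under that assumption (this README does not spell out how the gateway combines them), per-criterion scores could be folded into one overall score as a weighted average. The function name `weighted_score` and the sample scores are purely illustrative.

```python
from typing import Mapping


def weighted_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-criterion scores (assumed aggregation rule)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for criteria: {sorted(missing)}")
    # Dividing by the total keeps the result valid even if weights do not sum to 1.
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


weights = {"accuracy": 0.4, "completeness": 0.3, "clarity": 0.3}
scores = {"accuracy": 1.0, "completeness": 0.5, "clarity": 0.0}  # made-up values
overall = weighted_score(scores, weights)
```

Here `overall` comes out at roughly 0.55, that is 1.0 × 0.4 + 0.5 × 0.3 + 0.0 × 0.3.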
Direct Model Inference
For direct queries against an Ollama backend:
    response = await mcp.use_mcp_tool({
        "server_name": "ollama-MCP-server",
        "tool_name": "run-model",
        "arguments": {
            "model": "llama3",
            "prompt": "Explain quantum computing in layman's terms",
            "temperature": 0.7
        }
    })
Development Workflow
Project Initialization
- Repository acquisition:

      git clone https://github.com/yourusername/ollama-MCP-server.git
      cd ollama-MCP-server

- Virtual environment setup and activation:

      python -m venv venv
      source venv/bin/activate  # Or venv\Scripts\activate on Windows

- Install development dependencies (assuming uv is the package manager):

      uv sync --dev --all-extras
Local Execution Scripts
- Server Launch:

      ./run_server.sh  # Options: --debug, --log=LEVEL

- Testing Execution:

      ./run_tests.sh  # Options: --unit, --integration, --all (default), --verbose
Packaging and Distribution
- Dependency synchronization:

      uv sync

- Artifact creation:

      uv build

  Output is written to the dist/ directory (sdist and wheel).

- Publication to PyPI (requires configured credentials):

      uv publish
Debugging
Since the MCP server often operates via stdio pipes, direct debugging can be challenging. We highly recommend utilizing the official MCP Inspector for the best debugging experience.
To launch the Inspector using npx:

    npx @modelcontextprotocol/inspector uv --directory /path/to/ollama-MCP-server run ollama-mcp-server
The Inspector will subsequently output a URL accessible via a web browser for interactive debugging.
Architecture Overview
(Placeholder for detailed architectural diagram/description)
Collaboration Guidelines
Contributions are actively encouraged! Initiate development by submitting a Pull Request (PR).
- Fork the repository.
- Create a feature branch (git checkout -b feat/new-capability).
- Commit changes clearly (git commit -m 'Feat: Implemented widget serialization').
- Push the branch (git push origin feat/new-capability).
- Open a formal PR.
Licensing
This software is distributed under the MIT License. Refer to the LICENSE file for comprehensive terms.
Acknowledgements
- The Model Context Protocol team for furnishing the foundational protocol specification.
- The Ollama project for democratizing accessible local LLM execution.
- All developers who have contributed to this gateway.
