speech-synthesis-mcp-adapter
Integrates OpenAI text-to-speech into applications, converting text input into high-quality spoken audio. It offers control over voice characteristics and output encoding, and includes a command-line utility for quick text-to-audio conversion.
Author

nakamurau1
speech-synthesis-mcp-adapter
A Model Context Protocol (MCP) server and command-line utility for high-quality speech generation using the OpenAI TTS API.
Core Capabilities
- MCP Server: Adds speech generation to clients that support the MCP standard, such as Claude Desktop.
- Voice Selection: Supports multiple voices (alloy, nova, echo, and others).
- High-Quality Audio: Supports several output formats (MP3, OPUS, AAC, FLAC, WAV, PCM).
- Parameter Tuning: Allows configuration of speech speed, voice, and additional instructions for how the text should be spoken.
- CLI Functionality: Also usable as a standalone command-line tool for direct text-to-speech conversion.
Deployment Instructions
Route 1: Installation via Source Repository
# Obtain the source code
git clone https://github.com/nakamurau1/tts-mcp.git
cd tts-mcp
# Fetch required packages
npm install
# Optional: Install for system-wide access
npm install -g .
Route 2: Immediate Execution with npx (Zero Install)
# Launch the MCP server instantly
npx -p tts-mcp tts-mcp-server --voice nova --model tts-1-hd
# Invoke the conversion utility directly
npx tts-mcp -t "Greetings from the adapter" -o greeting.mp3
MCP Server Operational Guide
This server bridges your text-to-speech requirements to MCP-compliant consumer applications (e.g., Claude Desktop).
Activating the MCP Server
# Start using default configurations
npm run server
# Start with specific overrides
npm run server -- --voice echo --model tts-1
# Or start directly specifying the API key
node bin/tts-mcp-server.js --voice fable --api-key sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Server Configuration Switches
Options:
  -V, --version          Show current version tag
  -m, --model <model>    TTS engine to utilize (default: "gpt-4o-mini-tts")
  -v, --voice <voice>    Speaker identity to employ (default: "alloy")
  -f, --format <format>  Output audio container type (default: "mp3")
      --api-key <key>    OpenAI credential; can be sourced from an environment variable
  -h, --help             Display usage documentation
Interfacing with MCP Clients
The server is designed for consumption by Claude Desktop and other systems recognizing the MCP schema. For Claude Desktop setup:
- Locate the Claude Desktop configuration file (usually at ~/Library/Application Support/Claude/claude_desktop_config.json).
- Add the following configuration block, supplying your OpenAI API key:
{
  "mcpServers": {
    "speech-synthesis-mcp-adapter": {
      "command": "node",
      "args": ["full/path/to/bin/tts-mcp-server.js", "--voice", "alloy", "--api-key", "your-openai-api-key"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
Alternatively, using npx for simplified path management:
{
  "mcpServers": {
    "speech-synthesis-mcp-adapter": {
      "command": "npx",
      "args": ["-p", "tts-mcp", "tts-mcp-server", "--voice", "sage", "--model", "tts-1"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
You can supply the API key in either of two ways:
- Direct Injection (suitable for prototyping): Include it in the args array using the --api-key flag.
- Environment Variable (preferred for security): Set it within the env structure as illustrated.
Security Note: Protect your configuration file, especially when it contains API keys.
- Relaunch Claude Desktop.
- When you ask Claude to read text aloud (e.g., "vocalize this text"), it will now use the configured speech synthesis server.
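The configuration step above can also be scripted. The following is a minimal sketch, assuming the configuration file has already been read and parsed with JSON.parse; the helper name addTtsServer is hypothetical and not part of the project. It merges the npx-based entry shown earlier into an existing configuration object without discarding other registered servers.

```javascript
// Sketch: add (or replace) the TTS server entry in a parsed Claude Desktop config.
// addTtsServer is a hypothetical helper, not something the project ships.
function addTtsServer(existingConfig, apiKey, options = {}) {
  const { voice = "alloy", model = "gpt-4o-mini-tts" } = options;
  return {
    ...existingConfig,
    mcpServers: {
      // Keep any servers that were already registered.
      ...(existingConfig.mcpServers || {}),
      "speech-synthesis-mcp-adapter": {
        command: "npx",
        args: ["-p", "tts-mcp", "tts-mcp-server", "--voice", voice, "--model", model],
        env: { OPENAI_API_KEY: apiKey },
      },
    },
  };
}

// Example: an existing config with one unrelated server.
const merged = addTtsServer(
  { mcpServers: { other: { command: "node", args: ["other.js"] } } },
  "your-openai-api-key",
  { voice: "sage", model: "tts-1" }
);
console.log(JSON.stringify(merged, null, 2));
```

Writing the result back to disk is left out on purpose, since the file location differs between platforms.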
Registered MCP Functions
- text-to-speech: Primary function for rendering provided text into audible output.
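For orientation, an MCP client invokes a registered tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for the text-to-speech tool. The argument name text is an assumption for illustration; the actual input schema should be taken from the server's tools/list response.

```javascript
// Sketch: build a JSON-RPC 2.0 "tools/call" request for the text-to-speech tool.
// The "text" argument name is assumed, not confirmed by the README.
function buildTextToSpeechCall(id, text) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "text-to-speech",
      arguments: { text },
    },
  };
}

const request = buildTextToSpeechCall(1, "Greetings from the adapter");
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client such as Claude Desktop constructs this message itself, so the sketch is mainly useful when debugging the server by hand.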
Command-Line Interface Utility
speech-synthesis-mcp-adapter also works as a standalone terminal utility, invoked through the tts-mcp command:
# Convert text directly to speech
tts-mcp -t "Testing the CLI utility now" -o output_test.mp3
# Convert the contents of a text file
tts-mcp -f input_data.txt -o file_audio.mp3
# Select a specific voice
tts-mcp -t "A different voice sample" -o custom_voice.mp3 -v onyx
CLI Utility Switches
Options:
  -V, --version              Display version information
  -t, --text <text>          The textual content requiring vocalization
  -f, --file <path>          Location of the input text document
  -o, --output <path>        Destination path for the resulting audio file (mandatory)
  -m, --model <n>            The synthesis model identifier (default: "gpt-4o-mini-tts")
  -v, --voice <n>            The specific speaker profile ID (default: "alloy")
  -s, --speed <number>       Playback rate adjustment factor (range 0.25 to 4.0) (default: 1)
      --format <format>      Desired container format (default: "mp3")
  -i, --instructions <text>  Supplementary directives for speech modulation
      --api-key <key>        OpenAI credential; can be passed via environment variable
  -h, --help                 Display summary help screen
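The constraints listed above (mandatory output path, speed range, known voices, models, and formats) can be checked before any API call is made. The following is a minimal sketch of such a check, not the project's actual implementation; the function name validateCliOptions and the rule that either --text or --file must be present are assumptions.

```javascript
// Allowed values as documented in this README.
const VOICES = ["alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"];
const MODELS = ["tts-1", "tts-1-hd", "gpt-4o-mini-tts"];
const FORMATS = ["mp3", "opus", "aac", "flac", "wav", "pcm"];

// Hypothetical validator: returns a list of error messages (empty when valid).
function validateCliOptions(opts) {
  // Apply the documented defaults, then overlay the user's options.
  const o = { model: "gpt-4o-mini-tts", voice: "alloy", speed: 1, format: "mp3", ...opts };
  const errors = [];
  // Assumption: the CLI needs one input source, either inline text or a file.
  if (!o.text && !o.file) errors.push("either --text or --file is required");
  if (!o.output) errors.push("--output is required");
  if (!MODELS.includes(o.model)) errors.push(`unknown model: ${o.model}`);
  if (!VOICES.includes(o.voice)) errors.push(`unknown voice: ${o.voice}`);
  if (!FORMATS.includes(o.format)) errors.push(`unknown format: ${o.format}`);
  const speed = Number(o.speed);
  if (!(speed >= 0.25 && speed <= 4.0)) errors.push("--speed must be between 0.25 and 4.0");
  return errors;
}

console.log(validateCliOptions({ text: "Testing", output: "out.mp3", speed: 5 }));
```

Collecting all errors instead of stopping at the first lets a user fix every problem in one pass.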
Supported Vocal Profiles
The following speaker identities are recognized:
- alloy (default)
- ash
- coral
- echo
- fable
- onyx
- nova
- sage
- shimmer
Supported Models
- tts-1
- tts-1-hd
- gpt-4o-mini-tts (default)
Output Encoding Options
The available output formats are:
- mp3 (default)
- opus
- aac
- flac
- wav
- pcm
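To show how the voice, model, format, speed, and instructions settings relate to the underlying service, the sketch below maps them onto a request body for the OpenAI speech endpoint (POST /v1/audio/speech). The field names model, voice, input, response_format, speed, and instructions come from the OpenAI API; the function buildSpeechRequest is hypothetical, and whether the adapter builds its request this way is an assumption.

```javascript
// Sketch: translate CLI-style options into an OpenAI speech request body.
// buildSpeechRequest is illustrative and not taken from the project source.
function buildSpeechRequest({
  text,
  model = "gpt-4o-mini-tts",
  voice = "alloy",
  format = "mp3",
  speed = 1,
  instructions,
}) {
  const body = { model, voice, input: text, response_format: format, speed };
  // Only send instructions when the caller supplied them.
  if (instructions) body.instructions = instructions;
  return body;
}

const body = buildSpeechRequest({
  text: "A different voice sample",
  voice: "onyx",
  instructions: "Speak calmly.",
});
console.log(JSON.stringify(body, null, 2));
```

The body would then be sent with an Authorization header carrying the API key, and the binary response written to the output path.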
Environment Configuration Variables
Configuration can also be managed through system environment settings:
OPENAI_API_KEY=your-secret-key-here
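Since the key can come from either the --api-key flag or the OPENAI_API_KEY variable, a tool has to decide which one wins. The sketch below assumes the flag takes precedence over the environment, which is a common convention but is not confirmed by this README; resolveApiKey is a hypothetical helper.

```javascript
// Sketch: pick the API key from the flag first, then from the environment.
// The precedence order is an assumption, not documented behavior.
function resolveApiKey(flagValue, env) {
  const key = flagValue || env.OPENAI_API_KEY;
  if (!key) {
    throw new Error("No API key: pass --api-key or set OPENAI_API_KEY");
  }
  return key;
}

// In a real program the second argument would be process.env.
console.log(resolveApiKey(undefined, { OPENAI_API_KEY: "your-secret-key-here" }));
```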
Licensing
MIT