speech-synthesis-mcp-adapter

A Model Context Protocol (MCP) server and command-line tool for generating high-quality speech with the OpenAI text-to-speech API.

Core Capabilities

MCP Server: Adds speech generation to MCP-compatible clients such as Claude Desktop.
Multiple Voices: Supports a range of voices (e.g., alloy, nova, echo).
High-Quality Audio: Outputs several formats (MP3, Opus, AAC, FLAC, WAV, PCM).
Adjustable Parameters: Lets you set the speaking speed, the voice, and instructions for tone and delivery.
CLI Tool: Can also be run directly from the command line to convert text to speech on the fly.

Deployment Instructions

Route 1: Installation via Source Repository

# Obtain the source code
git clone https://github.com/nakamurau1/tts-mcp.git
cd tts-mcp

# Fetch required packages
npm install

# Optional: Install for system-wide access
npm install -g .

Route 2: Immediate Execution with npx (Zero Install)

# Run the MCP server without installing it
npx -p tts-mcp tts-mcp-server --voice nova --model tts-1-hd

# Invoke the conversion utility directly
npx tts-mcp -t "Greetings from the adapter" -o greeting.mp3

MCP Server Operational Guide

The MCP server exposes text-to-speech as a tool to MCP-compatible clients such as Claude Desktop.

Activating the MCP Server

# Start with the default settings
npm run server

# Start with a different voice and model
npm run server -- --voice echo --model tts-1

# Or run the server script directly, passing the API key
node bin/tts-mcp-server.js --voice fable --api-key sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx

Server Configuration Switches

Options:
  -V, --version          Output the version number
  -m, --model <model>    TTS model to use (default: "gpt-4o-mini-tts")
  -v, --voice <voice>    Voice to use (default: "alloy")
  -f, --format <format>  Audio output format (default: "mp3")
  --api-key <key>        OpenAI API key (can also be set with the OPENAI_API_KEY environment variable)
  -h, --help             Display help
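To illustrate how these switches might combine, here is a minimal TypeScript sketch that merges the flags with the defaults listed above and falls back to the OPENAI_API_KEY environment variable for the key. The names ServerFlags and resolveServerConfig are hypothetical, and the rule that --api-key takes precedence over the environment variable is an assumption, not documented behavior.

```typescript
// Hypothetical sketch: combine server flags with the documented defaults.
// ServerFlags and resolveServerConfig are illustrative names, not the tool's code.

interface ServerFlags {
  model?: string;
  voice?: string;
  format?: string;
  apiKey?: string;
}

interface ServerConfig {
  model: string;
  voice: string;
  format: string;
  apiKey: string;
}

// Defaults as listed under "Server Configuration Switches".
const SERVER_DEFAULTS = { model: "gpt-4o-mini-tts", voice: "alloy", format: "mp3" };

function resolveServerConfig(
  flags: ServerFlags,
  env: Record<string, string | undefined>,
): ServerConfig {
  // Assumption: an explicit --api-key takes precedence over OPENAI_API_KEY.
  const apiKey = flags.apiKey ?? env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("No API key: pass --api-key or set OPENAI_API_KEY");
  }
  return {
    model: flags.model ?? SERVER_DEFAULTS.model,
    voice: flags.voice ?? SERVER_DEFAULTS.voice,
    format: flags.format ?? SERVER_DEFAULTS.format,
    apiKey,
  };
}

// Roughly `npm run server -- --voice echo --model tts-1` with the key in the environment.
const config = resolveServerConfig({ voice: "echo", model: "tts-1" }, { OPENAI_API_KEY: "sk-example" });
console.log(config.model, config.voice, config.format); // tts-1 echo mp3
```

Passing the environment in as a parameter, instead of reading process.env inside the function, keeps the sketch easy to test; in Node you would call resolveServerConfig(flags, process.env).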

Interfacing with MCP Clients

The server works with Claude Desktop and other MCP-compatible clients. To set it up in Claude Desktop:

Open the Claude Desktop configuration file (on macOS, usually ~/Library/Application Support/Claude/claude_desktop_config.json).
Add the following configuration, supplying your OpenAI API key:

{
  "mcpServers": {
    "speech-synthesis-mcp-adapter": {
      "command": "node",
      "args": ["full/path/to/bin/tts-mcp-server.js", "--voice", "alloy", "--api-key", "your-openai-api-key"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}

Alternatively, use npx so you don't have to reference a local file path:

{
  "mcpServers": {
    "speech-synthesis-mcp-adapter": {
      "command": "npx",
      "args": ["-p", "tts-mcp", "tts-mcp-server", "--voice", "sage", "--model", "tts-1"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}

There are two ways to provide the API key:

Command-line argument (fine for quick testing): pass it in the args array with the --api-key flag.
Environment variable (preferred for security): set OPENAI_API_KEY in the env object, as shown above.
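For the environment-variable approach, the npx entry can also be generated rather than written by hand. This minimal TypeScript sketch builds the same entry as the npx example, with the key only in env; buildNpxEntry is a hypothetical helper, and the printed JSON still has to be merged into your existing claude_desktop_config.json manually.

```typescript
// Hypothetical helper: build the npx-based "mcpServers" entry with the API key
// in "env" (not in "args"), then print JSON for claude_desktop_config.json.

interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildNpxEntry(voice: string, model: string, apiKey: string): McpServerEntry {
  return {
    command: "npx",
    // -p tts-mcp names the package that provides the tts-mcp-server command.
    args: ["-p", "tts-mcp", "tts-mcp-server", "--voice", voice, "--model", model],
    env: { OPENAI_API_KEY: apiKey },
  };
}

const claudeConfig = {
  mcpServers: {
    "speech-synthesis-mcp-adapter": buildNpxEntry("sage", "tts-1", "your-openai-api-key"),
  },
};

// Two-space indentation to match the configuration examples.
console.log(JSON.stringify(claudeConfig, null, 2));
```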

Security note: Keep your configuration file private, since any API key stored in it is in plain text.

Restart Claude Desktop.
Requests asking Claude to read text aloud (e.g., "read this text aloud") will now be handled by this TTS server.

Registered MCP Functions

text-to-speech: Primary function for rendering provided text into audible output.
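Under the hood, an MCP client invokes this tool with a JSON-RPC 2.0 `tools/call` request, typically after discovering it with `tools/list`. The TypeScript sketch below builds such a message. The envelope follows the MCP specification, but the argument names `text` and `voice` are assumptions: this README does not document the tool's input schema, so check the server's `tools/list` response for the real field names.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to call a tool.
// "tools/call" and the params shape come from the MCP spec; the names inside
// "arguments" are HYPOTHETICAL, since the tool's input schema is not documented here.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function makeTextToSpeechCall(id: number, text: string, voice?: string): ToolCallRequest {
  const args: Record<string, unknown> = { text }; // hypothetical field name
  if (voice !== undefined) {
    args["voice"] = voice; // hypothetical field name
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "text-to-speech", arguments: args },
  };
}

// Over the stdio transport each message is one line of JSON; JSON.stringify
// escapes any newlines inside the text, so the message stays on a single line.
console.log(JSON.stringify(makeTextToSpeechCall(1, "Hello from Claude", "nova")));
```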

Command-Line Interface Utility

speech-synthesis-mcp-adapter can also be used on its own as a command-line tool:

# Convert text directly
tts-mcp -t "Testing the CLI utility now" -o output_test.mp3

# Convert the contents of a text file
tts-mcp -f input_data.txt -o file_audio.mp3

# Use a specific voice
tts-mcp -t "A different voice sample" -o custom_voice.mp3 -v onyx
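The CLI takes its input either inline (-t) or from a file (-f). The following TypeScript sketch shows one way to resolve that input. resolveInputText is hypothetical, rejecting both flags at once is an assumption about the tool's behavior, and the 4096-character cap reflects the input limit OpenAI documents for its speech endpoint.

```typescript
// Hypothetical sketch: pick the CLI input from -t/--text or -f/--file.
// Rejecting both flags together is an assumption, not documented behavior.

interface InputFlags {
  text?: string;
  file?: string;
}

// OpenAI documents a 4096-character limit on the speech endpoint's input.
const MAX_INPUT_CHARS = 4096;

function resolveInputText(flags: InputFlags, readFile: (path: string) => string): string {
  if (flags.text !== undefined && flags.file !== undefined) {
    throw new Error("Use either --text or --file, not both");
  }
  const text = flags.text ?? (flags.file !== undefined ? readFile(flags.file) : undefined);
  if (text === undefined || text.trim() === "") {
    throw new Error("No input: pass --text or --file");
  }
  // String length counts UTF-16 code units, a close approximation of characters.
  if (text.length > MAX_INPUT_CHARS) {
    throw new Error(`Input is ${text.length} characters; the limit is ${MAX_INPUT_CHARS}`);
  }
  return text;
}
```

In Node, the reader could be `(p) => readFileSync(p, "utf8")` from `node:fs`; injecting it keeps the sketch testable without touching the disk.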

CLI Utility Switches

Options:
  -V, --version              Output the version number
  -t, --text <text>          Text to convert to speech
  -f, --file <path>          Path to an input text file
  -o, --output <path>        Path for the output audio file (required)
  -m, --model <model>        TTS model to use (default: "gpt-4o-mini-tts")
  -v, --voice <voice>        Voice to use (default: "alloy")
  -s, --speed <number>       Speaking speed, from 0.25 to 4.0 (default: 1)
  --format <format>          Audio output format (default: "mp3")
  -i, --instructions <text>  Additional instructions for voice style and delivery
  --api-key <key>            OpenAI API key (can also be set with the OPENAI_API_KEY environment variable)
  -h, --help                 Display help
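The CLI options map closely onto the JSON body of OpenAI's POST /v1/audio/speech request. The minimal TypeScript sketch below shows that mapping, with the defaults, voice list, and speed range taken from this README. The field names (model, input, voice, response_format, speed, instructions) are OpenAI's; the CliOptions type and buildSpeechBody function are hypothetical and not taken from this project's source.

```typescript
// Sketch: CLI options -> JSON body for OpenAI's POST /v1/audio/speech.
// CliOptions and buildSpeechBody are hypothetical; the body field names are OpenAI's.

interface CliOptions {
  text: string;
  model?: string;
  voice?: string;
  speed?: number;
  format?: string;
  instructions?: string;
}

// Voices listed under "Supported Vocal Profiles".
const VOICES = ["alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"];

function buildSpeechBody(opts: CliOptions): Record<string, unknown> {
  const model = opts.model ?? "gpt-4o-mini-tts";
  const voice = opts.voice ?? "alloy";
  const speed = opts.speed ?? 1;

  if (!VOICES.includes(voice)) {
    throw new Error(`Unknown voice "${voice}"`);
  }
  // Written as a negated range check so NaN (e.g. from `--speed fast`) is rejected too.
  if (!(speed >= 0.25 && speed <= 4.0)) {
    throw new Error(`Speed ${speed} is outside 0.25 to 4.0`);
  }

  const body: Record<string, unknown> = {
    model,
    input: opts.text,
    voice,
    response_format: opts.format ?? "mp3",
    speed,
  };
  // OpenAI documents `instructions` as unsupported by tts-1 and tts-1-hd,
  // so this sketch forwards it only for other models.
  if (opts.instructions !== undefined && !model.startsWith("tts-1")) {
    body["instructions"] = opts.instructions;
  }
  return body;
}
```

Silently dropping instructions for the tts-1 models, rather than raising an error, is a design choice of this sketch; the actual tool may handle it differently.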

Supported Vocal Profiles

The following voices are supported:

alloy (default)
ash
coral
echo
fable
onyx
nova
sage
shimmer

Supported Models

tts-1
tts-1-hd
gpt-4o-mini-tts (default)

Output Encoding Options

The following output formats are supported:

mp3 (default)
opus
aac
flac
wav
pcm
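Because --format and the -o path are set independently, it is easy to request one format and give the file another extension. This hypothetical TypeScript check, which is not part of the documented CLI, flags such mismatches; the alternate extensions it accepts (.ogg for Opus, .raw for PCM) are assumptions.

```typescript
// Hypothetical check, not part of the documented CLI: does the -o file name's
// extension agree with --format? The API returns bytes in the requested format
// whatever the file is called, so `--format wav -o out.mp3` yields a WAV file
// named .mp3, which some players may refuse to open.

type AudioFormat = "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";

// Accepted extensions per format; the alternates (.ogg, .raw) are assumptions.
const EXTENSIONS: Record<AudioFormat, string[]> = {
  mp3: ["mp3"],
  opus: ["opus", "ogg"],
  aac: ["aac"],
  flac: ["flac"],
  wav: ["wav"],
  // Headerless samples (OpenAI documents 24 kHz, 16-bit signed, little-endian).
  pcm: ["pcm", "raw"],
};

function isAudioFormat(value: string): value is AudioFormat {
  return Object.prototype.hasOwnProperty.call(EXTENSIONS, value);
}

function extensionMatches(outputPath: string, format: AudioFormat): boolean {
  // Look only at the final path segment, so "audio.v2/out" has no extension.
  const name = outputPath.slice(Math.max(outputPath.lastIndexOf("/"), outputPath.lastIndexOf("\\")) + 1);
  const dot = name.lastIndexOf(".");
  if (dot === -1) return false;
  return EXTENSIONS[format].includes(name.slice(dot + 1).toLowerCase());
}

console.log(extensionMatches("output_test.mp3", "wav")); // false
```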

Environment Configuration Variables

The API key can also be set with an environment variable:

OPENAI_API_KEY=your-secret-key-here
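For context, the key in OPENAI_API_KEY is sent as a Bearer token on requests to OpenAI's speech endpoint. The TypeScript sketch below shows such a request. synthesize and FetchLike are hypothetical names, the fetch function is injected so the code can run without network access, and the fixed model and voice are simplifications.

```typescript
// Sketch of a speech request using the key from OPENAI_API_KEY:
// POST https://api.openai.com/v1/audio/speech with a Bearer token and a JSON body;
// the response body is the encoded audio. Names here are hypothetical.

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer>; text(): Promise<string> }>;

async function synthesize(
  text: string,
  env: Record<string, string | undefined>,
  fetchImpl: FetchLike,
): Promise<Uint8Array> {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) throw new Error("OPENAI_API_KEY is not set");

  const res = await fetchImpl("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Simplified: fixed model, voice, and format.
    body: JSON.stringify({ model: "gpt-4o-mini-tts", voice: "alloy", input: text, response_format: "mp3" }),
  });
  if (!res.ok) {
    throw new Error(`Speech request failed (${res.status}): ${await res.text()}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

In Node 18 or later you could call `synthesize("Hello", process.env, fetch)` and save the result with `writeFileSync("greeting.mp3", bytes)` from `node:fs`.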

Licensing

MIT

Author

nakamurau1
