youtube-caption-utility-mcp

Provides rapid, multilingual summarization and full timed-transcript extraction for YouTube media via integration with the DeepSRT processing backend. Optimized for consumption within MCP environments.

Author

DeepSRT

No License

Quick Info

GitHub Stars: 45
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 2026-02-19

Tags

deepsrt, youtube, videos, summaries, requests, narrative

DeepSRT Utility Server for Video Content Analysis

This package runs as a Model Context Protocol (MCP) server over stdio. It uses DeepSRT's infrastructure to process YouTube video captions, generate concise abstractive summaries, and extract structured, timestamped transcripts. No YouTube API key is required: caption tracks are fetched directly through YouTube's InnerTube API.

Core Features Overview

  • Video Abstraction: Generates summaries in specified formats (e.g., narrative, bulleted lists).
  • Transcript Retrieval: Outputs complete, time-coded transcripts from available caption sources.
  • Multilingual Support: Handles processing and output localization for various languages (e.g., en, zh-tw, ja).
  • Direct Access: Leverages built-in mechanisms to fetch captions directly from YouTube without external credentials.

MCP Configuration Snippet

To enable this service within an MCP-aware application (like Claude Desktop):

{
  "mcpServers": {
    "deepsrt": {
      "type": "stdio",
      "command": "bunx",
      "args": [
        "@deepsrt/deepsrt-mcp@latest",
        "--server"
      ]
    }
  }
}

System Blueprint

graph TD
    subgraph ClientLayer [MCP Consumer Interface]
        A["User Application (e.g., Cline/Desktop)"]
    end

    subgraph ServerNode [Utility MCP Daemon]
        B[Request Handler]
        C[Summary Generator Tool]
        D[Transcript Fetcher Tool]

        B --> C
        B --> D
    end

    subgraph ExternalServices [Remote Resources]
        E[YouTube InnerTube API]
        F[DeepSRT Computation Cluster]
        G[YouTube Caption XML Source]
    end

    A --> B
    C --> E
    C --> F
    D --> E
    D --> G

Operational Sequence: Summarization

sequenceDiagram
    participant U as User
    participant C as Client
    participant S as Daemon
    participant Y as YouTube
    participant D as DeepSRT

    U->>C: Initiate summary request (URL/ID)
    C->>S: Call get_summary(params)

    S->>Y: Probe video ID via InnerTube (metadata/tracks)
    Y-->>S: Return manifest & caption listings

    S->>S: Select optimal caption stream (Prioritize human-authored)

    par Processing Path
        S->>D: Transmit data block for abstraction
        D-->>S: Deliver synthesized summary
    and Title Localization
        S->>D: Request title translation if necessary
        D-->>S: Return translated title string
    end

    S->>S: Apply required output formatting (Markdown)
    S-->>C: Respond with structured summary data
    C-->>U: Present result
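
The par block in the diagram can be read as two independent requests that are awaited together. The sketch below illustrates that shape with injected stand-in functions so it runs without network access. SummaryDeps, fetchSummary, translateTitle, and the Markdown layout are hypothetical names chosen for illustration, not the package's actual internals.

```typescript
// Hypothetical dependency interface; the daemon's real internals are not shown in this document.
interface SummaryDeps {
  fetchSummary(videoId: string, lang: string): Promise<string>;
  translateTitle(title: string, lang: string): Promise<string>;
}

// Runs summary generation and title translation concurrently (the "par" block),
// then applies a simple Markdown layout (assumed, for illustration only).
async function summarize(
  deps: SummaryDeps,
  videoId: string,
  title: string,
  lang: string,
): Promise<string> {
  const [summary, localizedTitle] = await Promise.all([
    deps.fetchSummary(videoId, lang),
    deps.translateTitle(title, lang),
  ]);
  return `# ${localizedTitle}\n\n${summary}`;
}

// In-memory stubs standing in for the DeepSRT backend.
const stubDeps: SummaryDeps = {
  fetchSummary: async (id, lang) => `Summary of ${id} (${lang})`,
  translateTitle: async (title) => title.toUpperCase(),
};
```

Because both calls start before either is awaited, the total latency is roughly that of the slower request rather than the sum of the two.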

Transcript Retrieval Flow

This path focuses on extracting raw timed text:

  1. Client Request: User requests transcript via get_transcript(videoId, lang).
  2. Metadata Fetch: Server queries YouTube InnerTube for video details and available caption tracks.
  3. Track Selection: The daemon identifies the highest quality caption source matching the requested language, favoring manually uploaded sets.
  4. XML Retrieval: The raw <timedtext> XML file is fetched from its CDN location.
  5. Local Parsing: The server parses the XML, decodes HTML entities, converts timestamps to [MM:SS] format, and filters out non-speech markers such as "(Music)".
  6. Output: Returns the finalized, time-aligned text block to the client.
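
Steps 4 and 5 can be sketched as a small parser. This is a minimal illustration, assuming the caption file uses p elements with millisecond t (start) and d (duration) attributes. That layout, the Cue type, and parseTimedText are assumptions made for the example, and a regex scan is used only to keep it dependency-free.

```typescript
// One caption cue with times in milliseconds.
interface Cue {
  startMs: number;
  durationMs: number;
  text: string;
}

// Assumed layout: <timedtext><body><p t="ms" d="ms">text</p>...</body></timedtext>.
// A real XML parser would be more robust than this regex scan.
function parseTimedText(xml: string): Cue[] {
  const result: Cue[] = [];
  const paragraph = /<p\b([^>]*)>([\s\S]*?)<\/p>/g;
  let match: RegExpExecArray | null;
  while ((match = paragraph.exec(xml)) !== null) {
    const attrs = match[1];
    const start = /\bt="(\d+)"/.exec(attrs);
    if (start === null) continue; // skip cues without a start time
    const duration = /\bd="(\d+)"/.exec(attrs);
    // Drop nested inline tags and collapse runs of whitespace.
    const text = match[2].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    if (text.length === 0) continue;
    result.push({
      startMs: Number(start[1]),
      durationMs: duration === null ? 0 : Number(duration[1]),
      text,
    });
  }
  return result;
}

const sampleXml =
  '<timedtext format="3"><body>' +
  '<p t="0" d="1500">Hello <s>world</s></p>' +
  '<p t="61500" d="2000">Second &amp; last</p>' +
  "</body></timedtext>";
const cues = parseTimedText(sampleXml);
```

Entity decoding and timestamp formatting are deliberately left out here, so the second cue still contains the raw &amp; sequence at this stage.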

Technical Highlights

Caption Selection Logic

Intelligent stream selection ensures data fidelity:

  1. Manually Uploaded Captions (Highest fidelity).
  2. Auto-Generated Captions (Fallback).
  3. Any Available Stream (Lowest priority, if the preferred language is absent).
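
The priority order above can be expressed as a short selection function. This is a sketch under stated assumptions: the CaptionTrack shape, and the convention that a kind value of "asr" marks an auto-generated track, are assumptions about the caption listing rather than details confirmed by this document, and selectTrack is a hypothetical name.

```typescript
// Assumed subset of fields on each caption track entry.
interface CaptionTrack {
  languageCode: string;
  kind?: string; // assumed: "asr" marks auto-generated (speech recognition) tracks
  baseUrl: string;
}

function selectTrack(tracks: CaptionTrack[], lang: string): CaptionTrack | undefined {
  const wanted = lang.toLowerCase();
  const sameLang = tracks.filter((t) => t.languageCode.toLowerCase() === wanted);

  // 1. Manually uploaded captions in the requested language.
  const manual = sameLang.find((t) => t.kind !== "asr");
  if (manual !== undefined) return manual;

  // 2. Auto-generated captions in the requested language.
  if (sameLang.length > 0) return sameLang[0];

  // 3. Any available stream (undefined when the video has no captions at all).
  return tracks[0];
}

const exampleTracks: CaptionTrack[] = [
  { languageCode: "en", kind: "asr", baseUrl: "https://example.invalid/en-asr" },
  { languageCode: "en", baseUrl: "https://example.invalid/en-manual" },
  { languageCode: "ja", baseUrl: "https://example.invalid/ja-manual" },
];
const chosen = selectTrack(exampleTracks, "en");
```

In this example the manual English track is chosen even though the auto-generated one is listed first.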

Data Transformation

Transcript data undergoes rigorous cleaning:

  • Conversion from milliseconds to readable time codes.
  • HTML entity resolution (e.g., &amp; becomes &).
  • Filtering of boilerplate text or audio descriptions (e.g., "(Music)").
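
A minimal sketch of these three cleaning steps follows. The function names are hypothetical, the entity table covers only a few common entities, and treating a line as noise only when it consists entirely of one bracketed or parenthesized marker is an assumption about the filter, not the package's documented rule.

```typescript
// Formats a millisecond offset as [MM:SS]. Minutes are not wrapped at 60,
// so a cue at 1h05m renders as [65:00] (an assumption for long videos).
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return `[${pad(minutes)}:${pad(seconds)}]`;
}

// A few common named entities; unknown names are left untouched.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

// Decodes named entities from the table plus decimal and hex numeric entities.
function decodeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (whole: string, body: string) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    const named = NAMED_ENTITIES.get(body);
    return named === undefined ? whole : named;
  });
}

// True when the whole line is a single marker such as "(Music)" or "[Applause]".
function isNoiseMarker(text: string): boolean {
  return /^\s*[\[(][^\])]*[\])]\s*$/.test(text);
}

// Combines the three steps; returns null for lines that should be dropped.
function renderLine(startMs: number, rawText: string): string | null {
  const text = decodeEntities(rawText).trim();
  if (text.length === 0 || isNoiseMarker(text)) return null;
  return `${formatTimestamp(startMs)} ${text}`;
}
```

For instance, renderLine(125000, "A &lt; B") yields "[02:05] A < B", while renderLine(0, "(Music)") yields null.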

Command Line Interface (CLI)

The utility package also exposes direct CLI functions when executed without the --server flag, allowing immediate interaction via bunx or globally installed binaries.

Example: Direct Summary Fetch

bunx @deepsrt/deepsrt-mcp get-summary <video_id> --lang=en --mode=bullet

Example: Direct Transcript Fetch

bunx @deepsrt/deepsrt-mcp get-transcript <video_url> --lang=fr
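
The two examples pass a video ID and a video URL respectively, which suggests the input is normalized to an ID before any request is made. The sketch below shows one way to do that. extractVideoId is a hypothetical helper, and the set of URL forms it recognizes is an assumption rather than the package's documented behavior.

```typescript
// Returns the 11-character video ID from a bare ID or a common YouTube URL form,
// or null when nothing recognizable is found.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  // A bare ID: exactly 11 characters from the URL-safe base64 alphabet.
  if (/^[A-Za-z0-9_-]{11}$/.test(trimmed)) return trimmed;

  const patterns: RegExp[] = [
    /[?&]v=([A-Za-z0-9_-]{11})(?:[&#]|$)/, // .../watch?v=ID
    /youtu\.be\/([A-Za-z0-9_-]{11})(?:[?&#/]|$)/, // youtu.be/ID
    /youtube\.com\/(?:embed|shorts|live)\/([A-Za-z0-9_-]{11})(?:[?&#/]|$)/,
  ];
  for (const pattern of patterns) {
    const found = pattern.exec(trimmed);
    if (found !== null) return found[1];
  }
  return null;
}

const fromUrl = extractVideoId("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10s");
```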

Version Log (Selected)

v0.1.9

  • Refined API interaction logic to improve resilience against temporary rate limits (HTTP 429 responses).
  • Significantly boosted stability of the end-to-end validation suite.

v0.1.3

  • Corrected behavior for the --mode bullet parameter.
  • Ensured seamless execution environment compatibility with the latest bunx toolchain.

Integration Methods

For Claude Desktop/Cline Users

Add the JSON block from the MCP Configuration Snippet section to your application's MCP settings file. Because the configuration launches @deepsrt/deepsrt-mcp@latest through bunx, the latest published version and its dependencies are fetched on demand, so no separate install or update step is needed.

For Standalone Node/Bun Execution

For local development from a source checkout (the project favors Bun for speed; the commands below use npm):

# Install dependencies
npm install

# Start server in development mode (live reloading)
npm run dev

Frequently Encountered Issues (FAQ)

Q: Why might I receive a 404 error from the DeepSRT backend?

A: A 404 often indicates that the requested summary content has not yet been indexed or cached by the DeepSRT CDN edge network. To pre-warm the cache, users should access the video URL once using the associated DeepSRT browser extension. Verification can be done via curl, checking for cache-status: HIT.
