youtube-caption-extractor-tool
A utility to pull subtitle/caption data from specified YouTube video links. It accommodates various URL syntaxes and outputs the textual content as a structured array.
Author

highlight-ing
YouTube Media Integration Module
The YouTube MCP module retrieves transcripts from videos hosted on YouTube.
Operational Interfaces
retrieve_video_captions
Fetches the caption text for the YouTube video at a given URL.
Arguments Required:
- sourceUrl: The full URL of the YouTube video (accepts standard watch links, short youtu.be links, and embed links).
Output Structure:
- A JSON object containing:
  - data_array: An array holding the segmented transcript lines.
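The argument and output shapes described above can be sketched as TypeScript types. The field names sourceUrl and data_array come from this document; the element type of data_array (plain strings) and the helper isCaptionResult are assumptions made for illustration, not part of the module.

```typescript
// Argument shape for retrieve_video_captions, as described above.
interface RetrieveVideoCaptionsArgs {
  sourceUrl: string;
}

// Result shape. ASSUMPTION: each transcript segment is a plain string;
// the source only says data_array holds segmented transcript lines.
interface RetrieveVideoCaptionsResult {
  data_array: string[];
}

// Hypothetical helper: narrows an unknown value to the assumed result shape.
function isCaptionResult(value: unknown): value is RetrieveVideoCaptionsResult {
  if (typeof value !== "object" || value === null) return false;
  const candidate = (value as { data_array?: unknown }).data_array;
  return Array.isArray(candidate) && candidate.every((line) => typeof line === "string");
}
```

A client could run a check like this on the parsed response before reading data_array, so an unexpected payload fails early.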
Supported Link Schemas
The tool recognizes the following YouTube URL formats:
- Canonical: https://www.youtube.com/watch?v=IDENTIFIER
- Shortened: https://youtu.be/IDENTIFIER
- Embedding: https://www.youtube.com/embed/IDENTIFIER
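One way the three formats above could be mapped to a video identifier is sketched here. The function extractVideoId is hypothetical and uses only the standard URL class; the module's actual parsing logic is not shown in this document and may differ.

```typescript
// Hypothetical helper, not the module's actual parser.
// Returns the video identifier, or null if the URL matches none of the
// three formats listed above.
function extractVideoId(sourceUrl: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(sourceUrl);
  } catch {
    return null; // not a parseable URL
  }

  // Treat www.youtube.com and youtube.com the same way.
  const host = parsed.hostname.replace(/^www\./, "");
  let id: string | null = null;

  if (host === "youtu.be") {
    // Shortened: https://youtu.be/IDENTIFIER
    id = parsed.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com") {
    if (parsed.pathname === "/watch") {
      // Canonical: https://www.youtube.com/watch?v=IDENTIFIER
      id = parsed.searchParams.get("v");
    } else if (parsed.pathname.startsWith("/embed/")) {
      // Embedding: https://www.youtube.com/embed/IDENTIFIER
      id = parsed.pathname.split("/")[2] ?? null;
    }
  }

  // An empty identifier counts as no match.
  return id ? id : null;
}
```

The sketch deliberately does not check the identifier's length or character set, since this document does not state what the module accepts.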
Exception Management
The server handles the following error cases:
- Malformed or nonexistent URLs trigger ErrorCode.InvalidParams
- Omission of the URL argument results in ErrorCode.InvalidParams
- Failures during external data retrieval yield descriptive error messages
- A SIGINT signal shuts the server down cleanly
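The validation cases above might look roughly like the following. The enum and ToolError class are local stand-ins so the sketch runs on its own; a real server would use the error types exported by its MCP SDK. The value -32602 is the JSON-RPC 2.0 code for invalid parameters. The names requireSourceUrl and describeFetchFailure and the message wording are assumptions, and SIGINT handling is left out.

```typescript
// Local stand-in for the SDK's error codes (assumption for this sketch).
enum ErrorCode {
  InvalidParams = -32602, // JSON-RPC 2.0 "Invalid params"
}

// Local stand-in for the SDK's error class (assumption for this sketch).
class ToolError extends Error {
  constructor(public readonly code: ErrorCode, message: string) {
    super(message);
    this.name = "ToolError";
  }
}

// Hypothetical helper: rejects a missing or malformed sourceUrl argument.
function requireSourceUrl(args: { sourceUrl?: unknown }): string {
  const { sourceUrl } = args;
  if (typeof sourceUrl !== "string" || sourceUrl.trim() === "") {
    throw new ToolError(ErrorCode.InvalidParams, "sourceUrl is required");
  }
  try {
    new URL(sourceUrl); // throws on malformed input
  } catch {
    throw new ToolError(ErrorCode.InvalidParams, `Invalid URL: ${sourceUrl}`);
  }
  return sourceUrl;
}

// Hypothetical helper: builds a descriptive message for a retrieval failure.
function describeFetchFailure(sourceUrl: string, cause: unknown): string {
  const detail = cause instanceof Error ? cause.message : String(cause);
  return `Failed to retrieve captions for ${sourceUrl}: ${detail}`;
}
```

Keeping validation separate from retrieval means bad input is reported as a parameter error before any request is sent to YouTube.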
Implementation Specifics
- Built with the Highlight AI MCP SDK
- Uses the youtube-transcript library to fetch captions
- Validates input with a Zod schema
- Runs as a stdio server
- Requires Node.js 18.0.0 or newer
Constraints
- Works only for videos that have captions enabled.
- Currently extracts English transcripts only.
- Rate limits are set by YouTube's public access policies.
