TranscriptAPI is a lightweight wrapper that handles extraction, formatting, and proxy rotation. It prioritizes manually created captions over auto-generated ones and returns clean, timestamped JSON, ready for RAG pipelines.
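To make the manual-over-auto prioritization concrete, here is a minimal sketch of that selection logic and the rough shape of the returned JSON. The field names (`is_auto_generated`, `segments`, `start`) are illustrative assumptions, not the API's actual schema:

```python
def pick_track(tracks):
    """Prefer a manually created caption track; fall back to auto-generated."""
    manual = [t for t in tracks if not t["is_auto_generated"]]
    return manual[0] if manual else tracks[0]

# Two caption tracks for the same video: one auto-generated, one manual.
tracks = [
    {"lang": "en", "is_auto_generated": True,
     "segments": [{"start": 0.0, "text": "auto caption"}]},
    {"lang": "en", "is_auto_generated": False,
     "segments": [{"start": 0.0, "text": "manual caption"}]},
]

chosen = pick_track(tracks)
# The manual track wins; its timestamped segments are what a RAG
# pipeline would ingest.
```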
MCP (Model Context Protocol) integration: I recently added native MCP support. If you use Claude Desktop or another MCP-compliant agent, you can add the API as a tool to 'watch' videos directly in your chat context without manually copying transcripts.
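For anyone unfamiliar with the setup: Claude Desktop reads MCP servers from its `claude_desktop_config.json`. A hypothetical entry might look like the following (the server name, command, and package are placeholders I made up, not the project's documented config):

```json
{
  "mcpServers": {
    "transcript-api": {
      "command": "npx",
      "args": ["-y", "transcript-api-mcp"]
    }
  }
}
```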
Technical Stack:
Backend: Python (FastAPI) on AWS Lambda (for burst scaling)
Caching: Redis (to prevent hitting YouTube for the same video twice)
Biggest challenge: handling 'drifting' timestamps in long livestreams, where the auto-generated captions gradually lose sync with the video.
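The caching layer is a standard cache-aside pattern: check Redis before fetching from YouTube, store the result on a miss. A minimal sketch, with a plain dict standing in for Redis so the example runs without a server (in production this would be `redis.Redis().get`/`setex` with a TTL):

```python
import hashlib
import json

cache = {}  # stand-in for a Redis instance

def fetch_transcript(video_id, fetch_from_youtube):
    """Return a transcript, hitting YouTube only on a cache miss."""
    key = "transcript:" + hashlib.sha1(video_id.encode()).hexdigest()
    if key in cache:                       # cache hit: no YouTube request
        return json.loads(cache[key])
    transcript = fetch_from_youtube(video_id)
    cache[key] = json.dumps(transcript)    # cache miss: store for next time
    return transcript

# Demo: a fake fetcher that counts how often it is actually called.
calls = []
def fake_youtube(video_id):
    calls.append(video_id)
    return {"video_id": video_id, "segments": []}

fetch_transcript("abc123", fake_youtube)
fetch_transcript("abc123", fake_youtube)
# Second call is served from the cache; fake_youtube ran once.
```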
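One way to attack the drift problem (my sketch of a general technique, not necessarily what TranscriptAPI does internally): take a few anchor points where the true video time is known and linearly interpolate a correction for every caption timestamp between them.

```python
def correct_drift(caption_times, anchors):
    """Remap caption timestamps onto true video time.

    anchors: list of (caption_time, true_video_time) pairs,
    sorted ascending by caption_time.
    """
    corrected = []
    for t in caption_times:
        # Find the anchor pair that brackets t and interpolate linearly.
        for (c0, v0), (c1, v1) in zip(anchors, anchors[1:]):
            if c0 <= t <= c1:
                frac = (t - c0) / (c1 - c0)
                corrected.append(v0 + frac * (v1 - v0))
                break
        else:
            corrected.append(t)  # outside the anchor range: leave unchanged
    return corrected

# Example: captions have drifted 2 s behind by the 100 s mark.
anchors = [(0.0, 0.0), (100.0, 98.0)]
print(correct_drift([50.0], anchors))  # [49.0]
```

Getting the anchors is the hard part; in practice they might come from chapter markers or aligning caption text against an ASR pass on short video slices.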
There's a free tier for hobbyists. I'm curious to hear how you're handling context-window limits when feeding full 3-hour transcripts to LLMs.
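For what it's worth, the approach I've been experimenting with is chunking: merge timestamped segments into overlapping chunks under a token budget, so each chunk stays independently retrievable by timestamp in a RAG index. A sketch, approximating token counts with word counts:

```python
def chunk_transcript(segments, max_tokens=500, overlap=1):
    """Group segments into chunks of at most max_tokens (word-count proxy),
    carrying `overlap` trailing segments into the next chunk for continuity."""
    chunks, current, count = [], [], 0
    for seg in segments:
        n = len(seg["text"].split())
        if current and count + n > max_tokens:
            chunks.append(current)
            current = current[-overlap:]  # keep overlap segments
            count = sum(len(s["text"].split()) for s in current)
        current.append(seg)
        count += n
    if current:
        chunks.append(current)
    return chunks

# 12 segments of ~100 words each, budget of 500 words per chunk.
segments = [{"start": float(i), "text": "word " * 100} for i in range(12)]
chunks = chunk_transcript(segments, max_tokens=500)
# Each chunk starts with the last segment of the previous one, so a
# retriever can hand an LLM any chunk with its timestamps intact.
```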