So I built Videolyti over a few months. You paste a URL from YouTube, TikTok, Instagram, Twitter, Facebook, Reddit, or Vimeo — it gives you the video file and a text transcript.
The transcription runs OpenAI Whisper (large-v3) on my own server. No API calls to OpenAI, no per-minute billing. It handles 90+ languages and does a surprisingly good job with mixed-language audio (I test it regularly with Ukrainian-English conversations).
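For readers wondering what "running locally" can look like in practice, here is a minimal sketch that builds the arguments for the open-source `whisper` command-line tool (from the openai-whisper Python package). It is an illustration under assumptions, not Videolyti's actual code: the `WhisperJob` shape, the `buildWhisperArgs` name, and the file paths are all made up, and the post does not say whether the CLI or the Python API is used.

```typescript
// Hypothetical helper: builds arguments for the open-source `whisper` CLI.
// Names and paths are illustrative assumptions, not Videolyti's real code.
interface WhisperJob {
  audioPath: string;  // audio already extracted, e.g. by FFmpeg
  outputDir: string;  // where the CLI writes its result files
  language?: string;  // omit to let Whisper auto-detect the language
}

function buildWhisperArgs(job: WhisperJob): string[] {
  const args = [
    job.audioPath,
    "--model", "large-v3",
    "--device", "cpu",
    "--fp16", "False",          // half precision is not supported on CPU
    "--output_format", "json",  // JSON keeps per-segment timestamps
    "--output_dir", job.outputDir,
  ];
  if (job.language) {
    args.push("--language", job.language);
  }
  return args;
}

// The array could then be handed to something like child_process.spawn("whisper", args).
console.log(buildWhisperArgs({ audioPath: "clip.wav", outputDir: "out" }).join(" "));
```

Leaving `--language` unset lets Whisper detect the language itself, which is one plausible choice for mixed-language audio.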
Tech details for those curious:
- Frontend: Next.js (App Router, server components)
- Backend: Express + Socket.IO for real-time progress
- Downloads: yt-dlp + FFmpeg
- Transcription: Whisper large-v3, running locally
- TikTok: the pipeline is a bit different; it uses the TikWM API first and falls back to yt-dlp
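The "try TikWM first, fall back to yt-dlp" idea is a plain ordered-fallback pattern. A rough sketch of how it could be wired up follows. It is not the actual Videolyti code: `downloadWithFallback`, `viaTikwm`, and `viaYtDlp` are hypothetical names, and the two downloaders are stubs that make no network calls.

```typescript
// A downloader takes a URL and resolves to a media URL or local file path.
type Downloader = (url: string) => Promise<string>;

interface NamedDownloader {
  name: string;
  run: Downloader;
}

// Try each downloader in order; return the first success, or throw with
// every collected failure message if all of them fail.
async function downloadWithFallback(
  url: string,
  downloaders: NamedDownloader[]
): Promise<{ source: string; result: string }> {
  const errors: string[] = [];
  for (const d of downloaders) {
    try {
      return { source: d.name, result: await d.run(url) };
    } catch (err) {
      errors.push(`${d.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all downloaders failed for ${url} (${errors.join("; ")})`);
}

// Stand-ins for the real calls (hypothetical; no network access here).
const viaTikwm: Downloader = async () => {
  throw new Error("TikWM unavailable");
};
const viaYtDlp: Downloader = async (url) => `/tmp/${encodeURIComponent(url)}.mp4`;

downloadWithFallback("https://example.com/v/1", [
  { name: "tikwm", run: viaTikwm },
  { name: "yt-dlp", run: viaYtDlp },
]).then((r) => console.log(r.source)); // the stubbed first entry fails, so the second one answers
```

Keeping the downloaders in an ordered array means adding or reordering sources does not touch the fallback logic itself.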
The 5 downloads/day limit is purely practical: Whisper on CPU takes real compute time, and I'm paying for the server out of pocket. It's not a growth hack.
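A per-day cap like this can be sketched as a counter keyed by client and date. The version below is a minimal in-memory illustration, assuming clients are identified by some string such as an IP address and that the day resets at UTC midnight. `DailyQuota` and `tryConsume` are hypothetical names, and nothing here reflects how Videolyti actually enforces the limit.

```typescript
// In-memory daily quota, keyed by client ID and UTC date.
// Illustrative only: a real deployment would need persistence and
// cleanup of old keys.
class DailyQuota {
  private counts = new Map<string, number>();
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns true and records the use if the client is under the limit.
  tryConsume(clientId: string, now: Date = new Date()): boolean {
    const day = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    const key = `${clientId}|${day}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= this.limit) return false;
    this.counts.set(key, used + 1);
    return true;
  }
}

const quota = new DailyQuota(5);
const today = new Date("2024-01-01T12:00:00Z");
for (let i = 0; i < 5; i++) quota.tryConsume("203.0.113.7", today);
console.log(quota.tryConsume("203.0.113.7", today)); // false: sixth request that day
console.log(quota.tryConsume("203.0.113.7", new Date("2024-01-02T00:00:01Z"))); // true: new UTC day
```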
Feedback on the UX is welcome. I know the mobile experience could be better (30% of traffic is mobile right now). I'd also love input on which subtitle export formats would be most useful: SRT, VTT, or plain text with timestamps.
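To make the three candidate formats concrete, here is a small sketch that renders the same transcript segments as SRT, WebVTT, and timestamped plain text. The `Segment` shape (start and end in seconds plus text) is an assumption loosely modeled on Whisper's segment output, and the function names are made up for illustration.

```typescript
// Assumed segment shape: times in seconds, plus the spoken text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm; SRT uses "," and WebVTT uses ".".
function timestamp(seconds: number, sep: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(totalMs % 1000, 3)}`;
}

// SRT: numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${timestamp(seg.start, ",")} --> ${timestamp(seg.end, ",")}\n${seg.text.trim()}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, then cues; cue numbers are optional and omitted here.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    `${timestamp(seg.start, ".")} --> ${timestamp(seg.end, ".")}\n${seg.text.trim()}\n`);
  return `WEBVTT\n\n${cues.join("\n")}`;
}

// Plain text: one line per segment, prefixed with its start time.
function toPlainText(segments: Segment[]): string {
  return segments
    .map((seg) => `[${timestamp(seg.start, ".").slice(0, 8)}] ${seg.text.trim()}`)
    .join("\n");
}

const demo: Segment[] = [
  { start: 0, end: 2.5, text: " Привіт, how are you?" },
  { start: 2.5, end: 4.0, text: " Fine, thanks." },
];
console.log(toSrt(demo));
console.log(toVtt(demo));
console.log(toPlainText(demo));
```

The only differences between SRT and VTT in this sketch are the millisecond separator, the header, and the cue numbers, so supporting both costs little once segments carry timestamps.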
Source code isn't open yet but I'm considering it. Happy to discuss the architecture.