I built sub-tools to solve a problem I had: creating accurate, multilingual subtitles for video content without spending hours on manual transcription or paying for expensive services.
I started with a pure-LLM solution, letting Gemini generate SRT directly from the audio file. It was slow and inaccurate, so I made a few tweaks: splitting the audio into smaller chunks, then validating the generated SRT and retrying whenever it failed to parse. That worked well enough, until I landed on the new approach.
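The validate-and-retry loop from that first version is simple enough to sketch. This is a minimal sketch, assuming the `srt` package for parsing; `generate` is a hypothetical stand-in for whatever call produces a candidate SRT string.

```python
import srt

def validate_srt(text: str) -> bool:
    """Return True if the text parses as well-formed, non-empty SRT."""
    try:
        return len(list(srt.parse(text))) > 0
    except (srt.SRTParseError, ValueError):
        return False

def generate_with_retry(generate, max_attempts: int = 3) -> str:
    """Call the generator until it yields valid SRT, up to max_attempts."""
    for _ in range(max_attempts):
        candidate = generate()
        if validate_srt(candidate):
            return candidate
    raise RuntimeError("No valid SRT produced after retries")
```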
v0.8.0 now uses a three-stage AI pipeline (sketched after the list):
1. WhisperX for word-level aligned transcription
2. Google Gemini for proofreading and error correction
3. Gemini again for context-aware translation
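For the curious, here's roughly what those three stages look like in Python. This is a minimal sketch assuming the `whisperx` and `google-generativeai` packages; the model names, prompts, and file paths are illustrative placeholders, not necessarily what sub-tools uses internally.

```python
import whisperx
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
DEVICE = "cuda"  # or "cpu"

# Stage 1: WhisperX transcription, then word-level alignment.
model = whisperx.load_model("large-v2", DEVICE, compute_type="float16")
audio = whisperx.load_audio("talk.mp4")
result = model.transcribe(audio, batch_size=16)
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=DEVICE
)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, DEVICE)

gemini = genai.GenerativeModel("gemini-1.5-pro")  # illustrative model choice

# Stage 2: Gemini proofreading pass over the raw transcript.
transcript = "\n".join(seg["text"] for seg in aligned["segments"])
proofread = gemini.generate_content(
    "Fix transcription errors without changing the meaning:\n" + transcript
).text

# Stage 3: Gemini translation, with the full transcript as context.
translated = gemini.generate_content(
    "Translate these subtitles into Korean, preserving line breaks:\n" + proofread
).text
```

Running proofreading and translation as separate passes keeps each prompt focused, and the word-level timestamps from stage 1 survive untouched because the LLM only ever sees text.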
I'm satisfied with the result. I'd love for you to try it out, and I'd love to hear what you think.