I like karaoke, and I grew up with Asian-style karaoke, where the music video plays in the background and the lyrics run along the bottom.
Sometimes I want to sing a song and there is no karaoke video like that for it.
A few years ago I came across ML models that cleanly separate a song's vocals from its instrumental track. That gave me the idea of chaining models into a pipeline: take a music video file as input, extract the audio (ffmpeg), separate the tracks (ML), transcribe the lyrics (ML), burn the timed lyrics back into the video (ffmpeg), and output a karaoke version of the video.
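To make the ffmpeg ends of that pipeline concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the app's actual code: the file names, the `LyricLine` type, and the choice of SRT as the subtitle format are all hypothetical, and the two ML steps are left out because the models are not named here. It only builds the ffmpeg argument lists and the subtitle text, so nothing is executed until you pass a command to something like `subprocess.run`.

```python
from dataclasses import dataclass


@dataclass
class LyricLine:
    """One transcribed lyric line with timing in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(lines: list[LyricLine]) -> str:
    """Turn timed lyric lines into SRT text (numbered cues, blank line between)."""
    blocks = []
    for index, line in enumerate(lines, start=1):
        timing = f"{srt_timestamp(line.start)} --> {srt_timestamp(line.end)}"
        blocks.append(f"{index}\n{timing}\n{line.text}\n")
    return "\n".join(blocks)


def extract_audio_cmd(video: str, wav: str) -> list[str]:
    """ffmpeg arguments to drop the video stream and write 16-bit stereo WAV."""
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-acodec", "pcm_s16le", "-ar", "44100", "-ac", "2", wav]


def burn_lyrics_cmd(video: str, instrumental: str, srt: str, out: str) -> list[str]:
    """ffmpeg arguments to keep the original picture, swap in the instrumental
    audio, and burn the subtitles into the frames.

    Assumes an ffmpeg build with libass (needed by the `subtitles` filter) and
    an SRT path without characters that need escaping in a filter string.
    """
    return ["ffmpeg", "-y", "-i", video, "-i", instrumental,
            "-map", "0:v", "-map", "1:a",
            "-vf", f"subtitles={srt}",
            "-c:v", "libx264", "-c:a", "aac", "-shortest", out]
```

A run would call `extract_audio_cmd`, hand the WAV to the separation and transcription models, write the result of `to_srt` to disk, and then run `burn_lyrics_cmd` on the original video plus the instrumental track.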
This is an early version of the app. It's Mac-only so far: it's an Electron app, but I use a Mac, and I do eventually want to make a Windows build. I've only let a few friends try it. Let me know what you think!