Whisper will hallucinate on audio segments that don't have any speech. VAD mitigates that. Expect worse results without it, especially on non-English audio.
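For illustration, here is a rough Python sketch of how VAD might be switched on when building the filter string for ffmpeg's whisper filter. The option names (model, language, queue, destination, format, vad_model) are as I recall them from the FFmpeg 8 filter documentation, so check them against `ffmpeg -h filter=whisper` on your build. The helper names and all file paths are hypothetical placeholders.

```python
def whisper_filter(model, destination, vad_model=None, language="auto", queue=3):
    """Build an ffmpeg `whisper` audio-filter string (option names assumed from FFmpeg 8 docs)."""
    opts = {
        "model": model,              # path to a whisper.cpp ggml model (placeholder)
        "language": language,
        "queue": queue,              # seconds of audio buffered before transcribing
        "destination": destination,  # where the transcript is written
        "format": "srt",
    }
    if vad_model is not None:
        # With a VAD model set, non-speech segments are skipped, which is
        # what reduces hallucinated text on silence.
        opts["vad_model"] = vad_model
    # Note: paths containing ':' would need escaping in a real filter string.
    return "whisper=" + ":".join(f"{key}={value}" for key, value in opts.items())


def build_command(input_path, filter_str):
    """Return an argv list that runs the filter and discards the audio output."""
    return ["ffmpeg", "-i", input_path, "-vn", "-af", filter_str, "-f", "null", "-"]


if __name__ == "__main__":
    flt = whisper_filter("ggml-base.bin", "out.srt", vad_model="ggml-silero-vad.bin")
    print(" ".join(build_command("input.mp4", flt)))
```

Leaving vad_model unset gives the no-VAD behaviour described above, so the two runs are easy to compare on the same file.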
mikece•1h ago
Is the point that you only need one tool -- ffmpeg -- to both generate transcripts and embed them into a video, as opposed to needing multiple tools?
pinter69•1h ago
This is a 3-part series; the first post discusses the new native whisper integration. And correct, for the first post the point is to show that you can use only ffmpeg to transcribe and embed subtitles in a video.
mikece•31m ago
While there's appeal in having one tool do several things, I'm more a fan of the traditional UNIX philosophy: a tool should do one thing, do it extremely well, and allow several tools to be chained together to achieve a multi-step process.
pinter69•29m ago
I tend to agree. The thing I like most about version 8 is actually pad_cuda - nice performance boost for resizing video with an Nvidia GPU
lern_too_spel•16m ago