Whisper will hallucinate on audio segments that don't contain any speech. VAD (voice activity detection) mitigates that. Expect worse results without it, especially on non-English audio.
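For anyone who wants to try it, here is a rough sketch of how the filter string with VAD enabled might be assembled. It assumes the option names from the ffmpeg 8 `whisper` filter docs (`model`, `language`, `queue`, `destination`, `format`, `vad_model`); the helper name and the model file names are placeholders of mine, not something from the post.

```python
def whisper_filter(model, destination, vad_model=None,
                   language="en", queue=3, fmt="srt"):
    """Build an ffmpeg -af argument for the whisper filter (hypothetical helper).

    Paths containing ':' would need escaping for ffmpeg's filter syntax;
    this sketch does not handle that.
    """
    opts = [
        f"model={model}",
        f"language={language}",
        f"queue={queue}",
        f"destination={destination}",
        f"format={fmt}",
    ]
    # Only add the VAD model when one is given; without it the filter
    # transcribes every chunk, including silent ones.
    if vad_model:
        opts.append(f"vad_model={vad_model}")
    return "whisper=" + ":".join(opts)


# Placeholder file names; point these at your own whisper.cpp model files.
af = whisper_filter("ggml-base.en.bin", "output.srt",
                    vad_model="ggml-silero-v5.1.2.bin")
print(af)
```

The resulting string would go after `-af` in something like `ffmpeg -i input.mp4 -vn -af "<string>" -f null -`.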
trq01758•2mo ago
"Lenovo laptop with Nvidia RTX 4040" 4060?
pinter69•2mo ago
Correct. I fixed the typo.
mikece•2mo ago
Is the point that you only need one tool -- ffmpeg -- to both generate transcripts as well as embed those into a video as opposed to having multiple tools?
pinter69•2mo ago
This is a 3-part series; the first part covers the new native Whisper integration. And correct, for the first post the point is to show that you can use ffmpeg alone to transcribe audio and embed subtitles in a video.
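Roughly, the two steps could look like the sketch below. It only builds the argument lists and does not run anything. The function names and file names are placeholders, the `whisper` options are the ones I believe the ffmpeg 8 filter docs list, and muxing the SRT as a `mov_text` track is one common way to embed soft subtitles in MP4, not necessarily what the post does.

```python
import shlex


def transcribe_cmd(video, model, srt):
    """Step 1: run the whisper audio filter and write an SRT file."""
    af = f"whisper=model={model}:language=en:queue=3:destination={srt}:format=srt"
    # -vn drops video; "-f null -" discards the output because only the
    # SRT side effect is wanted.
    return ["ffmpeg", "-i", video, "-vn", "-af", af, "-f", "null", "-"]


def embed_cmd(video, srt, out):
    """Step 2: mux the SRT into the MP4 as a soft subtitle track."""
    # Video and audio are stream-copied; the subtitles are converted to
    # mov_text, the subtitle codec MP4 containers accept.
    return ["ffmpeg", "-i", video, "-i", srt,
            "-c", "copy", "-c:s", "mov_text", out]


# Placeholder file names for illustration only.
for cmd in (transcribe_cmd("input.mp4", "ggml-base.en.bin", "output.srt"),
            embed_cmd("input.mp4", "output.srt", "with_subs.mp4")):
    print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the steps in order, assuming an ffmpeg build with Whisper support enabled.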
mikece•2mo ago
While there's appeal in having one tool do several things I'm more a fan of the traditional UNIX philosophy that a tool should do one thing, do it extremely well, and allow for chaining of several tools together to achieve a multi-step process.
pinter69•2mo ago
I tend to agree. The thing I like most about version 8 is actually pad_cuda - nice performance boost for resizing video with an Nvidia GPU
radicality•2mo ago
Do you know if it’s supported on Mac too, with platform-specific optimizations such as running it on the GPU with MPS?
pinter69•2mo ago
You mean Vulkan?
The blog post lists all the Vulkan-supported platforms.
If you mean an ffmpeg build with Whisper: from memory, I didn't see ffmpeg-builds for Mac, so you will probably need to compile it yourself.