I have an old project that relies on AWS transcription and I'd love to migrate it to something local.
- Converts the audio to 16 kHz WAV
- Transcribes it using ggerganov's native whisper.cpp
- Calls out to a local LLM to clean up the text
- Prints out the final cleaned-up transcription
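The four steps above could be sketched roughly like this. It assumes ffmpeg is on the PATH, a whisper.cpp build whose CLI binary is named `whisper-cli` (older builds call it `main`), and an Ollama server on its default port serving as the local LLM. The tool choices, the `gemma3:12b` model tag and the cleanup prompt are illustrative assumptions, not taken from the linked gist. The subprocess runner and the LLM call are passed in as parameters so the wiring can be checked without the real tools installed.

```python
import json
import subprocess
import urllib.request
from pathlib import Path

# Assumptions (not from the original script): whisper.cpp CLI name,
# Ollama endpoint, and a hypothetical 12B model tag.
WHISPER_BIN = "whisper-cli"
OLLAMA_URL = "http://localhost:11434/api/generate"
LLM_MODEL = "gemma3:12b"


def ffmpeg_cmd(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV (-y overwrites)."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(dst)]


def whisper_cmd(model: Path, wav: Path) -> list[str]:
    """Build a whisper.cpp command that prints the transcript without timestamps."""
    return [WHISPER_BIN, "-m", str(model), "-f", str(wav), "-nt"]


def cleanup_prompt(raw: str) -> str:
    """Hypothetical prompt: ask the LLM to correct the transcript, not summarize it."""
    return ("Correct transcription errors, punctuation and casing in the text "
            "below. Do not add or remove content. Return only the corrected "
            "text.\n\n" + raw.strip())


def ask_llm(prompt: str, model: str = LLM_MODEL) -> str:
    """Send a non-streaming request to Ollama's /api/generate and return its text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"].strip()


def transcribe(src: Path, model: Path, run=subprocess.run, llm=ask_llm) -> str:
    """Convert, transcribe with whisper.cpp, then clean up with the local LLM."""
    wav = src.with_name(src.stem + "_16k.wav")
    run(ffmpeg_cmd(src, wav), check=True, capture_output=True)
    result = run(whisper_cmd(model, wav), check=True,
                 capture_output=True, text=True)
    return llm(cleanup_prompt(result.stdout))
```

With the tools installed, `print(transcribe(Path("memo.m4a"), Path("models/ggml-base.en.bin")))` would print the cleaned-up text. The prompt tells the model not to add or remove content because the goal is to fix misheard words and punctuation, not to rewrite the memo.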
I found that accuracy improved significantly once I added the LLM post-processing step, even with modestly sized 12-14B models.
I've been using it with great success to convert very old dictated memos from over a decade ago, despite a lot of background noise (wind, traffic, etc.).
[1] https://gist.github.com/scpedicini/455409fe7656d3cca8959c123...
I'm sure there are use cases where using Whisper directly is better, but it's a great addition to an already versatile tool.