We built OWhisper for two reasons (also outlined in https://docs.hyprnote.com/owhisper/what-is-this):
(1) While working with on-device, realtime speech-to-text, we found there was no practical tooling for downloading and running these models.
(2) We also got frequent requests for a way to plug custom STT endpoints into the Hyprnote desktop app, just like you can with OpenAI-compatible LLM endpoints.
Part (2) is still somewhat WIP, but we spent some time writing docs, so skimming them should give you a good idea of what it will look like.
Part (1) you can try now (https://docs.hyprnote.com/owhisper/cli/get-started):
brew tap fastrepl/hyprnote && brew install owhisper
owhisper pull whisper-cpp-base-q8-en
owhisper run whisper-cpp-base-q8-en
If you're tired of Whisper, we also support Moonshine :)
Give it a shot: `owhisper pull moonshine-onnx-base-q8`. We're here and looking forward to your comments!
yujonglee•3h ago
Here's the list of local models it supports:
- whisper-cpp-base-q8
- whisper-cpp-base-q8-en
- whisper-cpp-tiny-q8
- whisper-cpp-tiny-q8-en
- whisper-cpp-small-q8
- whisper-cpp-small-q8-en
- whisper-cpp-large-turbo-q8
- moonshine-onnx-tiny
- moonshine-onnx-tiny-q4
- moonshine-onnx-tiny-q8
- moonshine-onnx-base
- moonshine-onnx-base-q4
- moonshine-onnx-base-q8
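The q4/q8 suffixes are quantization levels; as a general rule (not something the docs spell out), lower-bit variants are smaller and faster at some cost in accuracy. Pulling and running a variant follows the same flow as the post:

  # q4 is generally smaller and faster than q8, at some accuracy cost
  owhisper pull moonshine-onnx-base-q4
  owhisper run moonshine-onnx-base-q4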
phkahler•1h ago
To me, STT should take a continuous audio stream and output a continuous text stream.
yujonglee•1h ago
Whisper and Moonshine both work on chunks, but for Moonshine:
> Moonshine's compute requirements scale with the length of input audio. This means that shorter input audio is processed faster, unlike existing Whisper models that process everything as 30-second chunks. To give you an idea of the benefits: Moonshine processes 10-second audio segments 5x faster than Whisper while maintaining the same (or better!) WER.
Also, for Kyutai, we can feed continuous audio in and get continuous text out.
- https://github.com/moonshine-ai/moonshine
- https://docs.hyprnote.com/owhisper/configuration/providers/k...
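Purely as a sketch: if the piped-stdin form yujonglee floats further down the thread lands, a continuous mic-to-text pipeline could look roughly like this (the ffmpeg capture flags and the stdin support are assumptions, not documented behavior):

  # Capture the default mic as 16 kHz mono WAV and stream it into the model
  # (hypothetical: assumes `owhisper run` accepts audio on stdin)
  ffmpeg -f pulse -i default -ac 1 -ar 16000 -f wav - \
    | owhisper run whisper-cpp-tiny-q8-en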
mijoharas•1h ago
(maybe with an `owhisper serve` somewhere else to start the model running or whatever.)
yujonglee•1h ago
For just transcribing a file/audio,
`owhisper run <MODEL> --file a.wav` or
`curl https://something.com/audio.wav | owhisper run <MODEL>`
might make sense.
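If that interface ships, batch transcription follows naturally; a rough sketch (the `--file` flag is only proposed above, not shipped, and writing one .txt per input is my assumption):

  # Transcribe every .wav in a directory using the proposed --file flag
  for f in recordings/*.wav; do
    owhisper run whisper-cpp-base-q8-en --file "$f" > "${f%.wav}.txt"
  done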
mijoharas•1h ago
yujonglee•1h ago
https://github.com/fastrepl/hyprnote/blob/8bc7a5eeae0fe58625...
alkh•42m ago
yujonglee•39m ago