
Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

https://github.com/matthartman/ghost-pepper
91•MattHart88•1h ago
I built this because I wanted to see how far I could get with a voice-to-text app that used 100% local models, so no data left my computer. I've been using it a ton for coding and emails. Experimenting with using it as a voice interface for my other agents too. It's 100% open source (MIT license); would love feedback, PRs, and ideas on where to take it.

Comments

charlietran•1h ago
Thank you for sharing, I appreciate the emphasis on local speed and privacy. As a current user of Hex (https://github.com/kitlangton/Hex), which has similar goals, what are your thoughts on how they compare?
ipsum2•1h ago
Parakeet is significantly more accurate and faster than Whisper if it supports your language.
yeutterg•1h ago
Are you running Parakeet with VoiceInk[0]?

[0]: https://github.com/beingpax/VoiceInk

zackify•42m ago
i am, working great for a long time now
treetalker•1h ago
I have been using Parakeet with MacWhisper's hold-to-talk on a MacBook Neo and it's been awesome.
rahimnathwani•1h ago
Right, and if you're on macOS you can use it for free with Hex: https://github.com/kitlangton/Hex
obrajesse•17m ago
And indeed, Ghost Pepper supports Parakeet v3.
goodroot•1h ago
Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.

On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. It requires a GPU, though. Larger local models generally shouldn't require a subordinate model for cleanup.

Have you compared WhisperKit to faster-whisper or similar? You might be able to run turbov3 successfully and negate the need for cleanup.

Incidentally, waiting for Apple to blow this all up with native STT any day now. :)

hephaes7us•1h ago
Thanks for sharing! I was literally getting ready to build, essentially, this. Now it looks like I don't have to!

Have you ever considered using a foot-pedal for PTT?

Apple incidentally already has native STT, but for some reason they just don't use a decent model yet.

goodroot•1h ago
They do, and they even have that nice microphone F5 key for it, and an ideal OS level API making the input experience >perfect<.

Apparently they do have a better model, they just haven't exposed it in their own OS yet!

https://developer.apple.com/documentation/speech/bringing-ad...

Wonder what's the hold up...

For footpedal:

Yes, conceptually it’s just another evdev-trigger source, assuming the pedal exposes usable key/button events.

Otherwise we’d bridge it into the existing external control interface. Either way, hooks are there. :)
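The evdev-trigger idea above can be sketched with a tiny event-to-action mapper. Everything here is illustrative, not hyprwhspr's actual code: evdev reports key events with value 1 for press, 0 for release, and 2 for autorepeat, which push-to-talk should ignore.

```python
# Sketch: treat a foot pedal as an evdev-style trigger for push-to-talk.
# Events are (key_code, value) pairs as evdev reports them:
# value 1 = press, 0 = release, 2 = autorepeat (ignored for PTT).

PEDAL_KEY = 256  # BTN_0; the real code depends on the pedal (assumption)

def ptt_actions(events, pedal_key=PEDAL_KEY):
    """Map raw key events to 'start'/'stop' recording actions."""
    actions = []
    for code, value in events:
        if code != pedal_key:
            continue                  # some other key; not our trigger
        if value == 1:
            actions.append("start")   # pedal down: begin recording
        elif value == 0:
            actions.append("stop")    # pedal up: finish and transcribe
    return actions
```

In a real bridge, the event stream would come from `evdev.InputDevice(...).read_loop()` and the actions would call into the app's existing external control interface.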

jiehong•31m ago
The only issue with Apple's models is that they do not detect languages automatically, nor switch if you change languages between sentences.

Parakeet does both just fine.

chrisweekly•3m ago
sorry, PTT?
LuxBennu•1h ago
I've been running whisper large-v3 on an m2 max through a self-hosted endpoint and honestly the accuracy is good enough that i stopped bothering with cleanup models. The bigger annoyance for me was latency on longer chunks, like anything over 30 seconds starts feeling sluggish even with metal acceleration. Haven't tried whisperkit specifically but curious how it handles longer audio compared to the full model.
goodroot•42m ago
Ah yeah, longform is interesting.

Not sure how you're running it, via whichever "app thing", but...

On resource limited machines: "Continuous recording" mode outputs when silence is detected via a configurable threshold.

This outputs as you speak in more reasonable chunks; in aggregate "the same output" just chunked efficiently.

Maybe you can try hackin' that up?
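The silence-threshold chunking described above can be sketched roughly like this. The function names, thresholds, and frame sizes are illustrative, not hyprwhspr's actual implementation:

```python
# Sketch: split a stream of audio frames into chunks at silence gaps.
# `frames` is a list of per-frame RMS energies (floats); in a real app
# these would come from the microphone at, say, 30 ms per frame.

def chunk_on_silence(frames, threshold=0.02, min_silence_frames=10):
    """Yield (start, end) frame-index ranges for speech chunks,
    closing a chunk once `min_silence_frames` quiet frames pass."""
    chunks = []
    start = None
    quiet = 0
    for i, energy in enumerate(frames):
        if energy >= threshold:
            if start is None:
                start = i              # speech begins
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:
                chunks.append((start, i - quiet + 1))  # close chunk at the gap
                start, quiet = None, 0
    if start is not None:
        chunks.append((start, len(frames)))            # trailing speech
    return chunks
```

Each closed chunk would be sent to the model immediately, which is how the output stays "the same in aggregate" while individual transcription calls stay short.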

LuxBennu•2m ago
Yeah that makes sense, chunking on silence would sidestep the latency issue pretty cleanly. I've been running it through a basic fastapi wrapper so it just takes whatever audio blob gets thrown at it, no chunking logic on the server side. Might be worth adding a vad pass before sending to whisper though, would cut down on processing dead air too.
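The dead-air trim mentioned above could start as simply as an energy gate that drops quiet leading and trailing samples before the blob ever reaches Whisper. This is a naive sketch, not a real VAD model, and the threshold is made up:

```python
def trim_dead_air(samples, threshold=0.02):
    """Strip leading and trailing samples whose magnitude is below
    `threshold`, so the model never sees the silent padding."""
    voiced = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not voiced:
        return []                         # nothing but silence
    return samples[voiced[0]:voiced[-1] + 1]
```

A server-side wrapper could run this (or a proper VAD such as Silero) on the decoded waveform before invoking the model, cutting both latency and wasted compute on dead air.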
konaraddi•1h ago
That’s awesome! Do you know how it compares to Handy? Handy is open source and local only too. It’s been around a while and what I’ve been using.

https://github.com/cjpais/handy

youniverse•56m ago
I love Handy and have been using it for a while too. What we need is this for mobile apps; I don't think there are any free ones, and native dictation is not always fully local and not as good.
swaptr•54m ago
Handy is awesome! I used it for quite a while before Claude Code added voice support. Solid software, very good linux and mac integration. Shoutout to Parakeet models as well, extremely fast and solid models for their relatively modest memory requirements.
stavros•41m ago
Handy is fantastic.
mathis•1h ago
If you don't feel like downloading a large model, you can also use `yap dictate`. Yap leverages the built-in models exposed through Speech.framework on macOS 26 (Tahoe).

Project repo: https://github.com/finnvoor/yap

hyperhello•54m ago
Feature request or beg: let me play a speech video and transcribe it for me.
MattHart88•43m ago
I like this idea and it should work -- whatever microphone you have on should be able to hear the speaker. LMK if not (e.g., are you wearing headphones? if so, the mic can't hear the speaker)
aristech•52m ago
Great job. What about supported languages? Are system languages recognised?
MattHart88•42m ago
Thanks! We currently have two multilingual options available:

- Whisper small (~466 MB, supports many languages)
- Parakeet v3 (~1.4 GB, supports 25 languages via FluidAudio)
lostathome•50m ago
If anyone's interested, I built Hitoku Draft. It's a context-aware voice assistant, local models only.

Here is an example https://www.youtube.com/watch?v=Dw_q6l3Cwp4

I was mainly motivated by papers like this https://arxiv.org/pdf/2602.16800. But I found myself using it during vacation when I did not have an internet connection.

https://hitoku.me/draft/

I set up a code for people to download it (HITOKUHN2026), in case you want to compare, or just give feedback!

guzik•49m ago
Sadly the app doesn't work. There is no popup asking for microphone permission.

EDIT: I see there is an open issue for that on github

ttul•32m ago
And many people are mailing in Codex and Claude Code generated PRs - myself included. Fingers crossed, I suppose.
parhamn•46m ago
I see a lot of Whisper stuff out there. Are these the same old OpenAI Whisper models, or have they been updated heavily?

I've been using Parakeet v3, which is fantastic (and tiny). I'm confused to still be seeing Whisper out there.

zackify•40m ago
Same, I even have Kokoro for text-to-speech replies in Home Assistant, and Parakeet on macOS through VoiceInk.

Also vibe coded a way to use parakeet from the same parakeet piper server on my grapheneos phone https://zach.codes/p/vibe-coding-a-wispr-clone-in-20-minutes

daemonologist•29m ago
Whisper is still old reliable - I find that it's less prone to hallucinations than newer models, easier to run (on AMD GPU, via whisper.cpp), and only ~2x slower than parakeet. I even bothered to "port" Parakeet to Nemo-less pytorch to run it on my GPU, and still went back to Whisper after a couple of days.
goodroot•8m ago
Whisper is very good in many languages.

It's also in many flavours, from tiny to turbo, and so can fit many system profiles.

That's what makes it unique and hard to replace.

gegtik•34m ago
How does this compare to macOS's built-in Siri dictation (STT), in quality and in privacy?
realityfactchex•13m ago
Exactly my question. I double-tap the Control key and macOS does native, local STT dictation pretty well. (Similar to the Keyboard > Enable Dictation setting on iOS.)

The macOS built-in STT (dictation) seems better than all the 3rd-party local apps I tried in the past that people raved about. I have tried several.

Is this better somehow?

If the 3rd party apps did streaming with typing in place and corrections within a reasonable window when they understand things better given more context, that would be cool. Theoretically, a custom model or UX could be "better" than what comes free built into macOS (more accurate or customizable).

But when I contacted the developer of my favorite one they said that would be pretty hard to implement due to having to go back and make corrections in the active field, etc.

I assume streaming STT in these utilities for Mac will get better at some point, but I haven't seen it yet (been waiting). It seems these tools generally are not streaming, e.g. they want you to finish speaking first before showing you anything. Which doesn't work for me when I'm dictating. I want to see what I've been saying lately, to jog my memory about what I've just said and help guide the next thing I'm about to say. I certainly don't want to split my attention by running a touch-to-speak on and off to manually jog the control to say "ok, you can render what I just said now".

I guess "hold to dictate" tools are for delivering discrete, fully formed messages, not for longer, running dictation.

AFAICT, TFA is focused on hold-to-talk as the differentiator, over double-tap to begin speaking and double-tap to end speaking?

Supercompressor•33m ago
I've been looking for the opposite - wanting to dump text and it be read to me, coherently. Anyone have good recommendations?
realityfactchex•10m ago
Sure, Chatterbox TTS Server is rather high quality: https://github.com/devnen/Chatterbox-TTS-Server

You could hook it up to some workflow over the local API depending on how you want to dump the text, but the web UI is good too.

The Show HN by the author was at: https://news.ycombinator.com/item?id=44145564

ericmcer•31m ago
I see quite a few of these, the killer feature to me will be one that fine tunes the model based on your own voice.

E.g., if your name is `Donold` (pronounced like Donald), there is not a transcription model in existence that will transcribe it correctly. That means forget ever inputting your name or email by voice; it will never come out right.

Combine that with any subtleties of speech you have, or industry jargon you frequently use and you will have a much more useful tool.

We have a ton of options for "predict the most common word that matches this audio data" but I haven't found any "predict MY most common word" setups.

MattHart88•29m ago
I've found the "corrections" feature works well for most of the jargon and misspelling use cases. Can you give it a try and let me know edge cases?
sorenjan•23m ago
Whisper supports a prompt, you can put your "Donold" there.

https://developers.openai.com/cookbook/examples/whisper_prom...
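The prompt biasing from the cookbook can be sketched with openai-whisper's `initial_prompt` parameter. The helper name, glossary format, model size, and file path here are all illustrative; the only real API surface is `whisper.load_model` and `transcribe(initial_prompt=...)`:

```python
# Sketch: bias Whisper toward unusual spellings via `initial_prompt`.
# Whisper conditions its decoder on the prompt text, so names it
# contains are much more likely to be transcribed as written.

def vocab_prompt(terms):
    """Build a biasing prompt from a personal vocabulary list."""
    return "Glossary: " + ", ".join(terms) + "."

def transcribe_with_vocab(path, terms, model_name="small"):
    import whisper  # pip install openai-whisper
    model = whisper.load_model(model_name)
    result = model.transcribe(path, initial_prompt=vocab_prompt(terms))
    return result["text"]

# Hypothetical usage:
#   transcribe_with_vocab("voicemail.wav", ["Donold", "donold@example.com"])
```

The prompt only conditions the first window of audio, so for long recordings the glossary matters most near the start.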

__mharrison__•15m ago
Cool, I've been doing a lot of "coding" (and other typing tasks) recently by tapping a button on my Stream Deck. It starts recording me until I tap it again. At which point, it transcribes the recording and plops it into the paste buffer.

The button next to it pastes when I press it. If I press it again, it hits the enter command.

You can get a lot done with two buttons.
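The record button in the two-button workflow above is essentially a tiny toggle state machine. A minimal sketch, with the transcriber and clipboard stubbed out as assumptions (the real version lives in Stream Deck button configs):

```python
# Sketch: toggle-to-record, like the Stream Deck button described above.
# First tap starts recording; second tap stops, transcribes, and drops
# the text into the paste buffer.

class ToggleRecorder:
    def __init__(self, transcribe):
        self.transcribe = transcribe   # callable: audio -> text
        self.recording = False
        self.clipboard = None          # stand-in for the paste buffer

    def tap(self, audio=None):
        """Handle one button tap; returns the new recording state."""
        if not self.recording:
            self.recording = True                    # tap 1: start
        else:
            self.recording = False                   # tap 2: stop
            self.clipboard = self.transcribe(audio)  # transcribe + copy
        return self.recording
```

The second button is then just a plain paste keystroke, with Enter on repeat press.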

purplehat_•14m ago
Hi Matt, there's lots of speech-to-text programs out there with varying levels of quality. 100% local is admirable but it's always a tradeoff and users have to decide for themselves what's worth it.

Would you consider making available a video showing someone using the app?

semiquaver•1m ago
Slop
