
France's homegrown open source online office suite

https://github.com/suitenumerique
50•nar001•1h ago•27 comments

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
320•theblazehen•2d ago•106 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
43•AlexeyBrin•2h ago•8 comments

Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501
23•onurkanbkrc•1h ago•1 comment

Software Engineering Is Back

https://blog.alaindichiappari.dev/p/software-engineering-is-back
51•alainrk•1h ago•47 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
725•klaussilveira•16h ago•224 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
986•xnx•22h ago•562 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
109•jesperordrup•7h ago•41 comments

Ga68, a GNU Algol 68 Compiler

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
22•matt_d•3d ago•4 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
79•videotopia•4d ago•12 comments

Making geo joins faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
143•matheusalmeida•2d ago•37 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
245•isitcontent•17h ago•27 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
252•dmpetrov•17h ago•129 comments

Cross-Region MSK Replication: K2K vs. MirrorMaker2

https://medium.com/lensesio/cross-region-msk-replication-a-comprehensive-performance-comparison-o...
5•andmarios•4d ago•1 comment

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
347•vecti•19h ago•153 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
514•todsacerdoti•1d ago•249 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
397•ostacke•23h ago•102 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
49•helloplanets•4d ago•50 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
312•eljojo•19h ago•193 comments

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

https://github.com/sandys/kappal
4•sandGorgon•2d ago•2 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
363•aktau•23h ago•189 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
443•lstoll•23h ago•292 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
78•kmm•5d ago•11 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
98•quibono•4d ago•24 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
26•bikenaga•3d ago•14 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
282•i5heu•19h ago•232 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
48•gmays•12h ago•19 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1093•cdrnsf•1d ago•474 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
313•surprisetalk•3d ago•45 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
160•vmatsiiako•21h ago•73 comments

Show HN: OWhisper – Ollama for realtime speech-to-text

https://docs.hyprnote.com/owhisper/what-is-this
289•yujonglee•5mo ago
Hello everyone. This is Yujong from the Hyprnote team (https://github.com/fastrepl/hyprnote).

We built OWhisper for two reasons (also outlined in https://docs.hyprnote.com/owhisper/what-is-this):

(1) While working with on-device, realtime speech-to-text, we found there was no practical tooling for downloading and running the models.

(2) We also got frequent requests for a way to plug custom STT endpoints into the Hyprnote desktop app, just like with OpenAI-compatible LLM endpoints.

Part (2) is still a work in progress, but we spent some time writing docs, so skimming them will give you a good idea of what it will look like.

For (1), you can try it now (https://docs.hyprnote.com/owhisper/cli/get-started):

  brew tap fastrepl/hyprnote && brew install owhisper
  owhisper pull whisper-cpp-base-q8-en
  owhisper run whisper-cpp-base-q8-en

If you're tired of Whisper, we also support Moonshine :) Give it a shot (`owhisper pull moonshine-onnx-base-q8`).

We're here and looking forward to your comments!

Comments

yujonglee•5mo ago
Happy to answer any questions!

This is the list of local models it supports:

- whisper-cpp-base-q8

- whisper-cpp-base-q8-en

- whisper-cpp-tiny-q8

- whisper-cpp-tiny-q8-en

- whisper-cpp-small-q8

- whisper-cpp-small-q8-en

- whisper-cpp-large-turbo-q8

- moonshine-onnx-tiny

- moonshine-onnx-tiny-q4

- moonshine-onnx-tiny-q8

- moonshine-onnx-base

- moonshine-onnx-base-q4

- moonshine-onnx-base-q8

phkahler•5mo ago
I thought whisper and others took large chunks (20-30 seconds) of speech, or a complete wave file as input. How do you get real-time transcription? What size chunks do you feed it?

To me, STT should take a continuous audio stream and output a continuous text stream.

yujonglee•5mo ago
I use VAD to chunk audio.

Whisper and Moonshine both work on chunks, but for Moonshine:

> Moonshine's compute requirements scale with the length of input audio. This means that shorter input audio is processed faster, unlike existing Whisper models that process everything as 30-second chunks. To give you an idea of the benefits: Moonshine processes 10-second audio segments 5x faster than Whisper while maintaining the same (or better!) WER.

Also, for Kyutai, we can stream continuous audio in and get continuous text out.

- https://github.com/moonshine-ai/moonshine

- https://docs.hyprnote.com/owhisper/configuration/providers/k...
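
A rough illustration of the chunk-by-silence idea (a sketch only, not OWhisper's actual VAD; ffmpeg's silencedetect filter and the -30dB/0.5s thresholds are arbitrary stand-ins):

  # Sketch only: report silence boundaries with ffmpeg as a stand-in for VAD;
  # each span between boundaries could then be cut out and fed to a whisper CLI.
  ffmpeg -i input.wav -af silencedetect=noise=-30dB:d=0.5 -f null - 2>&1 | grep silence_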

mijoharas•5mo ago
Something like that, in a cli tool, that just gives text to stdout would be perfect for a lot of use cases for me!

(maybe with an `owhisper serve` somewhere else to start the model running or whatever.)

yujonglee•5mo ago
Are you thinking about the realtime use-case or batch use-case?

For just transcribing a file or audio stream,

`owhisper run <MODEL> --file a.wav` or

`curl https://something.com/audio.wav | owhisper run <MODEL>`

might make sense.

mijoharas•5mo ago
Agreed, both of those make sense, but I was thinking realtime. (Pipes can stream data; I'd find it useful to have something that can stream STT output to stdout in realtime.)
yujonglee•5mo ago
It's open-source. Happy to review & merge if you can send us PR!

https://github.com/fastrepl/hyprnote/blob/8bc7a5eeae0fe58625...
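
For what it's worth, a hypothetical sketch of the requested behavior (the `--no-tui` flag below is invented for illustration; it is not an existing owhisper option):

  # Hypothetical: stream live transcription as plain text to stdout for piping.
  # --no-tui does not exist today; it stands in for the behavior being requested.
  owhisper run whisper-cpp-base-q8-en --no-tui | tee transcript.txt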

ctbellmar•5mo ago
I wrote a tool that may be just the thing for you:

https://github.com/bikemazzell/skald-go/

Just speech to text, CLI only, and it can paste into whatever app you have open.

mijoharas•5mo ago
Oh, this does sound cool. A couple of questions that aren't clear (to me) from the readme.

What exactly does the silence detection mean? Does it wait until a pause, then send the audio off to Whisper, return the output, and stop the process? Same question for continuous: does that just mean it keeps going until Ctrl+C?

Never mind, answered my own question; looks like yes for both [0][1]. Cool, this seems pretty great actually.

[0] https://github.com/bikemazzell/skald-go/blob/main/pkg/skald/...

[1] https://github.com/bikemazzell/skald-go/blob/main/pkg/skald/...

zveyaeyv3sfye•5mo ago
Having used Whisper and seen the poor quality caused by its 30-second chunks, I would stay far away from software working on even shorter durations.

The short duration effectively means the transcription starts producing nonsense as soon as a sentence is cut in the middle.

alkh•5mo ago
Sorry, maybe I missed it, but I didn't see this list on your website. I think it would be a good idea to add this info there. Besides that, thank you for the effort and your work! I will definitely give it a try.
yujonglee•5mo ago
got it. fyi if you run `owhisper pull --help`, this info is printed
shekhar101•5mo ago
FYI: `owhisper pull whisper-cpp-large-turbo-q8` fails with "Failed to download model.ggml: Other error: Server does not support range requests. Got status: 200 OK"

But the base-q8 works (and works quite well!). The TUI is really nice. Speaker diarization would make it almost perfect for me. Thanks for building this.

yujonglee•5mo ago
We store the data in R2, and range queries sometimes glitch... It might work if you retry.
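
If it keeps failing, a simple retry loop along these lines might help (just a sketch, using the model name from the report above):

  # Retry the flaky pull a few times before giving up.
  for i in 1 2 3; do
    owhisper pull whisper-cpp-large-turbo-q8 && break
    sleep 2
  done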
JP_Watts•5mo ago
I’d like to use this to transcribe meeting minutes with multiple people. How could this program work for that use case?
yujonglee•5mo ago
If your use-case is meetings, https://github.com/fastrepl/hyprnote is for you. OWhisper is more like a headless version of it.
JP_Watts•5mo ago
Can you describe how it picks out different voices? Does it need separate audio channels, or does it recognize different voices on the same audio input?
yujonglee•5mo ago
It separates mic and speaker into two channels, so you can reliably get "what you said" vs. "what you heard".

For splitting speakers within a channel, we need an AI model to do that. It is not implemented yet, but I think we'll be in good shape somewhere in September.

Also, we have a transcript editor where you can easily split segments and assign speakers.

sxp•5mo ago
If you want to transcribe meeting notes, Whisper isn't the best tool because it doesn't separate the transcript by speaker. There are some other tools that do that, but I'm not sure what the best local option is. I've used Google's cloud STT with the diarization option and manually renamed "Speaker N" after the fact.
solarkraft•5mo ago
Wait, this is cool.

I just spent last week researching the options (especially for my M1!) and was left wishing for a standard, full-service (live) transcription server for Whisper, like Ollama has been for LLMs.

I'm excited to try this out and see your API (there seems to be a standards vacuum here due to OpenAI not having a realtime transcription service, which I find to be a bummer)!

Edit: They seem to emulate the Deepgram API (https://developers.deepgram.com/reference/speech-to-text-api...), which seems like a solid choice. I’d definitely like to see a standard emerging here.

yujonglee•5mo ago
Correct. About the Deepgram compatibility: https://docs.hyprnote.com/owhisper/deepgram-compatibility

Let me know how it goes!
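
As a concrete sketch of what that compatibility could enable (the port and exact route below are assumptions for illustration, not taken from the docs; the request shape mirrors Deepgram's prerecorded API, which accepts a POST with a binary audio body):

  # Sketch, assuming owhisper serve exposes a Deepgram-style /v1/listen
  # route locally; port 8080 is an assumption.
  owhisper serve &
  curl -X POST 'http://localhost:8080/v1/listen' \
    -H 'Content-Type: audio/wav' \
    --data-binary @meeting.wav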

solarkraft•5mo ago
I haven’t had the time to properly play around with it yet, but digging into the available meta-info reveals that ... there’s not a lot of it.

When I find the time to set it up, I'd like to contribute to the documentation to answer the questions I had, but I couldn't even find information on how to do that (there's no docs folder in the repo, and contribution.md, which the AI assistant also points me towards, doesn't contain information about adding to the docs).

In general, I find it a bit distracting that the OWhisper code is inside the hyprnote repository. For discoverability and "real project" purposes, I think it would probably deserve its own repo.

clickety_clack•5mo ago
Please find a way to add speaker diarization, with a way to remember the speakers. You can do it with pyannote and get a vector embedding of each speaker that can be compared between audio samples, but that approach is a year old now, so I'm sure there are better options!
yujonglee•5mo ago
yeah that is on the roadmap!
williamsss•5mo ago
I’ve done something similar recently, using speaker diarization to handle situations where two or more people share a laptop on a recorded call.

Ultimately, I chose a cloud-based GPU setup, as the highest-performing diarization models required a GPU to process properly. Happy to share more if you’re going that route.

clickety_clack•5mo ago
What model did you use for diarization?
mijoharas•5mo ago
Ok, cool! I was actually one of the people on the hyprnote HN thread asking for a headless mode!

I was actually integrating some Whisper tools yesterday. I was wondering if there was a way to get a streaming response, and was thinking it'd be nice if you could.

I'm on Linux, so I don't think I can test out owhisper right now, but is that possible?

Also, it looks like the `owhisper run` command gives its output as a TUI. Is there an option for a plain-text response so that we can just pipe it to other programs? (Maybe just `kill`/Ctrl+C to stop the recording and finalize the words.)

Same question for streaming: is there a way to get a streaming text output from owhisper? (It looks like you said you provide a Deepgram-compatible API; I had a quick look at the API docs, but I don't know how easy it is to hook into it and get some nice streaming text while speaking.)

Oh yeah, and diarisation (available with a flag?) would be awesome; it's one of the things missing from most of the easiest-to-run tools I can find.

mijoharas•5mo ago
Oh wait, maybe you do support Linux for owhisper: https://github.com/fastrepl/homebrew-hyprnote/blob/main/Form...

Can you help me find where the code you've built is? I can see the folder on GitHub [0], but I can't see the code for the CLI, for instance, unless I'm blind.

[0] https://github.com/fastrepl/hyprnote/tree/main/owhisper

yujonglee•5mo ago
This is CLI entry point:

https://github.com/fastrepl/hyprnote/blob/8bc7a5eeae0fe58625...

yujonglee•5mo ago
> I'm on Linux

I haven't tested on Linux yet, but we have a Linux build: http://owhisper.hyprnote.com/download/latest/linux-x86_64

> Also, it looks like the `owhisper run` command gives its output as a TUI. Is there an option for a plain-text response

`owhisper run` is more a way to quickly try it out, but I think piping is definitely something that should work.

> Same question for streaming: is there a way to get a streaming text output from owhisper?

You can use a Deepgram client to talk to `owhisper serve` (https://docs.hyprnote.com/owhisper/deepgram-compatibility), so the best resource might be the Deepgram client SDK docs.

> diarisation

yeah on the roadmap
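
For the streaming case specifically, here's a rough shell-level sketch of talking to a Deepgram-style WebSocket endpoint (sox's `rec` and `websocat` are assumed to be installed; the local URL and port are assumptions, while the `encoding`/`sample_rate` query parameters mirror Deepgram's real streaming API):

  # Sketch: pipe raw 16 kHz mono PCM from the mic into the (assumed) local
  # Deepgram-compatible WebSocket; JSON transcription results stream back to stdout.
  rec -q -t raw -r 16000 -e signed -b 16 -c 1 - | \
    websocat --binary 'ws://localhost:8080/v1/listen?encoding=linear16&sample_rate=16000'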

mijoharas•5mo ago
Nice stuff. Had a quick test on Linux and it works (built directly; I didn't check out the brew). I ran into a small issue with Moonshine and opened an issue on GitHub.

Great work on this! Excited to keep an eye on things.

philjackson•5mo ago
Had a quick play too. The TUI is garbled thanks to some stderr messages, which can just be /dev/null'd. I don't seem to be able to interact with the transcripts with the arrow or j/k keys.

Overall though, it's fast and really impressive. Can't wait for it to progress.

DiabloD3•5mo ago
I suggest you don't brand this "Ollama for X". They've become a commercial operation that is trying to FOSS-wash their actions by using llama.cpp's code and then throwing their users under the bus when they can't support them.

I see that you are also using llama.cpp code? That's cool, but make sure you become a member of that community, not an abuser.

yujonglee•5mo ago
yeah we use whisper.cpp for whisper inference. this is more like a community-focused project, not a commercial product!
reilly3000•5mo ago
Yeah, after spending a decent amount of time in r/LocalLLaMA, I was surprised that a project would want to name itself in association with Ollama; it's got a pretty bad reputation in the community at this point.
wanderingmind•5mo ago
Thank you for taking the time to build something and share it. However, what is the advantage of using this over whisper.cpp's stream example, which can also do realtime transcription?

https://github.com/ggml-org/whisper.cpp/tree/master/examples...

yujonglee•5mo ago
It's a lot more than that.

- It supports other models, like Moonshine.

- It also works as a proxy for cloud model providers.

- It can expose local models as a Deepgram-compatible API server.

wanderingmind•5mo ago
Thank you. Having it operate as a proxy server that other apps can connect to is really useful.
elektor•5mo ago
Cool tool! Are you guys releasing Hyprnote for Windows this month?
yujonglee•5mo ago
probably end of this month or early next month. not 100% sure.
bartleeanderson•5mo ago
Very cool. I was reading through the various threads here. I am working on adding STT and TTS to an AI DungeonMaster. Just a personal fun project; I'm working on the adventure part of it now. This will come in handy.

I had dungeon navigation via commands working, but started over and left it at the point where I'm ready to merge the navigation back in once I was happy with a slimmer, one-file version. It will be fun to be able to talk to the DM and have it respond with voice and actions.

The diarization will be very helpful if I can create a stream where it can hear all of us conversing at once. But baby steps. Still working on getting the whole campaign working after I get characters created and put in a party :)
fancy_pantser•5mo ago
I scratched a similar itch and found local LLMs plus Whisper worked really well to listen in and "DJ" a soundtrack while playing tabletop RPGs with a group. If you want to check it out: https://github.com/sean-public/conductor
rshemet•5mo ago
THIS IS THE BOMB!!! So excited for this one. Thanks for putting cool tech out there.
yujonglee•5mo ago
Thank you!
pylotlight•5mo ago
Does this have MPS support for hardware acceleration?
yujonglee•5mo ago
Yes, Metal is on.
notthetup•5mo ago
Is there a way to list all the models that are available to be pulled?
yujonglee•5mo ago
sure. `owhisper pull --help`
neom•5mo ago
Just wanna give a shout out to the hyprnote team - I've been running it for about a month now and I love how simple and gimmick-free it is. It's a good app, def recommend! (The team seems like a lovely group of youngins, too) :)
replwoacause•5mo ago
I'm looking for something that is aware of what is being discussed in realtime, so if I zone out for a few minutes, I can ask it what I missed or to clarify something. Can this do that? If not, does anybody know of something that can?
koolala•5mo ago
Why not use an LLM with the speech-to-text output?
tinodb•5mo ago
Zoom does it quite ok
dcreater•5mo ago
Why can't it just use any OpenAI API endpoint?
yujonglee•5mo ago
What do you mean? This use-case is not LLM; it is realtime STT.

also fyi - https://docs.hyprnote.com/owhisper/configuration/providers/o...

tempodox•5mo ago
It seems to use https://api.deepgram.com (and other web endpoints) and apparently needs an API key, so it's not actually local. Why is it being compared to Ollama, which does run fully locally?
yujonglee•5mo ago
It can run Whisper and Moonshine models locally, while also allowing the use of other API providers. Read the docs - or at least this post.
tempodox•5mo ago
I would want such information accessible without having to go hunt for it. You could improve your presentation by interposing fewer clicks between a reader and the thing they want to know.
0x696C6961•5mo ago
The information is readily available in the open-your-eyes section.
zveyaeyv3sfye•5mo ago
> I would want such information accessible without having to go hunt for it.

Where exactly, if not in the FM?

theanonymousone•5mo ago
Given how sentiment towards Ollama has shifted, I'm not sure this is a clever marketing line :D
jftuga•5mo ago
I have not heard about this. Can you please provide more context?
theanonymousone•5mo ago
I wasn't 100% serious, but this should give you some information: https://news.ycombinator.com/item?id=44867238
vinni2•5mo ago
Very neat project! Congratulations to the founders. I was wondering why no one was working on such a tool.

But I was hoping a couple of features would be supported:

1. Multilingual support. Even if I use a multilingual model like whisper-cpp-large-turbo-q8, the application seems to assume I am speaking English.

2. A translate feature. Probably already supported, but I didn't see the option.

net_rando•5mo ago
Is a container version on your roadmap?
milchek•5mo ago
For people on macOS looking for no-fuss, open-source, pure-Whisper-based local transcription into any input field in the OS: you should also try OpenSuperWhisper (it can easily be installed with brew).
pmarreck•5mo ago
Ah, I use superwhisper (it's great!); I didn't realize there was an open-source version.
pmarreck•5mo ago
Does it do speaker diarization? That's the one thing that I wish Whisper did out of the box. (I know WhisperX exists, but I haven't had a chance to try it yet.)

EDIT: Ah, I see this was already answered.

robertherber•5mo ago
Looks really cool! Will give it a try
gafotech•5mo ago
Looks similar to speaches
jp1016•5mo ago
Really neat work! I’ve been experimenting with something similar running a local Whisper model for quick transcriptions, then organizing the notes in a tabbed interface so I can keep different topics separate without switching windows. Vertical tabs have been surprisingly nice for keeping ongoing transcription sessions alongside reference material (I use beavergrow.com for this, but anything with a good tab system would work).
yujonglee•5mo ago
kind of self-plug, but you might find https://github.com/fastrepl/hyprnote/blob/main/README.md interesting.

EDIT: typo

matcha-video•5mo ago
Question for folks who work a lot with STT models - What is your favorite model that supports word-level timestamps, has good dysfluency detection (whisper isn't great), and is also supported by transformers.js?
williamsss•5mo ago
A few months ago I had to work around a problem like this, and the best out there was WhisperX. Not sure about transformers.js support.

Link to the repo - https://github.com/m-bain/whisperX

ktosobcy•5mo ago
LOL, only yesterday I was looking for a tool like that :D

Though, with a twist that it would transcribe it with IPA :)

lostmsu•5mo ago
If you use this with a cloud provider, please consider using https://borgcloud.org/speech-to-text ( https://borgcloud.org/api/v1/audio/transcriptions - Open-AI-compatible endpoint ). We do transcription at $0.06 per hour using Whisper Large v3 Turbo. Deepgram.com is $0.288 per hour.
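
For reference, an OpenAI-compatible transcription endpoint like the one linked above is typically called with a multipart POST along these lines (the auth header and model name here are illustrative assumptions, not confirmed for borgcloud):

  # Sketch of a standard OpenAI-compatible transcription request.
  # $API_KEY and the model name are placeholder assumptions.
  curl https://borgcloud.org/api/v1/audio/transcriptions \
    -H "Authorization: Bearer $API_KEY" \
    -F file=@meeting.wav \
    -F model=whisper-large-v3-turbo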