
Metaphor+Metonymy: "To love that well which thou must leave ere long"(Sonnet73)

https://www.huckgutman.com/blog-1/shakespeare-sonnet-73
1•gsf_emergency_6•1m ago•0 comments

Show HN: Django N+1 Queries Checker

https://github.com/richardhapb/django-check
1•richardhapb•17m ago•1 comments

Emacs-tramp-RPC: High-performance TRAMP back end using JSON-RPC instead of shell

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•todsacerdoti•21m ago•0 comments

Protocol Validation with Affine MPST in Rust

https://hibanaworks.dev
1•o8vm•26m ago•1 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
2•gmays•27m ago•0 comments

Show HN: Zest – A hands-on simulator for Staff+ system design scenarios

https://staff-engineering-simulator-880284904082.us-west1.run.app/
1•chanip0114•28m ago•1 comments

Show HN: DeSync – Decentralized Economic Realm with Blockchain-Based Governance

https://github.com/MelzLabs/DeSync
1•0xUnavailable•33m ago•0 comments

Automatic Programming Returns

https://cyber-omelette.com/posts/the-abstraction-rises.html
1•benrules2•36m ago•1 comments

Why Are There Still So Many Jobs? The History and Future of Workplace Automation [pdf]

https://economics.mit.edu/sites/default/files/inline-files/Why%20Are%20there%20Still%20So%20Many%...
2•oidar•38m ago•0 comments

The Search Engine Map

https://www.searchenginemap.com
1•cratermoon•45m ago•0 comments

Show HN: Souls.directory – SOUL.md templates for AI agent personalities

https://souls.directory
1•thedaviddias•47m ago•0 comments

Real-Time ETL for Enterprise-Grade Data Integration

https://tabsdata.com
1•teleforce•50m ago•0 comments

Economics Puzzle Leads to a New Understanding of a Fundamental Law of Physics

https://www.caltech.edu/about/news/economics-puzzle-leads-to-a-new-understanding-of-a-fundamental...
2•geox•51m ago•0 comments

Switzerland's Extraordinary Medieval Library

https://www.bbc.com/travel/article/20260202-inside-switzerlands-extraordinary-medieval-library
2•bookmtn•51m ago•0 comments

A new comet was just discovered. Will it be visible in broad daylight?

https://phys.org/news/2026-02-comet-visible-broad-daylight.html
3•bookmtn•56m ago•0 comments

ESR: Comes the news that Anthropic has vibecoded a C compiler

https://twitter.com/esrtweet/status/2019562859978539342
2•tjr•58m ago•0 comments

Frisco residents divided over H-1B visas, 'Indian takeover' at council meeting

https://www.dallasnews.com/news/politics/2026/02/04/frisco-residents-divided-over-h-1b-visas-indi...
3•alephnerd•58m ago•3 comments

If CNN Covered Star Wars

https://www.youtube.com/watch?v=vArJg_SU4Lc
1•keepamovin•1h ago•1 comments

Show HN: I built the first tool to configure VPSs without commands

https://the-ultimate-tool-for-configuring-vps.wiar8.com/
2•Wiar8•1h ago•3 comments

AI agents from 4 labs predicting the Super Bowl via prediction market

https://agoramarket.ai/
1•kevinswint•1h ago•1 comments

EU bans infinite scroll and autoplay in TikTok case

https://twitter.com/HennaVirkkunen/status/2019730270279356658
6•miohtama•1h ago•5 comments

Benchmarking how well LLMs can play FizzBuzz

https://huggingface.co/spaces/venkatasg/fizzbuzz-bench
1•_venkatasg•1h ago•1 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
19•SerCe•1h ago•14 comments

Octave GTM MCP Server

https://docs.octavehq.com/mcp/overview
1•connor11528•1h ago•0 comments

Show HN: Portview what's on your ports (diagnostic-first, single binary, Linux)

https://github.com/Mapika/portview
3•Mapika•1h ago•0 comments

Voyager CEO says space data center cooling problem still needs to be solved

https://www.cnbc.com/2026/02/05/amazon-amzn-q4-earnings-report-2025.html
1•belter•1h ago•0 comments

Boilerplate Tax – Ranking popular programming languages by density

https://boyter.org/posts/boilerplate-tax-ranking-popular-languages-by-density/
1•nnx•1h ago•0 comments

Zen: A Browser You Can Love

https://joeblu.com/blog/2026_02_zen-a-browser-you-can-love/
1•joeblubaugh•1h ago•0 comments

My GPT-5.3-Codex Review: Full Autonomy Has Arrived

https://shumer.dev/gpt53-codex-review
2•gfortaine•1h ago•0 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
3•AGDNoob•1h ago•1 comments

Show HN: Python Audio Transcription: Convert Speech to Text Locally

https://www.pavlinbg.com/posts/python-speech-to-text-guide
110•Pavlinbg•4mo ago

Comments

drewbuschhorn•4mo ago
You should throw in some diarization; there are some pretty effective Python libraries that don't need pretraining for the voice separation.
Pavlinbg•4mo ago
Nice suggestion, I'll look them up.
nvdnadj92•4mo ago
I would suggest 2 speaker-diarization libraries:

- https://huggingface.co/pyannote/speaker-diarization-3.1 - https://github.com/narcotic-sh/senko

I personally love senko since it runs in seconds, whereas pyannote took hours, but there is a 10% WER (word error rate) that is tough to get around.
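Whichever library produces the diarization turns, attaching speakers to Whisper's transcript segments is just a time-overlap match. A minimal sketch; the dict field names are illustrative, not the exact output schema of pyannote or senko:

```python
# Assign a speaker label to each transcript segment by maximum time overlap.
# Inputs are lists of dicts with start/end times in seconds (field names
# are hypothetical, adapt to your diarization library's output).

def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(segments, turns):
    """For each transcript segment, pick the speaker whose diarization
    turns overlap it the most."""
    labeled = []
    for seg in segments:
        best, best_ov = "unknown", 0.0
        for t in turns:
            ov = overlap(seg["start"], seg["end"], t["start"], t["end"])
            if ov > best_ov:
                best, best_ov = t["speaker"], ov
        labeled.append({**seg, "speaker": best})
    return labeled
```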

oidar•4mo ago
What's the best solution right now for speech-to-text that supports speaker diarisation?
makaimc•4mo ago
AssemblyAI (YC S17) currently stands out in WER and accuracy benchmarks (https://www.assemblyai.com/benchmarks), though its models are accessed through a web API rather than hosted locally, and speaker diarization is enabled through a parameter in the API call (https://www.assemblyai.com/docs/speech-to-text/pre-recorded-...).
xnx•4mo ago
I like this version of Whisper which has diarization built in: https://github.com/Purfview/whisper-standalone-win
999900000999•4mo ago
Fantastic project.

I have an old project that relies on AWS transcription and I'd love to migrate it to something local.

vunderba•4mo ago
Nice job. I made a similar python script available as a Github gist [1] a while back that given an audio file does the following:

- Converts to 16kHz WAV

- Transcribes using native ggerganov whisper

- Calls out to a local LLM to clean the text

- Prints out the final cleaned up transcription

I found that accuracy/success increased significantly when I added the LLM post-processor even with modestly sized 12-14b models.

I've been using it with great success to convert very old dictated memos from over a decade ago despite a lot of background noise (wind, traffic, etc).

[1] https://gist.github.com/scpedicini/455409fe7656d3cca8959c123...
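The shape of that pipeline is roughly the following. A sketch, assuming `ffmpeg` and a whisper.cpp-style CLI binary on PATH; the binary and model names are placeholders, not taken from the gist:

```python
import subprocess

def ffmpeg_wav_args(src, dst):
    """Build the ffmpeg command that resamples any input to 16 kHz mono WAV,
    the format whisper.cpp expects."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

def transcribe(src, whisper_bin="whisper-cli", model="ggml-base.en.bin"):
    """Convert to 16 kHz WAV, then transcribe with a whisper.cpp-style CLI
    (flag names vary between whisper.cpp versions; check yours)."""
    wav = src.rsplit(".", 1)[0] + ".16k.wav"
    subprocess.run(ffmpeg_wav_args(src, wav), check=True)
    out = subprocess.run(
        [whisper_bin, "-m", model, "-f", wav, "--no-timestamps"],
        check=True, capture_output=True, text=True)
    return out.stdout.strip()
```

The LLM cleanup pass from the gist would then take `transcribe()`'s output as its prompt input.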

xnx•4mo ago
This tool requires ffmpeg, but don't forget that the latest version of ffmpeg has speech-to-text built in!

I'm sure there are use cases where using Whisper directly is better, but it's a great addition to an already versatile tool.

hoherd•4mo ago
I was going to go the opposite way and suggest that if you want python audio transcription, you can skip ffmpeg and just use whisper directly. Using the whisper module directly gives you a variety of outputs, including text and srt.
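A minimal sketch of that route: the `load_model`/`transcribe` calls are the openai-whisper module's API, while the SRT formatting helper is my own addition for the srt-style output:

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 83.5 -> '00:01:23,500'."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render whisper's result['segments'] (dicts with start/end/text) as SRT."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                      f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}")
    return "\n\n".join(blocks)

if __name__ == "__main__":
    import whisper  # pip install openai-whisper
    model = whisper.load_model("base")
    result = model.transcribe("audio.mp3")  # non-WAV input handled internally
    print(result["text"])
    print(segments_to_srt(result["segments"]))
```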
xnx•4mo ago
Yep. Whisper is great. I use it on podcasts as part of removing ads. Last time I used one of the official versions it would only accept .wav files so I had to convert with ffmpeg first.
nvdnadj92•4mo ago
I'm working on the same project myself and was planning to write a blog post similar to the author's. However, I'll share some additional tips and tricks that really made a difference for me.

For preprocessing, I found it best to convert files to a 16kHz WAV format for optimal processing. I also add low-pass and high-pass filters to remove non-speech sounds. To avoid hallucinations, I run Silero VAD on the entire audio file to find timestamps where there's a speaker. A side note on this: Silero requires careful tuning to prevent audio segments from being chopped up and clipped. I also use a post-processing step to merge adjacent VAD chunks, which helps ensure Whisper receives cohesive segments.

For the Whisper task, I run Whisper in small audio chunks that correspond to the VAD timestamps. Otherwise, it will hallucinate during silences and regurgitate the passed-in prompt. If you're on a Mac, use the whisper-mlx models from Hugging Face to speed up transcription. I ran a performance benchmark, and it made a 22x difference to use a model designed for the Apple Neural Engine.

For post-processing, I've found that running the generated SRT files through ChatGPT to identify and remove hallucination chunks has a better yield.
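The VAD-chunk merging step above can be as simple as collapsing speech timestamps whose gap falls below a threshold. A sketch; the input shape (dicts with 'start'/'end') matches what Silero's `get_speech_timestamps` returns, but the helper and the 0.5 s threshold are my own:

```python
def merge_vad_chunks(chunks, max_gap=0.5):
    """Merge adjacent speech timestamps (dicts with 'start'/'end' in seconds)
    when the silence between them is shorter than max_gap, so Whisper sees
    cohesive utterances instead of clipped fragments."""
    merged = []
    for c in chunks:
        if merged and c["start"] - merged[-1]["end"] <= max_gap:
            merged[-1]["end"] = c["end"]  # extend the previous chunk
        else:
            merged.append(dict(c))
    return merged
```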

adzm•4mo ago
I added EQ to a task after reading this and got much more accurate and consistent results using Whisper. Thanks for the obvious-in-retrospect tip.
bnmoch3•4mo ago
Please can you share the prompt you use in ChatGPT to remove hallucination chunks?
eevmanu•4mo ago
If I understood correctly, VAD gives better results than ffmpeg silencedetect + silenceremove, right?

I think the latest version of ffmpeg can use Whisper with VAD [1], but I still need to explore it with a simple PoC script.

I'd love to know more about the post-processing prompt; my guess is that it looks like an improved version of the `semantic correction` prompt [2], but I may be wrong ¯\_(ツ)_/¯.

[1] https://ffmpeg.org/ffmpeg-filters.html#toc-whisper-1

[2] https://gist.github.com/eevmanu/0de2d449144e9cd40a563170b459...
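For comparison, the silencedetect route means parsing ffmpeg's stderr yourself and inverting the silence spans into speech spans. A sketch; the log-line format here is my recollection of recent ffmpeg output, so verify it against your build:

```python
import re

# silencedetect writes lines like "silence_start: 1.25" and
# "silence_end: 3.5 | silence_duration: 2.25" to stderr
# (format assumed from recent ffmpeg releases; check your version).
SILENCE_RE = re.compile(r"silence_(start|end): ([0-9.]+)")

def speech_intervals(stderr_text, total_duration):
    """Invert silencedetect's silence spans into (start, end) speech spans."""
    speech, cursor = [], 0.0
    for kind, ts in SILENCE_RE.findall(stderr_text):
        ts = float(ts)
        if kind == "start":
            if ts > cursor:
                speech.append((cursor, ts))
        else:
            cursor = ts
    if cursor < total_duration:
        speech.append((cursor, total_duration))
    return speech
```

Unlike a VAD model, this only keys off loudness, which is part of why VAD tends to cut cleaner speech boundaries.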

theologic•4mo ago
I always thought this was a great implementation if you have a CUDA layer: https://github.com/rgcodeai/Kit-Whisperx

I had an old Acer laptop hanging around, so I implemented this: https://github.com/Sanborn-Young/MP3ToTXT

I forget all the details of my tweaks, but I remember that I had better throughput on my version.

I know the OP talked about wanting it local, but thomasmol/whisper-diarization on Replicate is fast and cheap. Here's a hacked front end to parse the JSON: https://github.com/Sanborn-Young/MP3_2transcript

canadiantim•4mo ago
All I can say is you’re a legend. This is a great resource, thank you!
ancaster•4mo ago
whisperx does this all quite well and can be run with `uvx whisperx`

https://github.com/m-bain/whisperX

ChiliPie•4mo ago
That's a neat lil Python script; it deserves a GitHub page :)
primaprashant•4mo ago
btw, if you want local dictation (speak and get a transcript, rather than transcribing files), I built a Python tool called hns [1]. It's open source, uses faster-whisper, and you can run it with `uvx hns` or just `hns` after `uv tool install hns`.

[1]: https://github.com/primaprashant/hns

bharatkalluri•4mo ago
For the past two days I've been working on SpeechShift [1]. It's a fully local, offline-first speech-to-text utility: you trigger it with a command, it transcribes with Whisper, and it pastes the result into the window you are currently focused on (like Chrome, Typora, or some other window). Basically SuperWhisper [2] but for Linux. If this is something that interests you, check it out! Feel free to ping me if something does not work as expected.

I've been trying to squeeze performance out of Whisper, but I felt that (at least for non-native speakers) the base model does a good job. For preprocessing I do VAD and some normalization, but on my rusty ThinkPad the processing time is way too long. I'll try some of the aforementioned tips and see if the accuracy and performance can get any better. After that, I'm planning to use an SLM for text cleanup and post-processing of the transcription. I'm documenting my learnings over at my notes [3].

[1] https://github.com/BharatKalluri/speechshift

[2] https://superwhisper.com/

[3] https://notes.bharatkalluri.com/speechshift-notes-during-dev...

abdullahkhalids•4mo ago
Do you have any metrics for performance?

Have you tried with languages other than English?

selim-now•4mo ago
Were you considering fine-tuning the SLM as well?
xjlin0•4mo ago
Which local speech-to-text tool can use the Apple chip's MLX?
a_c•4mo ago
I was using the same setup to try to transcribe the soundtrack of a video. A 60s AAC audio file took maybe 10 minutes. I'm on an Apple M4 and ran `whisper audio.aac --model medium --fp16 False --language Japanese`. Wonder if I'm doing something wrong.
oulipo2•4mo ago
Cool! For macOS there's also the nice open-source VoiceInk.
aanet•4mo ago
Judging by the comments, it looks like this application/use case is the To-Do app of this age: everybody has their own implementation.

Not judging at all. In fact, the opposite. Thanks for sharing this, it's super valuable.

I think I'll learn from various sources here, and be implementing my own local-first transcription.

:thanks.gif:

keepamovin•4mo ago
I also have an app that does this fully locally and offline, on the macOS App Store: Wisprnote, using the OpenAI Whisper models. Works well.

What people are talking about (avoiding hallucinations through VAD-based chunking, etc.) are all things I pioneered with Wisprnote, which has been on the App Store for 2 years. It hasn't been updated recently (backlog of other work) but still works just fine. Paid app, but good quality.

https://apps.apple.com/us/app/wisprnote/id1671480366?l=en-GB...

crangos•4mo ago
There's a GUI on top of whisper that is very handy for editing, as you can listen to the sentences: https://github.com/kaixxx/noScribe