frontpage.
Qwen3-Omni: Native Omni AI model for text, image and video

https://github.com/QwenLM/Qwen3-Omni
148•meetpateltech•3h ago•34 comments

Fine-grained HTTP filtering for Claude Code

https://ammar.io/blog/httpjail
30•ammario•1h ago•3 comments

Choose Your Own Adventure

https://www.filfre.net/2025/09/choose-your-own-adventure/
56•naves•2h ago•29 comments

A board member's perspective of the RubyGems controversy

https://apiguy.substack.com/p/a-board-members-perspective-of-the
41•Qwuke•1d ago•66 comments

OpenAI and Nvidia announce partnership to deploy 10GW of Nvidia systems

https://openai.com/index/openai-nvidia-systems-partnership/
315•meetpateltech•4h ago•425 comments

Cap'n Web: a new RPC system for browsers and web servers

https://blog.cloudflare.com/capnweb-javascript-rpc-library/
246•jgrahamc•7h ago•111 comments

Categorical Foundations for Cute Layouts

https://research.colfax-intl.com/categorical-foundations-for-cute-layouts/
13•charles_irl•15h ago•3 comments

Why haven't local-first apps become popular?

https://marcobambini.substack.com/p/why-local-first-apps-havent-become
170•marcobambini•7h ago•219 comments

SWE-Bench Pro

https://github.com/scaleapi/SWE-bench_Pro-os
70•tosh•4h ago•14 comments

Diffusion Beats Autoregressive in Data-Constrained Settings

https://blog.ml.cmu.edu/2025/09/22/diffusion-beats-autoregressive-in-data-constrained-settings/
23•djoldman•2h ago•2 comments

Is a movie prop the ultimate laptop bag?

https://blog.jgc.org/2025/09/is-movie-prop-ultimate-laptop-bag.html
89•jgrahamc•8h ago•86 comments

PlanetScale for Postgres is now GA

https://planetscale.com/blog/planetscale-for-postgres-is-generally-available
226•munns•5h ago•130 comments

I Was a Weird Kid: Jailhouse Confessions of a Teen Hacker

https://www.bloomberg.com/news/features/2025-09-19/multimillion-dollar-hacking-spree-scattered-sp...
19•wslh•3d ago•1 comments

Mentra (YC W25) Is Hiring to build smart glasses

1•caydenpiercehax•3h ago

Umberto Eco: Ur-Fascism

https://bobmschwartz.com/2017/12/28/umberto-eco-ur-fascism/
22•saubeidl•27m ago•1 comments

Testing is better than data structures and algorithms

https://nedbatchelder.com/blog/202509/testing_is_better_than_dsa.html
52•rsyring•4h ago•35 comments

AI-generated “workslop” is destroying productivity?

https://hbr.org/2025/09/ai-generated-workslop-is-destroying-productivity
109•McScrooge•2h ago•48 comments

Transforming recursion into iteration for LLVM loop optimizations

https://dspace.mit.edu/handle/1721.1/162684
10•matt_d•1d ago•1 comments

I'm spoiled by Apple Silicon but still love Framework

https://simonhartcher.com/posts/2025-09-22-why-im-spoiled-by-apple-silicon-but-still-love-framework/
78•deevus•7h ago•125 comments

Unweaving warp specialization on modern tensor core GPUs

https://rohany.github.io/blog/warp-specialization/
14•rohany•59m ago•1 comments

Cloudflare is sponsoring Ladybird and Omarchy

https://blog.cloudflare.com/supporting-the-future-of-the-open-web/
511•jgrahamc•7h ago•333 comments

What happens when coding agents stop feeling like dialup?

https://martinalderson.com/posts/what-happens-when-coding-agents-stop-feeling-like-dialup/
55•martinald•1d ago•61 comments

Easy Forth (2015)

https://skilldrick.github.io/easyforth/
162•pkilgore•9h ago•94 comments

The Beginner's Textbook for Fully Homomorphic Encryption

https://arxiv.org/abs/2503.05136
144•Qision•1d ago•26 comments

CompileBench: Can AI Compile 22-year-old Code?

https://quesma.com/blog/introducing-compilebench/
109•jakozaur•7h ago•43 comments

Beyond the Front Page: A Personal Guide to Hacker News

https://hsu.cy/2025/09/how-to-read-hn/
178•firexcy•11h ago•75 comments

What is algebraic about algebraic effects?

https://interjectedfuture.com/what-is-algebraic-about-algebraic-effects/
65•iamwil•6h ago•28 comments

Human-Oriented Markup Language

https://huml.io/
44•vishnukvmd•5h ago•59 comments

A simple way to measure knots has come unraveled

https://www.quantamagazine.org/a-simple-way-to-measure-knots-has-come-unraveled-20250922/
92•baruchel•6h ago•45 comments

The Collapse of the Tjörn Bridge, Sweden, 1980

https://www.legalscandal.info/ls_eng/tjorn_bridge_disaster.html
6•ZeljkoS•3d ago•6 comments

Show HN: Python Audio Transcription: Convert Speech to Text Locally

https://www.pavlinbg.com/posts/python-speech-to-text-guide
11•Pavlinbg•2h ago

Comments

drewbuschhorn•1h ago
You should throw in some diarization. There are some pretty effective Python libraries that don't need pretraining for the voice separation.
Pavlinbg•1h ago
Nice suggestion, I'll look them up.
nvdnadj92•24m ago
I would suggest 2 speaker-diarization libraries:

- https://huggingface.co/pyannote/speaker-diarization-3.1

- https://github.com/narcotic-sh/senko

I personally love senko since it can run in seconds, whereas pyannote took hours, but there is a 10% WER (word error rate) that is tough to get around.

oidar•1h ago
What's the best solution right now for STT (speech-to-text) that supports speaker diarization?
makaimc•1h ago
AssemblyAI (YC S17) is currently the one that stands out in the WER and accuracy benchmarks (https://www.assemblyai.com/benchmarks). Its models are accessed through a web API rather than hosted locally, though, and speaker diarization is enabled through a parameter in the API call (https://www.assemblyai.com/docs/speech-to-text/pre-recorded-...).
xnx•1h ago
I like this version of Whisper which has diarization built in: https://github.com/Purfview/whisper-standalone-win
999900000999•1h ago
Fantastic project.

I have an old project that relies on AWS transcription and I'd love to migrate it to something local.

vunderba•1h ago
Nice job. I made a similar Python script available as a GitHub gist [1] a while back that, given an audio file, does the following:

- Converts to 16kHz WAV

- Transcribes using native ggerganov whisper

- Calls out to a local LLM to clean the text

- Prints out the final cleaned up transcription

I found that accuracy/success increased significantly when I added the LLM post-processor, even with modestly sized 12-14B models.

I've been using it with great success to convert very old dictated memos from over a decade ago despite a lot of background noise (wind, traffic, etc).

[1] https://gist.github.com/scpedicini/455409fe7656d3cca8959c123...
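
For anyone who wants to try the first step in the list above, here is a minimal sketch of the 16kHz WAV conversion as a Python helper around ffmpeg. It is not the code from the gist: the function names and file names are hypothetical, and mono 16-bit PCM output is an assumption, chosen because that is the WAV format whisper.cpp reads.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Return an ffmpeg argument list that converts src to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file in any format ffmpeg can decode
        "-ar", str(sample_rate),  # resample to the target rate
        "-ac", "1",               # downmix to a single channel (assumption)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM samples
        dst,
    ]


def convert_to_wav(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

Calling `convert_to_wav("memo.m4a", "memo.wav")` (hypothetical file names) needs ffmpeg on the PATH. Building the argument list separately keeps the command easy to inspect without running anything.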

xnx•1h ago
This tool requires ffmpeg, but don't forget that the latest version of ffmpeg has speech-to-text built in!

I'm sure there are use cases where using Whisper directly is better, but it's a great addition to an already versatile tool.

nvdnadj92•54m ago
I'm working on the same project myself and was planning to write a blog post similar to the author's. However, I'll share some additional tips and tricks that really made a difference for me.

For preprocessing, I found it best to convert files to a 16kHz WAV format for optimal processing. I also add low-pass and high-pass filters to remove non-speech sounds. To avoid hallucinations, I run Silero VAD on the entire audio file to find timestamps where there's a speaker. A side note on this: Silero requires careful tuning to prevent audio segments from being chopped up and clipped. I also use a post-processing step to merge adjacent VAD chunks, which helps ensure cohesive Whisper recordings.
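
The merge step for adjacent VAD chunks could look roughly like the sketch below. This is an illustration under assumptions, not the commenter's actual code: it takes the VAD output as a list of (start, end) times in seconds, and both the function name and the 0.5 second gap threshold are made up for the example.

```python
def merge_vad_chunks(chunks, max_gap=0.5):
    """Merge (start, end) speech chunks, in seconds, separated by at most max_gap."""
    merged = []
    for start, end in sorted(chunks):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous chunk: extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical timestamps: two pairs of chunks split by short pauses.
chunks = [(0.0, 1.2), (1.5, 3.0), (6.0, 7.5), (7.6, 9.0)]
print(merge_vad_chunks(chunks))  # [(0.0, 3.0), (6.0, 9.0)]
```

A larger `max_gap` gives Whisper longer, more cohesive chunks at the cost of including more silence, so the right value likely depends on the recording.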

For the Whisper task, I run Whisper in small audio chunks that correspond to the VAD timestamps. Otherwise, it will hallucinate during silences and regurgitate the passed-in prompt. If you're on a Mac, use the whisper-mlx models from Hugging Face to speed up transcription. I ran a performance benchmark, and it made a 22x difference to use a model designed for the Apple Neural Engine.
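
Cutting the audio into chunks that match the VAD timestamps can be sketched as follows. The assumptions are that the audio is already loaded as a flat sequence of samples at a known rate and that the timestamps are (start, end) pairs in seconds. The function names and the small padding value are hypothetical, and the Whisper call itself is left out.

```python
def chunk_sample_ranges(segments, total_samples, sample_rate=16000, pad=0.1):
    """Convert (start, end) times in seconds to (first, last) sample index ranges."""
    ranges = []
    for start, end in segments:
        # Pad each side slightly so words at the edges are not clipped,
        # then clamp to the bounds of the audio.
        first = max(0, round((start - pad) * sample_rate))
        last = min(total_samples, round((end + pad) * sample_rate))
        if last > first:
            ranges.append((first, last))
    return ranges


def split_audio(samples, segments, sample_rate=16000, pad=0.1):
    """Return one slice of samples per speech segment."""
    ranges = chunk_sample_ranges(segments, len(samples), sample_rate, pad)
    return [samples[first:last] for first, last in ranges]
```

Each returned slice would then be passed to the transcription model on its own, which keeps long silences out of the model's input.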

For post-processing, I've found that running the generated SRT files through ChatGPT to identify and remove hallucination chunks has a better yield.