Show HN: Diarize – CPU-only speaker diarization, 7x faster than pyannote

https://github.com/FoxNoseTech/diarize
2•loookas•2h ago

Comments

loookas•2h ago
I built this because I needed speaker diarization for two things: a meeting summarization script (record → diarize → transcribe → feed to Claude for summaries), and a robotics project where I need real-time speaker identification.

I started with pyannote, which is the standard tool for this. It worked, but processing a single call took forever on CPU, and the fans on my MacBook sounded like a jet engine. So I decided to build something faster.

The pipeline: Silero VAD → WeSpeaker ResNet34 embeddings (ONNX Runtime) → GMM+BIC speaker count estimation → spectral clustering. All classical ML after the embedding step — no neural segmentation model like pyannote uses.
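For readers unfamiliar with the GMM+BIC step, here is a minimal sketch of the general technique, not the code from the diarize repository. It assumes scikit-learn is installed. The function name `estimate_speaker_count`, the diagonal covariance, the `reg_covar` value, and the toy data are all illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def estimate_speaker_count(embeddings, max_speakers=8, seed=0):
    """Return the component count in 1..max_speakers with the lowest BIC."""
    best_k, best_bic = 1, float("inf")
    upper = min(max_speakers, len(embeddings))
    for k in range(1, upper + 1):
        gmm = GaussianMixture(
            n_components=k,
            covariance_type="diag",  # assumption: diagonal covariances
            reg_covar=1e-3,          # assumption: floor that keeps variances from collapsing
            random_state=seed,
        )
        gmm.fit(embeddings)
        bic = gmm.bic(embeddings)    # lower BIC = better fit/complexity tradeoff
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k


# Toy data (made up): three well-separated synthetic "speakers" in 8 dimensions.
rng = np.random.default_rng(0)
centers = 10.0 * np.eye(3, 8)
toy = np.vstack([c + rng.normal(scale=0.3, size=(60, 8)) for c in centers])
print(estimate_speaker_count(toy, max_speakers=6))
```

Real speaker embeddings are 256-dimensional and far less cleanly separated than this toy data, which is presumably part of why the estimate degrades as the speaker count grows.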

Results on VoxConverse (216 files, 1–20 speakers):

DER: ~10.8% (pyannote free models: ~11.2%)
CPU speed: RTF 0.12 vs 0.86 (pyannote community-1), about 7x faster
10-min recording: ~1.2 min vs ~8.6 min
Speaker count: 87–97% within ±1 for 1–5 speakers
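The speed figures follow from the real-time factor (RTF), which is processing time divided by audio duration. A quick check of the arithmetic, using only the numbers quoted above (the helper name `processing_minutes` is made up for illustration):

```python
def processing_minutes(audio_minutes, rtf):
    """Processing time implied by a real-time factor (RTF = processing time / audio time)."""
    return audio_minutes * rtf


diarize_rtf, pyannote_rtf = 0.12, 0.86

print(round(processing_minutes(10, diarize_rtf), 2))   # 1.2
print(round(processing_minutes(10, pyannote_rtf), 2))  # 8.6
print(round(pyannote_rtf / diarize_rtf, 1))            # 7.2
```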

What it doesn't do well: 8+ speakers (count estimation breaks down), overlapping speech (single speaker per frame), and it's only been benchmarked on one dataset so far.

Usage: pip install diarize

from diarize import diarize
result = diarize("meeting.wav")

No GPU, no API keys, no HuggingFace account. Apache 2.0. Happy to answer questions about the architecture, benchmarks, or tradeoffs.

guerython•2h ago
Nice to see Diarize lean into CPU-only inference for compliance workloads. We leaned on the same Silero -> embedding -> spectral stack and one stabilizer that helped was filtering Silero segments under ~350 ms and merging anything with cosine distance <0.25 before the GMM, so the clustering stopped flipping speakers on micro-pauses.
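A rough sketch of that pre-clustering cleanup, under stated assumptions: segments arrive in time order with a non-zero embedding each, only adjacent segments are merged, and a merged embedding is an average weighted by time span. The `Segment` class and `filter_and_merge` are hypothetical names, not part of any library mentioned here.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float      # seconds
    end: float        # seconds
    embedding: list   # speaker embedding, assumed non-zero

    @property
    def duration(self):
        return self.end - self.start


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def filter_and_merge(segments, min_duration=0.35, max_distance=0.25):
    """Drop very short segments, then merge neighbours whose embeddings are close."""
    kept = [s for s in segments if s.duration >= min_duration]
    merged = []
    for seg in kept:
        if merged and cosine_distance(merged[-1].embedding, seg.embedding) < max_distance:
            prev = merged[-1]
            total = prev.duration + seg.duration
            # Average the two embeddings, weighted by each segment's time span.
            emb = [(prev.duration * p + seg.duration * q) / total
                   for p, q in zip(prev.embedding, seg.embedding)]
            merged[-1] = Segment(prev.start, seg.end, emb)
        else:
            merged.append(seg)
    return merged


segments = [
    Segment(0.0, 1.2, [1.0, 0.0]),
    Segment(1.3, 1.5, [0.0, 1.0]),  # 0.2 s, dropped by the duration filter
    Segment(1.6, 2.8, [0.9, 0.1]),  # close to the first segment, merged into it
    Segment(3.0, 4.0, [0.0, 1.0]),  # far from the merged segment, kept separate
]
result = filter_and_merge(segments)
print([(s.start, s.end) for s in result])  # [(0.0, 2.8), (3.0, 4.0)]
```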

Another lever we added was keeping the last few call centroids and biasing the spectral solver toward the prototype that had >0.75 similarity, which keeps returning participants from spawning a new SPEAKER label every session. Are you thinking about exposing that kind of anchor_embeddings hook so teams can keep participant IDs consistent across calls?

loookas•2h ago
Good tips on the pre-clustering filtering. We do something similar with a 0.4 s threshold on short segments, but the cosine-distance merge before the GMM is interesting; I'll look into that.

On cross-session speaker consistency: yes, that's on the roadmap. The plan is to store speaker embeddings (256-dim vectors) in a vector DB and use them for matching during diarization.

Something like an anchor_embeddings parameter you can pass in, so the output labels stay consistent across calls.

Right now every call produces SPEAKER_00, SPEAKER_01, etc. independently. The embedding extraction already works well enough for matching (that's what cosine similarity on WeSpeaker embeddings is good at); the missing piece is the API surface and the matching logic on top of clustering.
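To make the idea concrete, here is one way the matching could look as a post-processing step. This is purely a sketch: `relabel_with_anchors` is a hypothetical helper, not an existing diarize API. The 0.75 similarity threshold is borrowed from the comment above, and the greedy one-to-one assignment is an assumption.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def relabel_with_anchors(centroids, anchors, threshold=0.75):
    """Map per-call labels to stored identities; keep the label if no anchor is close enough."""
    mapping, used = {}, set()
    for label, centroid in centroids.items():
        best_name, best_sim = None, threshold
        for name, anchor in anchors.items():
            if name in used:
                continue  # each stored identity is assigned at most once per call
            sim = cosine_similarity(centroid, anchor)
            if sim > best_sim:
                best_name, best_sim = name, sim
        if best_name is None:
            mapping[label] = label
        else:
            mapping[label] = best_name
            used.add(best_name)
    return mapping


# Toy 3-dim vectors for readability; real embeddings would be 256-dim.
anchors = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
centroids = {"SPEAKER_00": [0.1, 0.9, 0.0], "SPEAKER_01": [0.0, 0.1, 1.0]}
print(relabel_with_anchors(centroids, anchors))
# {'SPEAKER_00': 'bob', 'SPEAKER_01': 'SPEAKER_01'}
```

Biasing the clustering itself toward stored prototypes, as described upthread, is a different design. This version leaves clustering untouched and only renames its output.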

What's your setup for storing/matching the centroids? Curious if you're doing it at inference time or as a post-processing step.

loookas•58m ago
One thing I found surprising during development: the speaker count estimation turned out to be the hardest part of the whole pipeline, not the embeddings or clustering.

Most diarization papers treat it as a solved problem or skip it entirely ("assume N speakers"). But in real meetings nobody tells you upfront how many people are on the call. GMM+BIC gets you to 51% exact match on VoxConverse, which sounds bad until you look at it per bucket: for 1–4 speakers it's 54–91% exact and 88–97% within ±1. It's at 8+ speakers where it completely falls apart (0% exact match).
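For clarity on how the two metrics differ, a small sketch of "exact match" versus "within ±1" computed over (true count, estimated count) pairs. The pairs below are made up for illustration and are not VoxConverse results; `count_accuracy` is a hypothetical helper name.

```python
def count_accuracy(pairs):
    """Return (exact-match rate, within-±1 rate) for (true, estimated) speaker counts."""
    pairs = list(pairs)
    exact = sum(t == e for t, e in pairs) / len(pairs)
    within_one = sum(abs(t - e) <= 1 for t, e in pairs) / len(pairs)
    return exact, within_one


# Made-up (true, estimated) pairs, one per file.
toy_pairs = [(2, 2), (3, 2), (4, 4), (9, 5)]

print(count_accuracy(toy_pairs))                             # (0.5, 0.75)
print(count_accuracy(p for p in toy_pairs if p[0] >= 8))     # (0.0, 0.0)
```

Splitting by true speaker count, as in the second call, is what makes the per-bucket picture visible: a single hard file with many speakers drags the overall rate down.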

Curious if anyone has found better approaches for automatic speaker count estimation that don't require a neural model.
