
Show HN: Diarize – CPU-only speaker diarization, 7x faster than pyannote

https://github.com/FoxNoseTech/diarize
2•loookas•7h ago

Comments

loookas•7h ago
I built this because I needed speaker diarization for two things: a meeting summarization script (record → diarize → transcribe → feed to Claude for summaries), and a robotics project where I need real-time speaker identification.

I started with pyannote, which is the standard tool for this. It worked, but processing a single call took forever on CPU, and the fans on my MacBook sounded like a jet engine. So I decided to build something faster.

The pipeline: Silero VAD → WeSpeaker ResNet34 embeddings (ONNX Runtime) → GMM+BIC speaker count estimation → spectral clustering. Everything after the embedding step is classical ML; there's no neural segmentation model like the one pyannote uses.
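A rough sketch of the two classical steps after embedding, to make the pipeline concrete. It is not diarize's actual code: it assumes scikit-learn's GaussianMixture (whose bic() method scores each candidate speaker count) and SpectralClustering on a precomputed cosine affinity, and the diagonal covariance, the cap of 8 speakers and clipping negative similarities to zero are all my assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.mixture import GaussianMixture


def estimate_num_speakers(emb: np.ndarray, max_speakers: int = 8, seed: int = 0) -> int:
    """Sketch only: pick the GMM component count with the lowest BIC."""
    best_k, best_bic = 1, np.inf
    # GaussianMixture needs at least as many samples as components.
    for k in range(1, min(max_speakers, len(emb)) + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="diag", random_state=seed)
        gmm.fit(emb)
        bic = gmm.bic(emb)  # lower BIC = better fit after the complexity penalty
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k


def cluster_speakers(emb: np.ndarray, seed: int = 0) -> np.ndarray:
    """Sketch only: estimate the speaker count, then spectral-cluster into that many groups."""
    k = estimate_num_speakers(emb, seed=seed)
    if k == 1:
        return np.zeros(len(emb), dtype=int)
    # L2-normalise rows so the dot product equals cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Spectral clustering expects a non-negative affinity matrix, so clip at zero.
    affinity = np.clip(unit @ unit.T, 0.0, None)
    sc = SpectralClustering(n_clusters=k, affinity="precomputed", random_state=seed)
    return sc.fit_predict(affinity)
```

Here emb would be the (n_segments × 256) matrix of WeSpeaker embeddings; the returned array holds one cluster index per segment.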

Results on VoxConverse (216 files, 1–20 speakers):

- DER: ~10.8% (pyannote free models: ~11.2%)
- CPU speed: RTF 0.12 vs 0.86 (pyannote community-1), about 7x faster
- 10-min recording: ~1.2 min vs ~8.6 min
- Speaker count: 87–97% within ±1 for 1–5 speakers
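The RTF line and the 10-minute timings are the same numbers stated two ways: real-time factor is processing time divided by audio duration, so processing time is RTF times duration. A quick check (the helper name is just illustrative):

```python
def processing_minutes(rtf: float, audio_minutes: float) -> float:
    """Real-time factor = processing time / audio duration, so time = RTF * duration."""
    return rtf * audio_minutes


diarize_min = processing_minutes(0.12, 10)   # ~1.2 min for a 10-min recording
pyannote_min = processing_minutes(0.86, 10)  # ~8.6 min for the same recording
speedup = 0.86 / 0.12                        # ratio of the two RTFs
print(f"{diarize_min:.1f} min vs {pyannote_min:.1f} min, {speedup:.1f}x faster")
# prints: 1.2 min vs 8.6 min, 7.2x faster
```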

What it doesn't do well: 8+ speakers (count estimation breaks down), overlapping speech (single speaker per frame), and it's only been benchmarked on one dataset so far.

Usage:

  pip install diarize

  from diarize import diarize
  result = diarize("meeting.wav")

No GPU, no API keys, no HuggingFace account. Apache 2.0. Happy to answer questions about the architecture, benchmarks, or tradeoffs.

guerython•7h ago
Nice to see Diarize lean into CPU-only inference for compliance workloads. We leaned on the same Silero -> embedding -> spectral stack, and one stabilizer that helped was filtering out Silero segments shorter than ~350 ms and merging segments with cosine distance < 0.25 before the GMM. After that, the clustering stopped flipping speakers on micro-pauses.
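A minimal sketch of that pre-GMM stabilizer in plain Python. It assumes each VAD segment already carries an embedding and reads "merging" as fusing consecutive segments whose embeddings fall under the cosine-distance threshold; that interpretation, the Segment class and the stabilize function are all hypothetical, not anyone's production code.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical VAD segment with its speaker embedding."""
    start: float            # seconds
    end: float              # seconds
    embedding: list[float]

    @property
    def duration(self) -> float:
        return self.end - self.start


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def stabilize(segments, min_dur=0.35, merge_dist=0.25):
    """Hypothetical: drop micro-segments, then fuse neighbours that already sound alike."""
    kept = [s for s in segments if s.duration >= min_dur]
    merged = []
    for seg in sorted(kept, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev is not None and cosine_distance(prev.embedding, seg.embedding) < merge_dist:
            # Duration-weighted mean of the two embeddings; the merged span covers both.
            w1, w2 = prev.duration, seg.duration
            emb = [(w1 * x + w2 * y) / (w1 + w2) for x, y in zip(prev.embedding, seg.embedding)]
            merged[-1] = Segment(prev.start, seg.end, emb)
        else:
            merged.append(seg)
    return merged
```

Dropping the sub-350 ms segment first means a brief interjection between two turns of the same speaker no longer breaks those turns apart.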

Another lever we added was keeping centroids from the last few calls and biasing the spectral solver toward any stored prototype with >0.75 similarity, which stops returning participants from spawning a new SPEAKER label every session. Are you thinking about exposing that kind of anchor_embeddings hook so teams can keep participant IDs consistent across calls?
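One way to keep "the last few call centroids" per participant is a bounded deque per ID, averaged into a prototype. This sketch assumes a hypothetical PrototypeBank class and cosine similarity, and it relabels clusters after clustering rather than biasing the spectral solver itself as the comment describes:

```python
import math
from collections import deque


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class PrototypeBank:
    """Hypothetical store of each participant's centroids from their last few calls."""

    def __init__(self, history: int = 5, threshold: float = 0.75):
        self.history = history
        self.threshold = threshold
        self.centroids: dict[str, deque] = {}

    def prototype(self, pid):
        # Prototype = element-wise mean of the stored centroids for this participant.
        hist = self.centroids[pid]
        return [sum(col) / len(hist) for col in zip(*hist)]

    def assign(self, centroid, new_id):
        """Return the most similar known ID above the threshold, else register new_id."""
        best_pid, best_sim = None, self.threshold
        for pid in self.centroids:
            sim = cosine_similarity(centroid, self.prototype(pid))
            if sim > best_sim:  # strictly greater than the threshold
                best_pid, best_sim = pid, sim
        pid = best_pid if best_pid is not None else new_id
        # deque(maxlen=...) silently drops the oldest centroid once history is full.
        self.centroids.setdefault(pid, deque(maxlen=self.history)).append(list(centroid))
        return pid
```

Because each cluster is matched independently, two clusters from the same call could both map to one participant; a one-to-one matching step avoids that.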

loookas•7h ago
Good tips on the pre-clustering filtering. We do something similar with a 0.4 s threshold on short segments, but the cosine-distance merge before the GMM is interesting; I'll look into that.

On cross-session speaker consistency: yes, that's on the roadmap. The plan is to store speaker embeddings (256-dim vectors) in a vector DB and use them for matching during diarization.

Something like an anchor_embeddings parameter you can pass in, so the output labels stay consistent across calls.

Right now every call produces SPEAKER_00, SPEAKER_01, etc. independently. The embedding extraction already works well enough for matching (that's what cosine similarity on WeSpeaker embeddings is good at); the missing piece is the API surface and the matching logic on top of clustering.
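The matching logic could sit entirely after clustering. A minimal sketch, assuming each call's clustering yields one centroid per SPEAKER_xx label and that stored embeddings arrive as a name-to-vector dict through a hypothetical anchor_embeddings argument. The 0.75 threshold is borrowed from the parent comment, and the greedy best-pair-first matching (instead of an optimal Hungarian assignment) is my simplification:

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def relabel(centroids, anchor_embeddings, threshold=0.75):
    """Hypothetical: map per-call labels to known names, one-to-one.

    centroids: {"SPEAKER_00": [...], ...} from this call's clustering.
    anchor_embeddings: {"alice": [...], ...} stored from earlier calls.
    """
    # Score every (cluster, anchor) pair and visit the most similar pairs first.
    pairs = sorted(
        ((cosine_similarity(c, a), label, name)
         for label, c in centroids.items()
         for name, a in anchor_embeddings.items()),
        reverse=True,
    )
    mapping, used = {}, set()
    for sim, label, name in pairs:
        if sim < threshold:
            break  # every remaining pair is even less similar
        if label not in mapping and name not in used:
            mapping[label] = name
            used.add(name)
    # Speakers with no confident match keep their per-call label.
    return {label: mapping.get(label, label) for label in centroids}
```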

What's your setup for storing/matching the centroids? Curious if you're doing it at inference time or as a post-processing step.

loookas•6h ago
One thing I found surprising during development: the speaker count estimation turned out to be the hardest part of the whole pipeline, not the embeddings or clustering.

Most diarization papers treat it as a solved problem or skip it entirely ("assume N speakers"). But in real meetings nobody tells you upfront how many people are on the call. GMM+BIC gets you to 51% exact match on VoxConverse, which sounds bad until you look at it per bucket: for 1–4 speakers it's 54–91% exact and 88–97% within ±1. It's at 8+ speakers that it completely falls apart (0% exact match).
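The per-bucket view is straightforward to compute from (true, predicted) speaker counts, one pair per file. A minimal sketch; the input pairs and the grouping of everything from 8 speakers upward into a single "8+" bucket are assumptions for illustration:

```python
from collections import defaultdict


def bucket(n: int) -> str:
    # Hypothetical bucketing: one bucket per count, with 8 or more grouped as "8+".
    return "8+" if n >= 8 else str(n)


def count_accuracy(pairs):
    """pairs: (true_count, predicted_count) per file -> {bucket: (exact, within_1)}."""
    stats = defaultdict(lambda: [0, 0, 0])  # exact hits, within-±1 hits, total files
    for true_n, pred_n in pairs:
        s = stats[bucket(true_n)]
        s[0] += pred_n == true_n
        s[1] += abs(pred_n - true_n) <= 1
        s[2] += 1
    return {b: (e / t, w / t) for b, (e, w, t) in stats.items()}
```

A low overall exact-match rate can hide buckets that are mostly within ±1, which is why the per-bucket numbers look so different from the aggregate.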

Curious if anyone has found better approaches for automatic speaker count estimation that don't require a neural model.

