Ask HN: What Speaker Diarization tools should I look into?

11•justforfunhere•6mo ago
Hi,

I am making a tool that needs to analyze a non-English conversation between two people. The conversation is provided to me in audio format. I am currently using OpenAI Whisper to transcribe it and feeding the transcription to the ChatGPT-4o model through the API for analysis.

So far, it's doing a fair job. Sometimes, though, when reading the transcription, I find it hard to figure out which speaker is saying what, and I have to listen to the audio to work it out. I am wondering if ChatGPT-4o also sometimes finds it hard to follow the conversation from the transcription. I think that adding a speaker diarization step might make the transcription easier to understand and analyze.

I am looking for Speaker Diarization tools that I can use. I have tried using pyannote speaker-diarization-3.1, but I find it does not work very well. What are some other options that I can look at?

Comments

nemima•6mo ago
Hi, I'm an engineer at Speechmatics. Our speech-to-text software handles speaker diarization very reliably, and we're a go-to choice for non-English languages. https://www.speechmatics.com/

How long is the audio file? If it's under 2 hours, you can upload the file and transcribe it with diarization for free using our web portal: https://portal.speechmatics.com/jobs/create/batch

Hope it helps for your use case! If it does, and you encounter any issues, drop us an email at devrel@speechmatics.com :)

EDIT: typo

justforfunhere•6mo ago
Hi, yes, it is well under two hours. The longest audio that I have had to handle as of now is around 10 minutes.

I will give your portal a try soon. Thanks

hildekominskia•6mo ago
Skip pyannote 3.1; two battle-tested upgrades:

1. NVIDIA NeMo’s `diar_msdd_telephonic` (8 kHz) or `diar_msdd_mic` (16 kHz): one-line Python install, GPU optional, beats pyannote on cross-talk.
2. AssemblyAI’s async `/v2/transcript` endpoint: gives you `words[].speaker` + Whisper-level accuracy for 40+ languages. Free tier: 3 h / month.

Glue either to your existing Whisper pipeline and feed ChatGPT-4o with speaker-tagged text. The jump in clarity is night-and-day.
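One way to do that gluing, as a minimal sketch rather than anything a specific tool prescribes: give each Whisper segment the speaker whose diarization turns overlap it the most, then print one speaker-tagged line per segment. The `start`, `end`, and `text` keys match what Whisper returns in its segments; the `(start, end, speaker)` tuple format for diarization turns, the `UNKNOWN` fallback label, and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed diarization output format: (start_sec, end_sec, speaker_label)
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def tag_segments(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Attach to each transcript segment the speaker whose turns overlap it most."""
    tagged = []
    for seg in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # Fall back to a placeholder label when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        tagged.append({**seg, "speaker": speaker})
    return tagged


def to_prompt_text(tagged: List[Dict]) -> str:
    """Render tagged segments as 'SPEAKER: text' lines for the analysis prompt."""
    return "\n".join(f'{s["speaker"]}: {s["text"].strip()}' for s in tagged)


# Made-up sample data for illustration only.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello, how are you?"},
    {"start": 2.6, "end": 5.0, "text": " Fine, thanks."},
    {"start": 5.1, "end": 6.0, "text": " Good."},
]
turns = [(0.0, 2.4, "SPEAKER_00"), (2.4, 5.2, "SPEAKER_01"), (5.2, 6.0, "SPEAKER_00")]

print(to_prompt_text(tag_segments(segments, turns)))
```

Segment-level assignment like this is coarse: if two people talk within one Whisper segment, the whole segment goes to whoever spoke longer. Word-level timestamps would allow a finer split at the cost of more bookkeeping.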

I use the same combo to auto-caption interviews, then drop the synced footage into Veo 3 (https://veo-3.app) for instant talking-head explainers—works even for non-English audio.

hbredin•6mo ago
Hey, I am the creator of the pyannote open-source toolkit.

I just created a company around it that serves much better diarization models through an API.

You can test it by creating an account on https://dashboard.pyannote.ai. You'll get 150h of diarization for free.

There is also a playground where you can simply upload a file and visualize the diarization results.

satvikpendem•6mo ago
Seems like this only diarizes; is there a transcription interface as well? The prices are a bit high for diarization alone, as something like Soniox is also ~13 cents for real-time diarization with transcription included.

satvikpendem•6mo ago
Google Gemini and ElevenLabs are quite good at transcription with diarization if you already have the audio file. For real-time, I like Soniox; you can use their comparison page, which runs all the major transcription services at once [0]. Note that the Google model on that page is not Gemini, it's Google's older Chirp model.

[0] https://soniox.com/compare/

vismit2000•6mo ago
ElevenLabs does speaker diarization really well in my experience: https://elevenlabs.io/ (I first came to know about it from the Lex-Modi podcast)

meerab•6mo ago
I am building VideoToBe.com, and I have found that whisperX works the most reliably.

https://github.com/m-bain/whisperX

It is built on top of OpenAI Whisper, so speech recognition is good, and the transcript gives speaker tags such as 'SPEAKER_00' and 'SPEAKER_01'.

Here is what the transcript may look like:

https://videotobe.com/play/media/1b02f75a-9503-43aa-8956-d18...
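
For feeding a speaker-tagged transcript like this to an LLM, it can help to fold consecutive segments from the same speaker into a single turn. The sketch below assumes each segment is a dict with `speaker` and `text` keys; that shape, the sample data, and the `merge_turns` name are illustrative assumptions, not a documented output format.

```python
from itertools import groupby
from typing import Dict, List


def merge_turns(segments: List[Dict]) -> List[str]:
    """Join runs of consecutive same-speaker segments into one line per turn."""
    lines = []
    # groupby only groups adjacent items, which is exactly what a turn is.
    for speaker, group in groupby(segments, key=lambda s: s.get("speaker", "UNKNOWN")):
        text = " ".join(s["text"].strip() for s in group)
        lines.append(f"{speaker}: {text}")
    return lines


# Made-up sample data for illustration only.
sample = [
    {"speaker": "SPEAKER_00", "text": " Did you get the report?"},
    {"speaker": "SPEAKER_00", "text": " I sent it yesterday."},
    {"speaker": "SPEAKER_01", "text": " Yes, I read it."},
    {"speaker": "SPEAKER_00", "text": " Great."},
]

for line in merge_turns(sample):
    print(line)
```

Segments that carry no speaker label are grouped under `UNKNOWN` instead of being dropped, so gaps in the diarization stay visible in the text.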