frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Discuss – Do AI agents deserve all the hype they are getting?

2•MicroWagie•1h ago•0 comments

Ask HN: Anyone Using a Mac Studio for Local AI/LLM?

47•UmYeahNo•1d ago•29 comments

LLMs are powerful, but enterprises are deterministic by nature

3•prateekdalal•5h ago•3 comments

Ask HN: Non AI-obsessed tech forums

27•nanocat•16h ago•21 comments

Ask HN: Ideas for small ways to make the world a better place

16•jlmcgraw•18h ago•20 comments

Ask HN: 10 months since the Llama-4 release: what happened to Meta AI?

44•Invictus0•1d ago•11 comments

Ask HN: Who wants to be hired? (February 2026)

139•whoishiring•4d ago•517 comments

Ask HN: Who is hiring? (February 2026)

313•whoishiring•4d ago•512 comments

Ask HN: Non-profit, volunteers run org needs CRM. Is Odoo Community a good sol.?

2•netfortius•13h ago•1 comments

AI Regex Scientist: A self-improving regex solver

7•PranoyP•20h ago•1 comments

Tell HN: Another round of Zendesk email spam

104•Philpax•2d ago•54 comments

Ask HN: Is Connecting via SSH Risky?

19•atrevbot•2d ago•37 comments

Ask HN: Has your whole engineering team gone big into AI coding? How's it going?

18•jchung•2d ago•13 comments

Ask HN: Why LLM providers sell access instead of consulting services?

5•pera•1d ago•13 comments

Ask HN: How does ChatGPT decide which websites to recommend?

5•nworley•1d ago•11 comments

Ask HN: What is the most complicated Algorithm you came up with yourself?

3•meffmadd•1d ago•7 comments

Ask HN: Is it just me or are most businesses insane?

8•justenough•1d ago•7 comments

Ask HN: Mem0 stores memories, but doesn't learn user patterns

9•fliellerjulian•2d ago•6 comments

Ask HN: Is there anyone here who still uses slide rules?

123•blenderob•4d ago•122 comments

Kernighan on Programming

170•chrisjj•4d ago•61 comments

Ask HN: Any International Job Boards for International Workers?

2•15charslong•16h ago•2 comments

Ask HN: Anyone Seeing YT ads related to chats on ChatGPT?

2•guhsnamih•1d ago•4 comments

Ask HN: Does global decoupling from the USA signal comeback of the desktop app?

5•wewewedxfgdf•1d ago•3 comments

We built a serverless GPU inference platform with predictable latency

5•QubridAI•2d ago•1 comments

Ask HN: Does a good "read it later" app exist?

8•buchanae•3d ago•18 comments

Ask HN: Have you been fired because of AI?

17•s-stude•4d ago•15 comments

Ask HN: How Did You Validate?

4•haute_cuisine•1d ago•6 comments

Ask HN: Anyone have a "sovereign" solution for phone calls?

12•kldg•4d ago•1 comments

Ask HN: Cheap laptop for Linux without GUI (for writing)

15•locusofself•3d ago•16 comments

Ask HN: OpenClaw users, what is your token spend?

14•8cvor6j844qw_d6•4d ago•6 comments
Open in hackernews

Ask HN: What Speaker Diarization tools should I look into?

11•justforfunhere•6mo ago
Hi,

I am making a tool that needs to analyze a conversation (non-English) between two people. The conversation is provided to me in audio format. I am currently using OpenAI Whisper to transcribe and feed the transcription to ChatGPT-4o model through the API for analysis.

So far, it's doing a fair job. Sometimes, though, reading the transcription, I find it hard to figure out which speaker is speaking what. I have to listen to the audio to figure it out. I am wondering if ChatGPT-4o would also sometimes find it hard to follow the conversation from the transcription. I think that adding a speaker diarization step might make the transcription easier to understand and analyze.

I am looking for Speaker Diarization tools that I can use. I have tried using pyannote speaker-diarization-3.1, but I find it does not work very well. What are some other options that I can look at?

Comments

nemima•6mo ago
Hi, I'm an engineer at Speechmatics. Our speech-to-text software handles speaker diarization very reliably, and we're a go-to choice for non-English languages. https://www.speechmatics.com/

How long is the audio file? If it's under 2 hours, you can upload the file and transcribe it with diarization for free using our web portal: https://portal.speechmatics.com/jobs/create/batch

Hope it helps for your use case! If it does, and you encounter any issues, drop us an email at devrel@speechmatics.com :)

EDIT: typo

justforfunhere•6mo ago
Hi, yes, it is well under two hours. The longest audio that I have had to handle as of now is around 10 minutes.

I will give your portal a try soon. Thanks

hildekominskia•6mo ago
Skip pyannote 3.1; two battle-tested upgrades:

1. NVIDIA NeMo’s `diar_msdd_telephonic` (8 kHz) or `diar_msdd_mic` (16 kHz) — one-line Python install, GPU optional, beats pyannote on cross-talk. 2. AssemblyAI’s async `/v2/transcript` endpoint — gives you `words[].speaker` + Whisper-level accuracy for 40+ languages. Free tier: 3 h / month.

Glue either to your existing Whisper pipeline and feed ChatGPT-4o with speaker-tagged text. The jump in clarity is night-and-day.

I use the same combo to auto-caption interviews, then drop the synced footage into Veo 3 (https://veo-3.app) for instant talking-head explainers—works even for non-English audio.

hbredin•6mo ago
Hey, I am the creator of pyannote open-source toolkit.

I just created a company around it that serves much better diarization models through an API.

You can test it by creating an account on https://dashboard.pyannote.ai. You'll get 150h of diarization for free.

There is also a playground where you can simply upload a file and visualize the diarization results.

satvikpendem•6mo ago
Seems like this only diarizes, is there a transcription interface as well? The prices are a bit high for only diarization as something like Soniox is also ~13 cents for real-time diarization with transcription included.
satvikpendem•6mo ago
Google Gemini and ElevenLabs are quite good at transcription with diarization if you already have the audiofile. For real-time, I like Soniox, you can use their comparison page that runs all the major transcription services at once [0]. Note that their Google model is not Gemini, it's their older Chirp model.

[0] https://soniox.com/compare/

vismit2000•6mo ago
Elevenlabs does speaker diarization really well in my experience: https://elevenlabs.io/ (First came to know about this from Lex-Modi podcast)
meerab•6mo ago
I am building VideoToBe.com - I have found that whisperX works the most reliable.

https://github.com/m-bain/whisperX

It is built on top of OpenAI Whisper, so speech recognition is good, the transcript gives speaker tags as 'SPEAKER_00' and 'SPEAKER_01' etc.

Here is how the transcript may look like

https://videotobe.com/play/media/1b02f75a-9503-43aa-8956-d18...