Cohere Transcribe: Speech Recognition

https://cohere.com/blog/transcribe
73•gmays•1h ago

Comments

geooff_•1h ago
I can't say enough nice things about Cohere's services. I migrated over to their embedding model a few months ago for CLIP-style embeddings and it's been fantastic.

It has the most crisp, steady P50 of any external service I've used in a long time.

bluegatty•59m ago
can u comment on overall quality? their models tend to be a bit smaller and less performant overall.
simonw•1h ago
It's great that this is Apache 2.0 licensed - several of Cohere's other models are licensed free for non-commercial use only.
dinakernel•1h ago
My worry is that ASR will end up like OCR. If the large multimodal AI system is good enough (latency-wise), its advantage in domain understanding eats the other technologies alive.

In OCR, even when the characters are poorly scanned, the deep domain understanding these large multimodal AIs have lets them work out what the document actually meant: "this is going to be the order ID, because in the million invoices I have seen before, the order ID is normally below the order date", and so on. My worry is that the same thing will happen in ASR.
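The mechanism dinakernel describes can be sketched as a decoder that adds an acoustic score, log P(audio | word), to a weighted language-model prior, log P(word); this combination is often called shallow fusion. The sketch below is illustrative only: the probabilities and the `lm_weight` values are made up, and it is not how Cohere Transcribe or any specific model scores words. It shows both sides: the prior rescues garbled audio, but the same prior can also override a correct yet rare word.

```python
import math

# (acoustic log-prob, prior log-prob) per candidate. All numbers are invented.
Candidates = dict[str, tuple[float, float]]


def fused_score(acoustic_logp: float, prior_logp: float, lm_weight: float) -> float:
    """Shallow-fusion style score: acoustic evidence plus a weighted prior."""
    return acoustic_logp + lm_weight * prior_logp


def pick_word(candidates: Candidates, lm_weight: float) -> str:
    """Return the candidate with the highest fused score."""
    return max(candidates, key=lambda w: fused_score(*candidates[w], lm_weight))


# Garbled audio: the acoustics slightly favour the wrong reading,
# and the prior rescues it.
garbled = {
    "order ID": (math.log(0.45), math.log(1e-3)),
    "order I'd": (math.log(0.55), math.log(1e-6)),
}

# Clear audio of a rare word: the acoustics favour the correct "Lviv",
# but the much more common "live" wins once the prior is weighted in.
rare = {
    "Lviv": (math.log(0.6), math.log(1e-7)),
    "live": (math.log(0.4), math.log(1e-4)),
}

for name, cands in [("garbled", garbled), ("rare", rare)]:
    print(name, pick_word(cands, lm_weight=0.0), "->", pick_word(cands, lm_weight=0.5))
# garbled order I'd -> order ID
# rare Lviv -> live
```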

progbits•47m ago
This is both good and bad. Good ASR can often understand low quality / garbled speech that I could not figure out, but it also "over corrects" sometimes and replaces correct but low prior words with incorrect but much more common ones.

With OCR, the risk is another Xerox[1] incident, where all your data looks plausible but is incorrect. Hope you kept the originals!

(This is why, for my personal doc scans, I use OCR only for full-text search but retain the original raw scans forever.)

[1] https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
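progbits' setup, where OCR text serves only as a search index and the raw scans stay the source of truth, can be sketched as a tiny inverted index that maps words back to scan files. The file paths and OCR strings below are hypothetical, and a real setup would more likely use an existing full-text search tool.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens from OCR output."""
    return re.findall(r"\w+", text.lower())


def build_index(ocr_text_by_scan: dict[str, str]) -> dict[str, set[str]]:
    """Map each OCR'd word to the set of original scan files containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for scan_path, text in ocr_text_by_scan.items():
        for word in tokenize(text):
            index[word].add(scan_path)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the original scan files whose OCR text contains every query word."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


# Hypothetical OCR output keyed by the raw scan it came from.
scans = {
    "scans/2019-03-invoice.tiff": "Invoice 4471. Order date: 2019-03-02",
    "scans/2021-lease.tiff": "Lease agreement, order of signatures",
}
index = build_index(scans)
print(search(index, "order date"))  # ['scans/2019-03-invoice.tiff']
```

With this split, an OCR misread can at worst cost a missed search hit; what you actually read is always the original image.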

nkzd•28m ago
Why are you 'worried' about it? Shouldn't we strive for better technology even if it means some will 'lose'?
yorwba•10m ago
"Better" isn't just about increasing benchmark numbers. Often, it's more important that a system fails safely than how often it fails. Automatic speech recognition that guesses when the input is unclear will occasionally be right and therefore have a lower word error rate, but if it's important that the output be correct, it might be better to insert "[unintelligible]" and have a human double-check.
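To make yorwba's point concrete: word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between a reference and a hypothesis, divided by the reference length. The sketch below assumes the ASR system exposes a per-word confidence score (not all do, and the values here are invented) and compares guessing against masking uncertain words as `[unintelligible]`. The `threshold` of 0.5 is an arbitrary, hypothetical choice.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """(substitutions + deletions + insertions) / len(reference), via word-level Levenshtein."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every reference word
    for j in range(cols):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1] / len(reference)


def mask_uncertain(words: list[tuple[str, float]], threshold: float = 0.5) -> list[str]:
    """Replace words whose confidence is below the threshold with a placeholder."""
    return [w if conf >= threshold else "[unintelligible]" for w, conf in words]


reference = "transfer ten thousand to account four".split()
# Hypothetical decoder output: (word, confidence) pairs.
decoded = [("transfer", 0.95), ("ten", 0.40), ("thousand", 0.90),
           ("to", 0.97), ("account", 0.93), ("for", 0.35)]

guessed = [w for w, _ in decoded]
masked = mask_uncertain(decoded)
print(f"guess WER: {word_error_rate(reference, guessed):.3f}")  # guess WER: 0.167
print(f"mask WER:  {word_error_rate(reference, masked):.3f}")   # mask WER:  0.333
```

Guessing gets the lower WER (1/6 versus 2/6) because "ten" happened to be right, but it also silently turned "account four" into "account for". The masked output scores worse on the benchmark yet shows a human exactly where to double-check.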
topazas•59m ago
How hard could it be to train other European language(-s)?
gunalx•42m ago
If you have to ask, you don't really need the answer.

Finding or creating training code doesn't seem too difficult. You'd need a pretty decent amount of high-quality training data, likely many hours of it, plus a few hours of high-end datacenter GPU compute and many iterations to get it right.

harvey9•39m ago
It includes several European languages.
stronglikedan•22m ago
hence "other" lol
teach•49m ago
Dumb question, but if this is "open source" is there source code somewhere? Or does that term mean something different in the world of models that must be trained to be useful?
stronglikedan•23m ago
I presume it means the model itself.
Doman•21m ago
Files can be downloaded here: https://huggingface.co/CohereLabs/cohere-transcribe-03-2026/...

And someone has already converted it to onnx format: https://huggingface.co/eschmidbauer/cohere-transcribe-03-202... - so it can be run on CPU instead of GPU.

gruez•47m ago
> Limitations

> Timestamps/Speaker diarization. The model does not feature either of these.

What a shame. Is whisperx still the best choice if you want timestamps/diarization?

akreal•41m ago
WhisperX is not a model but a software package built around Whisper and some other models, including diarization and alignment ones. Something similar will be built around the Cohere Transcribe model, maybe even just an integration to WhisperX itself.
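Much of the glue akreal describes comes down to merging two outputs: word timestamps from an alignment model and speaker turns from a diarization model. A minimal sketch, assuming both are already available (Cohere Transcribe itself produces neither, per the limitations quoted above), assigns each word to the speaker turn it overlaps most. This mirrors the general idea rather than WhisperX's actual code, and the `Word` and `SpeakerTurn` types are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class SpeakerTurn:
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[Word], turns: list[SpeakerTurn]) -> list[tuple[Optional[str], str]]:
    """Label each word with the speaker whose turn overlaps it most (None if no overlap)."""
    labelled = []
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best, best_overlap = t.speaker, o
        labelled.append((best, w.text))
    return labelled


words = [Word("hello", 0.0, 0.4), Word("there", 0.45, 0.8), Word("hi", 1.0, 1.2)]
turns = [SpeakerTurn("SPEAKER_00", 0.0, 0.9), SpeakerTurn("SPEAKER_01", 0.95, 2.0)]
print(assign_speakers(words, turns))
# [('SPEAKER_00', 'hello'), ('SPEAKER_00', 'there'), ('SPEAKER_01', 'hi')]
```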
GaggiX•36m ago
There is also: https://github.com/linto-ai/whisper-timestamped

It doesn't use an extra model (so it supports every language that works with Whisper out of the box and uses less memory); it works by applying Dynamic Time Warping to cross-attention weights.
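The Dynamic Time Warping idea can be illustrated on a toy matrix: treat the negated cross-attention weights between output tokens and audio frames as a cost, find the cheapest monotonic path through it, and read each token's first aligned frame as its start time. This is a simplified sketch, not whisper-timestamped's code: real implementations pick particular attention heads and normalise and filter the weights first, and the 20 ms per frame used here is an assumption matching the resolution of Whisper's encoder output.

```python
def dtw_path(cost: list[list[float]]) -> list[tuple[int, int]]:
    """Cheapest monotonic path through a (tokens x frames) cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # advance token and frame
                acc[i - 1][j],      # next token, same frame
                acc[i][j - 1],      # same token, next frame
            )
    # Walk back from the last cell, always taking the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, step = min((acc[i - 1][j - 1], 0), (acc[i - 1][j], 1), (acc[i][j - 1], 2))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def token_start_frames(attention: list[list[float]]) -> list[int]:
    """First frame each token is aligned to (higher attention = lower cost)."""
    cost = [[-w for w in row] for row in attention]
    starts: dict[int, int] = {}
    for token, frame in dtw_path(cost):
        starts.setdefault(token, frame)  # path is ordered, so first hit is earliest
    return [starts[t] for t in range(len(attention))]


SECONDS_PER_FRAME = 0.02  # assumption: 20 ms encoder frames, as in Whisper

# Toy attention weights: 3 tokens x 6 audio frames.
attention = [
    [0.9, 0.8, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.7, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.8, 0.9],
]
frames = token_start_frames(attention)
print(frames)  # [0, 2, 4]
print([round(f * SECONDS_PER_FRAME, 2) for f in frames])  # [0.0, 0.04, 0.08]
```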

bartman•34m ago
Even in the commercial space, there’s a lack of production grade ASR APIs that support diarization and word level timestamps.

My experiences with Google’s Chirp have been horrendous: it sometimes skips sections of speech entirely, hallucinates speech where the audio contains noise, and produces unreliable word level timestamps. And this is all even when using their new audio prefiltering feature.

AWS works slightly better, but also has trouble with keeping word level timestamps in sync.

Whisper is nice but hallucinates regularly.

OpenAI’s new transcription models are delivering accurate output but do not support word level timestamps…

A lot of this could be worked around by sending the resulting transcripts through a few layers of post processing, but… I just want to pay for an API that is reliable and saves me from doing all that work.
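As one example of the post-processing layers bartman would rather not maintain, two cheap checks target failure modes listed above. The first drops words that barely overlap detected speech, which are likely hallucinations over noise. The second makes word timestamps non-overlapping and non-decreasing. The speech segments are assumed to come from some voice-activity detector, and the 0.5 overlap threshold is a hypothetical choice. Neither check catches hallucinations inside real speech.

```python
Word = tuple[str, float, float]  # (text, start seconds, end seconds)


def clean_words(words: list[Word], speech: list[tuple[float, float]],
                min_overlap: float = 0.5) -> list[Word]:
    """Drop words mostly outside speech; clamp timestamps so words never overlap or go backwards."""
    cleaned: list[Word] = []
    last_end = 0.0
    for text, start, end in words:
        duration = max(end - start, 1e-6)  # guard against zero-length words
        inside = sum(max(0.0, min(end, s_end) - max(start, s_start))
                     for s_start, s_end in speech)
        if inside / duration < min_overlap:
            continue  # mostly outside detected speech: likely hallucinated
        start = max(start, last_end)  # no earlier than the previous word's end
        end = max(end, start)
        cleaned.append((text, start, end))
        last_end = end
    return cleaned


speech = [(0.0, 1.0)]  # hypothetical VAD output
words = [("thanks", 0.0, 0.3), ("for", 0.28, 0.45),
         ("listening", 0.45, 0.9), ("okay", 2.0, 2.6)]
print(clean_words(words, speech))
# [('thanks', 0.0, 0.3), ('for', 0.3, 0.45), ('listening', 0.45, 0.9)]
```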

Void_•8m ago
Just today I shipped support for this in Whisper Memos: https://whispermemos.com/changelog/2026-04-cohere-transcribe

Accurate and fast model, very happy with it so far!
