frontpage.

Gemini 3

https://blog.google/products/gemini/gemini-3/
817•preek•6h ago•583 comments

Google Antigravity

https://antigravity.google/
473•Fysi•5h ago•569 comments

GitHub: Git Operation Failures

https://www.githubstatus.com/incidents/5q7nmlxz30sk
183•wilhelmklopp•42m ago•84 comments

Pebble, Rebble, and a path forward

https://ericmigi.com/blog/pebble-rebble-and-a-path-forward/
218•phoronixrly•3h ago•87 comments

I am stepping down as the CEO of Mastodon

https://blog.joinmastodon.org/2025/11/my-next-chapter-with-mastodon/
120•Tomte•3h ago•37 comments

The Final Straw: Why Companies Replace Once-Beloved Technology Brands

https://www.functionize.com/blog/the-final-straw-why-companies-replace-once-beloved-technology-br...
19•ohjeez•1h ago•4 comments

GitHub Down

89•mikeocool•41m ago•30 comments

OrthoRoute – GPU-accelerated autorouting for KiCad

https://bbenchoff.github.io/pages/OrthoRoute.html
34•wanderingjew•2h ago•6 comments

Cloudflare Global Network experiencing issues

https://www.cloudflarestatus.com/incidents/8gmgl950y3h7
2223•imdsm•9h ago•1416 comments

Gemini 3 Pro Model Card [pdf]

https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf
64•virgildotcodes•10h ago•294 comments

The code and open-source tools I used to produce a science fiction anthology

https://compellingsciencefiction.com/posts/the-code-and-open-source-tools-i-used-to-produce-a-sci...
7•mojoe•5h ago•0 comments

Show HN: Guts – convert Golang types to TypeScript

https://github.com/coder/guts
57•emyrk•3h ago•14 comments

Solving a million-step LLM task with zero errors

https://arxiv.org/abs/2511.09030
85•Anon84•4h ago•35 comments

How Quake.exe got its TCP/IP stack

https://fabiensanglard.net/quake_chunnel/index.html
424•billiob•13h ago•102 comments

Show HN: RowboatX – open-source Claude Code for everyday automations

https://github.com/rowboatlabs/rowboat
24•segmenta•2h ago•4 comments

Chuck Moore: Colorforth has stopped working [video]

https://www.youtube.com/watch?v=MvkGBWXb2oQ#t=22
20•netten•1d ago•2 comments

Oracle is underwater on its 'astonishing' $300B OpenAI deal

https://www.ft.com/content/064bbca0-1cb2-45ab-85f4-25fdfc318d89
61•busymom0•53m ago•10 comments

Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark

https://simonwillison.net/2025/Nov/18/gemini-3/
57•nabla9•2h ago•25 comments

Mysterious holes in the Andes may have been an ancient marketplace

https://www.sydney.edu.au/news-opinion/news/2025/11/10/mysterious-holes-in-the-andes-may-have-bee...
5•gmays•6d ago•0 comments

Strix Halo's Memory Subsystem: Tackling iGPU Challenges

https://chipsandcheese.com/p/strix-halos-memory-subsystem-tackling
49•PaulHoule•4h ago•22 comments

Short Little Difficult Books

https://countercraft.substack.com/p/short-little-difficult-books
121•crescit_eundo•6h ago•75 comments

When 1+1+1 Equals 1

https://mathenchant.wordpress.com/2024/12/19/when-111-equals-1/
22•surprisetalk•5d ago•8 comments

A 'small' vanilla Kubernetes install on NixOS

https://stephank.nl/p/2025-11-17-a-small-vanilla-kubernetes-install-on-nixos.html
9•todsacerdoti•10h ago•2 comments

Nearly all UK drivers say headlights are too bright

https://www.bbc.com/news/articles/c1j8ewy1p86o
575•YeGoblynQueenne•7h ago•588 comments

Show HN: Tokenflood – simulate arbitrary loads on instruction-tuned LLMs

https://github.com/twerkmeister/tokenflood
9•twerkmeister•6d ago•0 comments

Google boss says AI investment boom has 'elements of irrationality'

https://www.bbc.com/news/articles/cwy7vrd8k4eo
80•jillesvangurp•15h ago•167 comments

Experiment: Making TypeScript immutable-by-default

https://evanhahn.com/typescript-immutability-experiment/
78•ingve•7h ago•65 comments

The Miracle of Wörgl

https://scf.green/story-of-worgl-and-others/
120•simonebrunozzi•10h ago•63 comments

Mathematics and Computation (2019) [pdf]

https://www.math.ias.edu/files/Book-online-Aug0619.pdf
59•nill0•8h ago•13 comments

A day at Hetzner Online in the Falkenstein data center

https://www.igorslab.de/en/a-day-at-hetzner-online-in-the-falkenstein-data-center-insights-into-s...
149•speckx•5h ago•59 comments

Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark

https://simonwillison.net/2025/Nov/18/gemini-3/
57•nabla9•2h ago

Comments

simonw•1h ago
The audio transcript exercise here is particularly interesting from a journalism perspective.

Summarizing a 3.5 hour council meeting is something of a holy grail of AI-assisted reporting. There are a LOT of meetings like that, and newspapers (especially smaller ones) can no longer afford to have a human reporter sit through them all.

I tried this prompt (against audio from https://www.youtube.com/watch?v=qgJ7x7R6gy0):

  Output a Markdown transcript of this meeting. Include speaker
  names and timestamps. Start with an outline of the key
  meeting sections, each with a title and summary and timestamp
  and list of participating names. Note in bold if anyone
  raised their voices, interrupted each other or had
  disagreements. Then follow with the full transcript.
Here's the result: https://gist.github.com/simonw/0b7bc23adb6698f376aebfd700943...

I'm not sure quite how to grade it here, especially since I haven't sat through the whole 3.5 hour meeting video myself.

It appears to have captured the gist of the meeting very well, but the fact that the transcript isn't close to an exact match to what was said - and the timestamps are incorrect - means it's very hard to trust the output. Could it have hallucinated things that didn't happen? Those can at least be spotted by digging into the video (or the YouTube transcript) to check that they occurred... but what if there was a key point that Gemini 3 omitted entirely?
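
For anyone who wants to reproduce the experiment, here's a minimal sketch of the call using the google-genai Python SDK; the model ID and filename are my assumptions, not values confirmed in the post:

  # Sketch only: model ID and filename are assumptions.
  from google import genai

  client = genai.Client()  # reads GEMINI_API_KEY from the environment

  # Upload the (ffmpeg-shrunk) audio via the Files API
  audio = client.files.upload(file="council-meeting.m4a")

  prompt = (
      "Output a Markdown transcript of this meeting. Include speaker "
      "names and timestamps. Start with an outline of the key meeting "
      "sections, each with a title and summary and timestamp and list "
      "of participating names. Note in bold if anyone raised their "
      "voices, interrupted each other or had disagreements. Then "
      "follow with the full transcript."
  )

  response = client.models.generate_content(
      model="gemini-3-pro-preview",  # assumed model ID
      contents=[prompt, audio],
  )
  print(response.text)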

WesleyLivesay•1h ago
I think it did a good job of summarizing the points that it did summarize, at least judging from my quick watch of a few sections and from the YT transcript (which seems quite accurate).

Almost makes me wonder if, behind the scenes, it is doing something like: rough transcript -> summaries -> transcript with timecodes (runs out of context) -> throws the timestamps it does have onto the summaries.

I would be very curious to see if it does better on something like an hour-long chunk of audio, to see if it is just some sort of context issue. Or if this same audio were fed to it in, say, 45-minute chunks, to see if the timestamps fix themselves.

byt3bl33d3r•1h ago
I’ve been meaning to create & publish a structured extraction benchmark for a while. Using LLMs to extract info/entities/connections from large amounts of unstructured data is also a huge boon to AI-assisted reporting, and it has a number of cybersecurity applications as well. Gemini 2.5 was pretty good, but so far I have yet to see an LLM that can reliably, accurately, and consistently do this.

simonw•1h ago
This would be extremely useful. I think this is one of the most commercially valuable uses of these kinds of models; having more solid independent benchmarks would be great.
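
As a concrete illustration of the structured-extraction pattern being discussed, here's a sketch using the google-genai SDK's schema-constrained output; the schema, filename, and model ID are illustrative assumptions:

  # Sketch: constrain the model to a schema so extractions are
  # comparable across runs. Schema/model/filename are assumptions.
  from pydantic import BaseModel
  from google import genai

  class Entity(BaseModel):
      name: str
      kind: str      # e.g. "person", "org", "domain", "ip"
      evidence: str  # the source sentence it was pulled from

  class Extraction(BaseModel):
      entities: list[Entity]

  client = genai.Client()
  response = client.models.generate_content(
      model="gemini-2.5-flash",
      contents="Extract entities from:\n" + open("report.txt").read(),
      config={
          "response_mime_type": "application/json",
          "response_schema": Extraction,
      },
  )
  print(response.parsed)  # an Extraction instance
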
mistercheph•1h ago
For this use case I think the best bet is still a toolchain with a transcription model like Whisper fed into an LLM to summarize.

simonw•1h ago
Yeah I agree. I ran Whisper (via MacWhisper) on the same video and got back accurate timestamps.

The big benefit of Gemini for this is that it appears to do a great job of speaker recognition, plus it can identify when people interrupt each other or raise their voices.

The best solution would likely include a mixture of both - Gemini for the speaker identification and tone-of-voice stuff, Whisper or NVIDIA Parakeet or similar for the transcription with timestamps.
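
For comparison, the open-source openai-whisper package gives the timestamped half of that mixture in a few lines; a sketch, with the model size and filename as assumptions:

  # Accurate per-segment timestamps, but no speaker labels.
  import whisper

  model = whisper.load_model("medium")  # assumed model size
  result = model.transcribe("council-meeting.m4a")

  for seg in result["segments"]:
      print(f"[{seg['start']:7.1f}s -> {seg['end']:7.1f}s] {seg['text'].strip()}")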

rahimnathwani•1h ago
For this use case, why not use Whisper to transcribe the audio, and then an LLM to do a second step (summarization or answering questions or whatever)?

If you need diarization, you can use something like https://github.com/m-bain/whisperX
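
A sketch of the whisperX flow from its README (transcribe, word-align, then diarize; the filename and Hugging Face token are placeholders):

  import whisperx

  device = "cpu"  # or "cuda"
  audio = whisperx.load_audio("council-meeting.m4a")

  # 1. transcribe
  model = whisperx.load_model("large-v2", device, compute_type="int8")
  result = model.transcribe(audio)

  # 2. word-level alignment for tighter timestamps
  align_model, metadata = whisperx.load_align_model(
      language_code=result["language"], device=device)
  result = whisperx.align(result["segments"], align_model, metadata,
                          audio, device)

  # 3. diarization (pyannote under the hood; needs an HF token)
  diarize_model = whisperx.DiarizationPipeline(
      use_auth_token="hf_...", device=device)
  result = whisperx.assign_word_speakers(diarize_model(audio), result)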

pants2•1h ago
Whisper simply isn't very good compared to LLM audio transcription like gpt-4o-transcribe. If Gemini 3 is even better, it's a game-changer.

crazysim•59m ago
Since Gemini seems to be sucking at timestamps, perhaps Whisper could be used to help ground them, as an additional input alongside the audio.

ks2048•1h ago
Does anyone benchmark these models for speech-to-text using traditional word error rates? It seems audio-input Gemini is a lot cheaper than Google Speech-to-Text.

simonw•1h ago
Here's one: https://voicewriter.io/speech-recognition-leaderboard

"Real-World Speech-to-text API Leaderboard" - it includes scores for Gemini 2.5 Pro and Flash.

Workaccount2•1h ago
My assumption is that Gemini has no insight into the timestamps, and is instead ballparking them based on how much context has been analyzed up to that point.

I wonder: if you put the audio into a video that is nothing but a black screen with a timer running, would it be able to timestamp correctly?
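
That experiment is cheap to run; an untested sketch (assuming an ffmpeg build with the drawtext filter) that burns a running clock into a black video over the meeting audio:

  # Burn a visible clock into an otherwise-black video, then feed
  # the video to Gemini instead of the raw audio. Untested sketch.
  import subprocess

  subprocess.run([
      "ffmpeg",
      "-f", "lavfi", "-i", "color=c=black:s=1280x720:r=5",
      "-i", "council-meeting.m4a",
      "-vf", "drawtext=text='%{pts\\:hms}':fontcolor=white"
             ":fontsize=72:x=40:y=40",
      "-shortest", "timer.mp4",
  ], check=True)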

simonw•1h ago
The Gemini documentation specifically mentions timestamp awareness here: https://ai.google.dev/gemini-api/docs/audio

minimaxir•54m ago
Per the docs, Gemini represents each second of audio as 32 tokens. Since it's a consistent amount, as long as the model is trained to understand the relation between timestamps and the number of tokens (which, per Simon's link, it is), it should be able to infer the correct number of seconds.
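
For scale (my arithmetic, assuming the documented 32 tokens/second):

  3.5 h × 3,600 s/h = 12,600 s
  12,600 s × 32 tokens/s ≈ 403,200 audio tokens

So the whole 3.5-hour recording fits in a 1M-token context, but the model has to keep roughly 400k positions straight to place any single timestamp correctly.
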
potatolicious•44m ago
You really want to break a task like this down into constituent parts - especially because in this case the "end to end" way of doing it (i.e., raw audio to summary) doesn't actually get you anything.

IMO the right way to do this is to feed the audio into a transcription model, specifically one that supports diarization (separation of multiple speakers). This will give you a high quality raw transcript that is pretty much exactly what was actually said.

It would be rough in places (e.g., "Speaker 1", "Speaker 2", etc., rather than actual speaker names).

Then you want to post-process with a LLM to re-annotate the transcript and clean it up (e.g., replace "Speaker 1" with "Mayor Bob"), and query against it.

I see another post here complaining that direct-to-LLM beats a transcription model like Whisper - I would challenge that. Any modern ASR model will do a very, very good job with 95%+ accuracy.
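
A sketch of that post-processing pass (the prompt wording, filenames, and model ID are assumptions):

  # Re-annotate a diarized transcript: map "Speaker N" labels to real
  # names using context clues. Model ID and filenames are assumptions.
  from google import genai

  raw = open("diarized-transcript.txt").read()

  client = genai.Client()
  response = client.models.generate_content(
      model="gemini-3-pro-preview",  # assumed model ID
      contents=(
          "This transcript has anonymous speaker labels. Using context "
          "clues (introductions, roll call, people addressing each other "
          "by name), rewrite it replacing labels like 'Speaker 1' with "
          "real names, e.g. 'Mayor Bob'. Do not change the words or the "
          "timestamps.\n\n" + raw
      ),
  )
  print(response.text)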

simonw•23m ago
Which diarization models would you recommend, especially for running on macOS?

(Update: I just updated MacWhisper and it can now run Parakeet which appears to have decent diarization built in, screenshot here: https://static.simonwillison.net/static/2025/macwhisper-para... )

darkwater•16m ago
Why can't Gemini, the product, do that by itself? Isn't the point of all this AI hype to easily automate things with low effort?

sillyfluke•13m ago
I'm curious when we started conflating transcription and summarization when discussing this LLM mess - or maybe I'm confused about the output simonw is quoting as "the transcript", which starts off not with the actual transcript but with a meeting outline and summarization sections?

LLM summarization is utterly useless when you want 100% accuracy on the final binding decisions in things like council meetings. My experience has been that LLMs cannot be trusted to follow convoluted discussions, including revisiting earlier agenda items later in the meeting, etc.

With transcriptions, the catastrophic risk is far less, since I'm doing the summarizing from the transcript myself. But in that case, for an auto-generated transcript, I'll take correct timestamps with gibberish-sounding sentences over incorrect timestamps with "convincing"-sounding but hallucinated sentences any day.

Any LLM summarization of a sufficiently important meeting requires second-by-second human verification of the audio recording. I have yet to see this convincingly refuted (i.e., an LLM that maintains 100% accuracy on summarizing meeting decisions consistently).

simonw•12m ago
That's why I shared these results. Understanding the difference between LLM summarization and exact transcriptions is really important for this kind of activity.

londons_explore•1h ago
Anyone got a class full of students and able to get a human version of this pelican benchmark?

Perhaps half with a web browser to view the results, and half working blind with the numbers alone?

ZeroConcerns•56m ago
> so I shrunk the file down to a more manageable 38MB using ffmpeg

Without having an LLM figure out the required command line parameters? Mad props!

simonw•22m ago
Hah, nope! I had Claude Code figure that one out.
leetharris•33m ago
I used to work in ASR. Due to the nature of current multimodal architectures, it is unlikely we'll ever see accurate timestamps over a longer horizon. You're better off using an encoder-decoder ASR architecture, then traditional diarization via embedding clustering, then a multimodal model to refine it, then a forced alignment technique (maybe even something pre-NN) to get proper timestamps, reconciling everything at the end.

These things are getting really good at just regular transcription (as long as you don't care about verbatimicity), but every additional dimension you add (timestamps, speaker assignment, etc.) will make the others worse. These work much better as independent processes that then get reconciled and refined by a multimodal LLM.
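
A crude stdlib sketch of that final reconciliation step: align LLM-refined words back onto ASR word timings so the cleaned text inherits trustworthy timestamps (real forced alignment works at the phoneme level; this is only the idea in miniature):

  import difflib

  def reconcile(asr_words, llm_words):
      """asr_words: [(word, start_seconds), ...]; llm_words: [word, ...]"""
      matcher = difflib.SequenceMatcher(
          a=[w.lower() for w, _ in asr_words],
          b=[w.lower() for w in llm_words],
          autojunk=False)
      timed = []
      for block in matcher.get_matching_blocks():
          for i in range(block.size):
              _, start = asr_words[block.a + i]
              timed.append((start, llm_words[block.b + i]))
      return timed

  asr = [("uh", 0.0), ("the", 0.4), ("meeting", 0.6), ("starts", 1.1)]
  llm = ["The", "meeting", "starts"]
  print(reconcile(asr, llm))
  # [(0.4, 'The'), (0.6, 'meeting'), (1.1, 'starts')]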

nurumaik•13m ago
Seems like the pelican benchmark has finally been added to the model training process.

Wowfunhappy•4m ago
Aww, I don’t like the new pelican benchmark as much. I liked that the old prompt was vague and we could see how the AI interpreted it.