Show HN: State of the Art of Coding Models, According to Hacker News Commenters

https://hnup.date/hn-sota
17•yunusabd•1h ago
Hello HN,

I was away from my computer for two weeks, and after coming back and reading the latest discussions on HN about coding assistants (models, harnesses), I felt very out of the loop. My normal process would have been to keep reading and figure out the latest and greatest from people's comments, but I wanted to try and automate this process.

Basically, the goal is to get a quick overview of which coding models are popular on HN. A next iteration could also scan for the harnesses people use, or for info on self-hosting and hardware setups.

I wrote a short intro on the page about the pipeline that collects and analyzes the data, but feel free to ask for more details or check the Google Sheet for more info.

https://hnup.date/hn-sota

Comments

jdw64•1h ago
Interpreting these metrics is quite interesting.

One thing is for sure: while Claude currently takes the #1 spot in mentions, it carries a lot of negative sentiment due to API pricing policies and frequent server downtime. The runner-up, GPT-5.5, actually seems to have more positive feedback.

Personally, my experience with Codex wasn't as good as with Claude Code (Codex freezes on Windows more often than you'd expect), so this is a bit surprising. That said, the more defensive GPT is definitely better in terms of sheer code-writing capability. However, GPT actually has quite a few issues with text corruption when generating in Korean or Chinese—something English-speaking users probably don't notice. In terms of model capabilities, when given the same agent.md (CLAUDE.md) file, I think GPT is better at writing code, while Claude is better at writing text during code reviews.

Looking at the bottom right, Qwen and DeepSeek are open-source, so they are largely mentioned in the context of guarding against vendor lock-in, which drives positive sentiment. Considering that Hacker News occasionally shows negative sentiment toward China, the fact that they are viewed this positively—unlike US models—shows that being open-source is a massive advantage in itself.

Anyway, one thing is for sure: Gemini is pretty much unusable.

Jabbles•1h ago
Please fix your graph so the names of the models are readable
marcuskaz•1h ago
Also, the stacked graph only lets you quickly see total mentions; it's really hard to compare negative or positive sentiment across models at a glance.
yunusabd•19m ago
Yep, a toggle to scale all columns to the same height could solve this. I'll look into it when I do the custom graph
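
A minimal sketch of what such a toggle could compute, assuming the chart data is a mapping from model name to per-sentiment mention counts. The function name, the data layout, and the numbers are hypothetical and are not taken from the actual page or Google Sheet. Scaling each column so its segments sum to 1.0 turns the stacked chart into a 100% stacked chart, which makes sentiment shares comparable across models regardless of total mentions.

```python
from typing import Dict


def normalize_columns(
    counts: Dict[str, Dict[str, int]],
) -> Dict[str, Dict[str, float]]:
    """Scale each model's sentiment counts so every column sums to 1.0."""
    normalized: Dict[str, Dict[str, float]] = {}
    for model, sentiments in counts.items():
        total = sum(sentiments.values())
        if total == 0:
            # A model with no mentions keeps an all-zero column.
            normalized[model] = {label: 0.0 for label in sentiments}
        else:
            normalized[model] = {
                label: n / total for label, n in sentiments.items()
            }
    return normalized


# Hypothetical counts, for illustration only (not data from the page).
example = {
    "model_a": {"positive": 30, "neutral": 50, "negative": 20},
    "model_b": {"positive": 6, "neutral": 2, "negative": 2},
}
shares = normalize_columns(example)
```

With these made-up numbers, `model_b` has far fewer mentions than `model_a` but a larger positive share, which is the kind of comparison the raw stacked view hides.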
smeej•16m ago
Came here to offer this feedback. If I can't see the name of the model, nothing else in the chart really matters to me. I even tried going to the Google Sheet.

It's way too important a piece of information not to have it visible.

yakkomajuri•1h ago
"Prompts an LLM" -> which LLM?

I saw you're using Gemini for the sentiment rating (which I guess you picked because it's not often mentioned and thus "neutral"? lol)

But it would be interesting to get more details overall.

yunusabd•24m ago
It's actually ChatGPT at the moment for the first filtering step, for no other reason than having a code snippet ready that I could point Cursor at (I know, so 2025). The Gemini call is using batch processing, so it's handled differently.
ranger_danger•1h ago
Just FYI, this article seems to define "state of the art" as "popular", as measured by "total mentions and user sentiment", without any bearing on the technical abilities or actual usage of the model.
mellosouls•49m ago
That's pretty much exactly what the title says.

The technical abilities and usage are derived from the commenters' reflections on their usage.

yunusabd•9m ago
Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are increasingly gamed and don't necessarily reflect a model's actual performance (see Opus 4.7). So I think it's useful to have real-world data from actual users as an additional data point.
