Show HN: State of the Art of Coding Models, According to Hacker News Commenters

https://hnup.date/hn-sota
23•yunusabd•2h ago
Hello HN,

I was away from my computer for two weeks, and after coming back and reading the latest discussions on HN about coding assistants (models, harnesses), I felt very out of the loop. My normal process would have been to keep reading and figure out the latest and greatest from people's comments, but I wanted to try and automate this process.

Basically the goal is to get a quick overview of which coding models are popular on HN. A next iteration could also scan for the harnesses people use, or for info on self-hosting and hardware setups.

I wrote a short intro on the page about the pipeline that collects and analyzes the data, but feel free to ask for more details or check the Google Sheet for more info.

https://hnup.date/hn-sota

Comments

jdw64•2h ago
Interpreting these metrics is quite interesting.

One thing is for sure: while Claude currently takes the #1 spot in mentions, it carries a lot of negative sentiment due to API pricing policies and frequent server downtime. The runner-up, GPT-5.5, on the other hand, actually seems to have more positive feedback.

Personally, my experience with Codex wasn't as good as with Claude Code (Codex freezes on Windows more often than you'd expect), so this is a bit surprising. That said, the more defensive GPT is definitely better in terms of sheer code-writing capability. However, GPT actually has quite a few issues with text corruption when generating in Korean or Chinese—something English-speaking users probably don't notice. In terms of model capabilities, when given the same agent.md (CLAUDE.md) file, I think GPT is better at writing code, while Claude is better at writing text during code reviews.

Looking at the bottom right, Qwen and DeepSeek are open-source, so they are largely mentioned in the context of guarding against vendor lock-in, which drives positive sentiment. Considering that Hacker News occasionally shows negative sentiment toward China, the fact that they are viewed this positively—unlike US models—shows that being open-source is a massive advantage in itself.

Anyway, one thing for sure is that Gemini is pretty much unusable.

Jabbles•1h ago
Please fix your graph so the names of the models are readable
marcuskaz•1h ago
Also, the stacked graph only lets you quickly see total mentions; it's really hard to compare negative or positive sentiment across models at a glance.
yunusabd•53m ago
Yep, a toggle to scale all columns to the same height could solve this. I'll look into it when I do the custom graph
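A minimal sketch of what such a toggle could compute, assuming the chart data is a mapping from model name to per-sentiment mention counts. The function name, the data shape, and the example numbers are all hypothetical and not taken from the actual pipeline or the Google Sheet.

```python
def normalize_columns(counts):
    """Scale each model's sentiment counts so every column sums to 100."""
    scaled = {}
    for model, by_sentiment in counts.items():
        total = sum(by_sentiment.values())
        if total == 0:
            # A model with no mentions gets an empty (all-zero) column.
            scaled[model] = {label: 0.0 for label in by_sentiment}
        else:
            scaled[model] = {
                label: 100.0 * n / total for label, n in by_sentiment.items()
            }
    return scaled


# Made-up counts for illustration only (not real data from the page).
example = {
    "model-a": {"positive": 30, "neutral": 50, "negative": 20},
    "model-b": {"positive": 6, "neutral": 2, "negative": 2},
}
shares = normalize_columns(example)
```

With every column at the same height, the share of positive or negative mentions can be compared directly across models, at the cost of hiding how many total mentions each model has.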
smeej•50m ago
Came here to offer this feedback. If I can't see the name of the model, nothing else in the chart really matters to me. I even tried going to the Google Sheet.

It's way too important a piece of information not to have it visible.

yakkomajuri•1h ago
"Prompts an LLM" -> which LLM?

I saw you're using Gemini for the sentiment rating (which I guess you picked because it's not often mentioned and thus "neutral"? lol)

But would be interesting to get more details overall

yunusabd•58m ago
It's actually ChatGPT at the moment for the first filtering step, for no other reason than having a code snippet ready that I could point Cursor at (I know, so 2025). The Gemini call is using batch processing, so it's handled differently.
ranger_danger•1h ago
Just FYI this article seems to define "state of the art" as "popular", as measured by "total mentions and user sentiment", without any bearing on the technical abilities or actual usage of the model.
mellosouls•1h ago
That's pretty much exactly what the title says.

The technical abilities and usage are derived from the commenters' usage reflections.

yunusabd•43m ago
Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are increasingly gamed and don't necessarily reflect a model's actual performance; see Opus 4.7. So I think it's useful to have real-world data from actual users as an additional data point.
brooksc•26m ago
It'd be interesting to also graph this over time to see how sentiment changes from when a model is released to today.
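One way this could be sketched, assuming each classified comment is available as a `(model, date, sentiment)` tuple and that release dates are known per model. The function name, the tuple layout, and the dates below are hypothetical placeholders, not data from the page.

```python
from collections import defaultdict
from datetime import date


def sentiment_by_week(mentions, release_dates):
    """Count sentiment labels per model per week since that model's release."""
    buckets = defaultdict(lambda: defaultdict(int))
    for model, day, sentiment in mentions:
        # Week 0 is the first seven days after release; mentions dated
        # before the release end up in negative week numbers.
        week = (day - release_dates[model]).days // 7
        buckets[(model, week)][sentiment] += 1
    return {key: dict(labels) for key, labels in buckets.items()}


# Made-up release date and mentions for illustration only.
releases = {"model-a": date(2026, 1, 1)}
mentions = [
    ("model-a", date(2026, 1, 3), "positive"),
    ("model-a", date(2026, 1, 4), "negative"),
    ("model-a", date(2026, 1, 10), "positive"),
]
weekly = sentiment_by_week(mentions, releases)
```

Plotting the per-week counts (or their shares) on a shared "weeks since release" axis would let models released at different times be compared at the same point in their lifecycle.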
pbgcp2026•19m ago
So, it's a webpage with 3 paragraphs and a simple chart. It has: 1) terrible color scheme – fine, I switch to reader mode 2) shitloads of JS - fine, NoScript works, page breaks 3) Fancy "design" with simple graph but unreadable X axis labels - fine, I can use screen zoom for that ... to see 3x "Claude O..." LOL are we playing guess-me-over game? 4) ... "LxxxLxxx - Learn languages with YouTube!"
