
15× vs. ~1.37×: Recalculating GPT-5.3-Codex-Spark on SWE-Bench Pro

https://twitter.com/nvanlandschoot/status/2022385829596078100
27•nvanlandschoot•1d ago

Comments

nvanlandschoot•1d ago
Method: I took OpenAI's published SWE-Bench Pro chart points and matched GPT-5.3-Codex-Spark against the baseline model at comparable accuracy levels across reasoning-effort settings. At similar accuracy, the effective speedup is closer to ~1.37× than to 15×.
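For concreteness, a minimal sketch of that matching in Python. The chart points below are invented placeholders, not OpenAI's published numbers; the real ones would be read off the SWE-Bench Pro chart.

    # Hypothetical (accuracy %, seconds per task) points read off a chart.
    # These values are placeholders, NOT OpenAI's published numbers.
    baseline = [(42.0, 900.0), (48.0, 1400.0), (52.0, 2100.0)]
    spark = [(40.0, 550.0), (46.0, 900.0), (50.0, 1500.0)]

    def time_at_accuracy(points, acc):
        # Linearly interpolate task time at a given accuracy level.
        points = sorted(points)
        for (a0, t0), (a1, t1) in zip(points, points[1:]):
            if a0 <= acc <= a1:
                return t0 + (acc - a0) / (a1 - a0) * (t1 - t0)
        raise ValueError("accuracy outside charted range")

    acc = 46.0  # pick an accuracy level both models actually reach
    speedup = time_at_accuracy(baseline, acc) / time_at_accuracy(spark, acc)
    print(f"effective speedup at {acc}% accuracy: {speedup:.2f}x")

Picking the matching accuracy is the whole game: compare at a level both models actually reach, or the ratio is meaningless.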
solarkraft•1d ago
> The narrative from AI companies hasn’t really changed, but the reaction has. The same claims get repeated so often that they start to feel like baseline reality, and people begin to assume the models are far more capable than they actually are.

That's been true for people who buy into the hype without actually using the products, but I'm pretty sure those who do use them are fairly disillusioned with the claims by now. The only somewhat reliable method is to test the things against your own use case.

That said: I always expected the tradeoff of Spark to be accuracy vs. speed. That it’s still significantly faster at the same accuracy is wild. I never expected that.

ijidak•1h ago
I believe a lot of the speed-up comes from the new chip they're running on [1]. Since the gain is from faster hardware rather than from doing fewer operations, that's likely why the accuracy has changed so little.

1. https://www.cerebras.ai/blog/openai-codexspark

roxolotl•51m ago
The people I know who use them the most also seem the most likely to buy into the hype. The coworker who no longer answers questions by talking about code, but by talking about which skills are best, is the same one who posts all the hype.
pennaMan•1h ago
Efficiency per token has tanked, but it's still faster. Given this is the first generation on Cerebras hardware, this is the worst it's ever going to be.

When it reaches mainline 5.3 Codex efficiency at this token rate, articles like this will seem silly in retrospect.

charcircuit•1h ago
>The fair comparison is where the models are basically equivalent in intelligence

I don't agree with this premise. I think it is fair to say that Haiku is a faster model than Opus.

jiggawatts•1h ago
Something I find odd in the AI space is that almost all journalists republish vendor benchmark claims without question.

Why not just benchmark the models yourself?

Tiny little YouTube channels will spend weeks benchmarking every motherboard from every manufacturer to detect even the tiniest differences!

Car reviews will often test drive the cars and run their own dyno tests.

Etc…

AI reviews, meanwhile, are just copy-pasted from the marketing blurb.

coldtea•1h ago
>Why not just benchmark the models yourself?

Because their incentives are to churn out shallow articles fast for views and to stay in the good graces of major AI companies and potential advertisers. That, and their integrity and passion for what they do are minimal, plus they're paid peanuts.

It doesn't help that most brain-rotted readers hardly ever call them out for it, if they even notice.

latchkey•1h ago
Even the third-party AI benchmarks that get published [0] are all a sham too. It's run by a paid shill (semianalysis), and it's all highly tuned by the vendors to make themselves look good.

[0] https://github.com/InferenceMAX/InferenceMAX/

CamouflagedKiwi•51m ago
It's not free to run those benchmarks, especially on the big models.

Ideally journalists / their employers would swallow that as the cost of doing business, but it's a hard sell if they're feeling the squeeze and aren't making much in the first place.

vessenes•1h ago
This is the best sort of correct, in that it's technically correct. The thing is, we don't need 5.3 xxhigh reasoning for everything. Give up some intelligence, take the hit on some inevitable re-runs / re-prompts at 15× the speed, and I bet you still end up with far more than a 37% speed improvement on a lot of tasks.
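Back-of-the-envelope version of that bet, with invented per-run success rates (for independent attempts, expected runs until success is 1/p):

    # Toy wall-clock model: retry a task until one run succeeds.
    # p = chance a single run solves it; expected runs = 1/p (independent tries).
    # All numbers here are invented for illustration.
    def expected_minutes(minutes_per_run, p_success):
        return minutes_per_run / p_success

    big = expected_minutes(15.0, 0.55)   # slower, smarter model
    fast = expected_minutes(1.0, 0.40)   # 15x faster, less accurate model
    print(f"big model:  {big:.1f} expected minutes")
    print(f"fast model: {fast:.1f} expected minutes")
    print(f"effective speedup with retries: {big / fast:.1f}x")

Even granting the fast model noticeably more retries, the raw 15× dominates the extra re-runs by a wide margin under these toy numbers.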

There are two ways to run this, and I'm curious which is better (on time or quality; either would be interesting), both sketched below: you could run 5.3 xxhigh as the coordinator, spinning up some eager-beaver coders that need wrangling, or you could run Spark as the coordinator and probably the code drafter, farming out to the big brains wherever it runs into trouble.

Now that I think about it, corporations use both models as well. It would be nice for the user if the fast coordinator worked well; that speeds up turns and ultimately could let you stay in the zone while pairing with a coding agent. But I really don't know which is better.
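The two setups as a hypothetical sketch; run_model and the model names are stand-ins for whatever agent API you actually use, not a real OpenAI interface:

    # Stand-in for an agent call; NOT a real API.
    def run_model(name, prompt):
        return f"[{name}] {prompt}"

    def big_coordinator(task):
        # Setup 1: the strongest model plans; fast workers draft the code.
        plan = run_model("5.3-xxhigh", f"break into steps: {task}")
        return [run_model("spark", f"implement: {step}") for step in plan.split(";")]

    def fast_coordinator(task, stuck=False):
        # Setup 2: Spark plans and drafts, escalating only when it gets stuck.
        draft = run_model("spark", f"solve: {task}")
        return run_model("5.3-xxhigh", f"unblock: {draft}") if stuck else draft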

nearbuy•50m ago
Unless I'm missing it, the page they're referring to (https://openai.com/index/introducing-gpt-5-3-codex-spark/) never claims Spark is 15x faster.

It looks like the 15× figure only appears in the snippet Google shows, presumably taken from the page's meta tags. It's possible an earlier draft claimed a 15× speed boost and they forgot to remove the claim from the tags.

nvanlandschoot•13m ago
I think they modified the page. Google still has it indexed with 15x if you search for GPT-5.3-Codex-Spark, and searching for GPT-5.3-Codex-Spark plus "15x" will show all the downstream sites that picked up the claim.

News publishers limit Internet Archive access due to AI scraping concerns

https://www.niemanlab.org/2026/01/news-publishers-limit-internet-archive-access-due-to-ai-scrapin...
292•ninjagoo•4h ago•178 comments

uBlock filter list to hide all YouTube Shorts

https://github.com/i5heu/ublock-hide-yt-shorts/
390•i5heu•5h ago•144 comments

My smart sleep mask broadcasts users' brainwaves to an open MQTT broker

https://aimilios.bearblog.dev/reverse-engineering-sleep-mask/
297•minimalthinker•7h ago•140 comments

Ooh.directory: a place to find good blogs that interest you

https://ooh.directory/
389•hisamafahri•9h ago•111 comments

Zvec: A lightweight, fast, in-process vector database

https://github.com/alibaba/zvec
23•dvrp•1d ago•2 comments

Breaking the spell of vibe coding

https://www.fast.ai/posts/2026-01-28-dark-flow/
96•arjunbanker•1d ago•53 comments

IBM tripling entry-level jobs after finding the limits of AI adoption

https://fortune.com/2026/02/13/tech-giant-ibm-tripling-gen-z-entry-level-hiring-according-to-chro...
127•WhatsTheBigIdea•23h ago•47 comments

Flood Fill vs. The Magic Circle

https://www.robinsloan.com/winter-garden/magic-circle/
12•tobr•3d ago•1 comment

Instagram's URL Blackhole

https://medium.com/@shredlife/instagrams-url-blackhole-c1733e081664
27•tkp-415•1d ago•5 comments

Discord: A case study in performance optimization

https://newsletter.fullstack.zip/p/discord-a-case-study-in-performance
33•tylerdane•22h ago•14 comments

Amsterdam Compiler Kit

https://github.com/davidgiven/ack
81•andsoitis•6h ago•16 comments

5,300-year-old 'bow drill' rewrites story of ancient Egyptian tools

https://www.ncl.ac.uk/press/articles/latest/2026/02/ancientegyptiandrillbit/
26•geox•3d ago•0 comments

Colored Petri Nets, LLMs, and distributed applications

https://blog.sao.dev/cpns-llms-distributed-apps/
14•stuartaxelowen•2h ago•1 comment

Launching Interop 2026

https://hacks.mozilla.org/2026/02/launching-interop-2026/
27•linolevan•1d ago•2 comments

A header-only C vector database library

https://github.com/abdimoallim/vdb
48•abdimoalim•5h ago•13 comments

The consequences of task switching in supervisory programming

https://martinfowler.com/fragments/2026-02-13.html
19•bigwheels•1d ago•0 comments

Ask HN: How to get started with robotics as a hobbyist?

133•StefanBatory•6d ago•58 comments

Show HN: Sameshi – a ~1200 Elo chess engine that fits within 2KB

https://github.com/datavorous/sameshi
172•datavorous_•9h ago•50 comments

Descent, ported to the web

https://mrdoob.github.io/three-descent/
115•memalign•3h ago•21 comments

Unicorn Jelly

https://unicornjelly.com/
21•avaer•10h ago•6 comments

A review of M Disc archival capability with long term testing results (2016)

http://www.microscopy-uk.org.uk/mag/artsep16/mol-mdisc-review.html
53•1970-01-01•7h ago•55 comments

How often do full-body MRIs find cancer?

https://www.usatoday.com/story/life/health-wellness/2026/02/11/full-body-mris-cancer-aneurysm/883...
56•brandonb•1d ago•43 comments

An AI agent published a hit piece on me – more things have happened

https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me-part-2/
591•scottshambaugh•22h ago•521 comments

Windows NT/OS2 Design Workbook

https://computernewb.com/~lily/files/Documents/NTDesignWorkbook/
59•markus_zhang•3d ago•22 comments

OpenAI should build Slack

https://www.latent.space/p/ainews-why-openai-should-build-slack
78•swyx•15h ago•89 comments

Vim 9.2

https://www.vim.org/vim-9.2-released.php
309•tapanjk•7h ago•132 comments

A method and calculator for building foamcore drawer organisers

https://capnfabs.net/posts/foamcore-would-be-a-sick-name-for-a-music-genre/
57•evakhoury•5d ago•13 comments

Fun with Algebraic Effects – From Toy Examples to Hardcaml Simulations

https://blog.janestreet.com/fun-with-algebraic-effects-hardcaml/
47•weinzierl•4d ago•1 comment

Zig – io_uring and Grand Central Dispatch std.Io implementations landed

https://ziglang.org/devlog/2026/#2026-02-13
337•Retro_Dev•15h ago•245 comments

Show HN: Arcmark – macOS bookmark manager that attaches to browser as sidebar

https://github.com/Geek-1001/arcmark
58•ahmed_sulajman•6h ago•13 comments