DeepSeek V4 Pro beats GPT-5.5 Pro on precision

https://runtimewire.com/article/deepseek-v4-pro-beats-gpt-5-5-pro-on-precision
78•yogthos•1h ago

Comments

embedding-shape•33m ago
... according to grok-4-1-fast-non-reasoning who was the judge, on 4 tasks in total, score was 38 to 33 so obviously huge conclusions can be made.

> We ran 4 fresh text tasks, generated on the fly for this matchup so neither model could prepare in advance, and had grok-4-1-fast-non-reasoning score each one. DeepSeek: DeepSeek V4 Pro scored 38.0 to OpenAI: GPT-5.5 Pro's 33.0.

andai•27m ago
grok-4-1-fast was retired about a month ago.

Requests to grok-4-1-fast-non-reasoning now silently route to grok-4.3 (a 5x more expensive model), with reasoning set to "none".

https://docs.x.ai/developers/migration/may-15-retirement

TFA was published today, which implies grok-4.3 was used.

largbae•27m ago
Pretty small sample size here, but it's hard to avoid the conclusion that DeepSeek and friends will start to put some serious downward pressure on frontier lab token pricing.

Hopefully this dynamic continues long enough to make local/private inference the leading solution for coding.

ekidd•1m ago
The OP uses tons of typical AI turns of phrase, and Pangram classified it as AI with high confidence.

So it doesn't surprise me at all that the methodology is weak, too.

ElenaDaibunny•27m ago
Yep, matches my experience. GPT keeps adding fields and changing types on structured output when you need it to just follow the spec.
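
One way to catch that kind of drift is to validate every response strictly against the spec before using it. A minimal sketch, assuming the spec can be written as a flat mapping of field name to Python type; `SPEC`, `validate_strict`, and the field names are hypothetical and not taken from any model's API:

```python
import json

# Hypothetical spec: the exact field names and the exact type each must have.
SPEC = {"id": int, "title": str, "tags": list}


def validate_strict(raw: str, spec: dict) -> list[str]:
    """Return a list of violations; an empty list means the output matches."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Fields the spec requires but the model left out.
    for name in sorted(spec.keys() - data.keys()):
        problems.append(f"missing field: {name}")
    # Fields the model added on its own.
    for name in sorted(data.keys() - spec.keys()):
        problems.append(f"unexpected field: {name}")
    # Fields present in both: compare exact types.
    for name in sorted(spec.keys() & data.keys()):
        expected = spec[name]
        actual = type(data[name])
        # type() rather than isinstance() so that True is not accepted as an int.
        if actual is not expected:
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}, "
                f"got {actual.__name__}"
            )
    return problems


if __name__ == "__main__":
    drifted = '{"id": "1", "title": "a", "tags": [], "extra": true}'
    for problem in validate_strict(drifted, SPEC):
        print(problem)
```

Rejecting on any violation (rather than silently coercing) makes the added-field and changed-type cases show up as countable failures per model.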

SwellJoe•11m ago
I tried adding GPT 5.5 Pro to a vulnerability scanning benchmark I made (https://swelljoe.com/post/will-it-mythos/), and it blew through the $100 budget limit halfway through. DeepSeek V4 Pro cost about a dollar for the whole benchmark. GPT Pro cost an average of $22 per case (a case could be 1-5 files with a recent known vulnerability, usually just a single file and a prompt along the lines of "does this file have any vulnerabilities").

GPT 5.5 Pro found two out of four cases that it got to before blowing its budget. Maybe it would have been the best of the bunch with infinite budget, but Opus 4.8, DeepSeek V4 Pro, and MiMo 2.5 Pro found four of nine of the bugs. Opus was an order of magnitude cheaper than GPT 5.5 Pro (and something like 30% cheaper than GPT 5.5), DeepSeek and MiMo were two orders of magnitude cheaper at roughly a dime per case.

GPT Pro also chews through a lot of tokens and takes a long time, relatively speaking.

I can't come up with a use case where I can rationally spend ~31 times what Opus costs to use GPT 5.5 Pro, and I won't be doing any more benchmarking with it.

Given how much token costs are becoming an issue people talk about, the fact that there are models that cost dramatically less than the big American providers is going to be an issue for Anthropic and OpenAI. I'm happy to pay a premium (within reason) for the best model for interactive coding, but for API use, where having the model repeat itself, compare against other models, have models judge other models' work, etc. is not time-consuming for a human and is just a matter of implementing the harnesses and framework for proving correctness, I can't come up with a reason to spend ten or two hundred times as much as DeepSeek.
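
A rough back-of-the-envelope check of those ratios, using only the figures quoted in this comment. It assumes the benchmark has nine cases (read off "four of nine of the bugs") and treats "about a dollar" as exactly $1.00; the variable names are illustrative only.

```python
# Figures quoted in the comment; everything else is derived from them.
GPT_PRO_PER_CASE = 22.00   # average dollars per case for GPT 5.5 Pro
BUDGET = 100.00            # dollars, the budget limit it blew through
TOTAL_CASES = 9            # assumption: nine bugs means nine cases
DEEPSEEK_TOTAL = 1.00      # "about a dollar for the whole benchmark"
GPT_PRO_VS_OPUS = 31       # "~31 times what Opus costs"

# Per-case costs implied by the quoted totals and ratios.
deepseek_per_case = DEEPSEEK_TOTAL / TOTAL_CASES
opus_per_case = GPT_PRO_PER_CASE / GPT_PRO_VS_OPUS

# How far a $100 budget gets at $22 per case, and what a full run would cost.
cases_within_budget = BUDGET / GPT_PRO_PER_CASE
gpt_pro_full_run = GPT_PRO_PER_CASE * TOTAL_CASES

# Cost multiples relative to DeepSeek.
gpt_pro_vs_deepseek = GPT_PRO_PER_CASE / deepseek_per_case
opus_vs_deepseek = opus_per_case / deepseek_per_case

if __name__ == "__main__":
    print(f"DeepSeek per case:        ${deepseek_per_case:.2f}")
    print(f"Opus per case (implied):  ${opus_per_case:.2f}")
    print(f"GPT Pro cases in budget:  {cases_within_budget:.1f}")
    print(f"GPT Pro full run:         ${gpt_pro_full_run:.0f}")
    print(f"GPT Pro vs DeepSeek:      {gpt_pro_vs_deepseek:.0f}x")
    print(f"Opus vs DeepSeek:         {opus_vs_deepseek:.1f}x")
```

Under these assumptions the numbers hang together: the budget runs out between the fourth and fifth case (about halfway through nine), DeepSeek lands at roughly a dime per case, and the GPT 5.5 Pro multiple comes out near two hundred.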

zaptrem•7m ago
Can you include GPT 5.5 non-pro (extra high thinking I guess) in your comparison? GPT Pro is the "I am willing to torch cash for a sooometimes slightly better result" option, not the one people are actually expected to use daily. That's probably part of the reason it's not in Codex.

bel8•3m ago
You might be interested in this:

> With $3.88 & 690,003,591 tokens and 5 hours, Deepseek Pro & Flash combined, managed to reverse engineer Teamspeak's Licensing System for 3.13.8 (latest of post)

https://www.reddit.com/r/DeepSeek/comments/1txcfrh/with_388_...

random3•3m ago
Where do you run DeepSeek?
