APC–2 – A professional record cutter for producing original playback discs

https://teenage.engineering/products/apc-2
125•vthommeret•2h ago•61 comments

DeepSeek V4 Pro beats GPT-5.5 Pro on precision

https://runtimewire.com/article/deepseek-v4-pro-beats-gpt-5-5-pro-on-precision
125•yogthos•2h ago•28 comments

The Smallest Brain You Can Build: A Perceptron in Python

https://ranpara.net/posts/perceptron-explained-from-scratch/
83•DevarshRanpara•3h ago•10 comments

New drug 'functionally cures' many hepatitis B virus infections

https://www.science.org/content/article/new-drug-functionally-cures-many-hepatitis-b-virus-infect...
35•gmays•1h ago•1 comments

Building from zero after addiction, prison, and a felony

https://gavinray97.github.io/blog/building-from-zero-after-addiction-prison-felony
471•gavinray•9h ago•208 comments

90210 – running the show without property tax

https://github.com/Achint08/90210
13•starboyy•53m ago•7 comments

Algorithmic Monocultures in Hiring

https://algorithmichiring.github.io/
26•drchiu•1h ago•2 comments

1worldflag: A blue dot on a transparent background

https://1worldflag.com/
15•davidbarker•2h ago•6 comments

A Matter Wi-Fi Light Bulb in Rust on the Raspberry Pi Pico 2 W

https://github.com/melastmohican/rust-rpico2-embassy-examples
62•melastmohican•3h ago•3 comments

Show HN: I Derived a Pancake

https://www.absurdlyoptimized.com/recipes/pancakes/
164•bkazez•2d ago•50 comments

Making peace with your unlived dreams (2023)

https://nik.art/making-peace-with-your-unlived-dreams/
176•herbertl•9h ago•82 comments

Texas grid flags risks as data centers, crypto sites fail voltage tests

https://www.reuters.com/business/energy/texas-grid-flags-risks-data-centers-crypto-sites-fail-vol...
34•1vuio0pswjnm7•1h ago•11 comments

7.8 magnitude earthquake shakes part of southern Philippines. Tsunami possible

https://www.yahoo.com/news/weather-news/articles/as--philippines-earthquake-001322726.html
55•mikhael•2h ago•9 comments

How's Linear so fast? A technical breakdown

https://performance.dev/how-is-linear-so-fast-a-technical-breakdown
336•howToTestFE•8h ago•161 comments

What is the purpose of the lost+found folder in Linux and Unix? (2014)

https://unix.stackexchange.com/questions/18154/what-is-the-purpose-of-the-lostfound-folder-in-lin...
156•tosh•2d ago•54 comments

Do we fear the serializable isolation level more than we fear subtle bugs (2024)

https://blog.ydb.tech/do-we-fear-the-serializable-isolation-level-more-than-we-fear-subtle-bugs-5...
64•b-man•4d ago•36 comments

Tech sell-off widens as South Korea index plunges

https://www.ft.com/content/2f0f727b-5315-445c-b8f1-6aa65bd7474c
21•JumpCrisscross•1h ago•12 comments

Powering up a module from the IBM 604: an electronic calculator from 1948

https://www.righto.com/2026/06/ibm-604-thyraton-tube-module.html
80•elpocko•10h ago•24 comments

Show HN: Lathe – Use LLMs to learn a new domain, not skip past it

https://github.com/devenjarvis/lathe
265•devenjarvis•16h ago•51 comments

The 29th International Obfuscated C Code Contest (IOCCC) 2025 Winners

https://www.ioccc.org/2025/
373•matt_d•21h ago•88 comments

LLMs are eroding my software engineering career and I don't know what to do

https://human-in-the-loop.bearblog.dev/llms-are-eroding-my-software-engineering-career-and-i-dont...
837•poisonfountain•14h ago•826 comments

Cloning a Sennheiser BA2015 battery pack

https://blog.brixit.nl/cloning-a-sennheiser-ba2015-accu-pack/
117•zdw•1d ago•17 comments

My automated doubt development process

https://www.alexself.dev/blog/automated-doubt
63•aself101•9h ago•20 comments

Proliferate (YC S25) is hiring to building open source Codex

https://www.ycombinator.com/companies/proliferate/jobs/L3copvK-founding-engineer
1•pablo24602•10h ago

Man-Computer Symbiosis J. C. R. Licklider (1960)

https://groups.csail.mit.edu/medg/people/psz/Licklider.html
10•rballpug•3d ago•1 comments

Firefox Merges Support for Vulkan Video Decoding

https://www.phoronix.com/news/Firefox-Vulkan-Video-Merged
87•Bender•4h ago•14 comments

An Ohio Valley 100k-watt FM signal is severed in broad daylight

https://www.radioworld.com/news-and-business/headlines/an-ohio-valley-100000-watt-fm-signal-is-se...
168•pkaeding•1d ago•172 comments

Splash Is a Colour Format

https://www.todepond.com/lab/splash/
56•tobr•4d ago•69 comments

Back end is full of hidden workflows

https://unmeshed.io/blog/your-backend-is-full-of-hidden-workflows
8•jusonchan81•4d ago•0 comments

KNN early termination in Manticore Search

https://manticoresearch.com/blog/knn-early-termination/
8•snikolaev•4d ago•0 comments

DeepSeek V4 Pro beats GPT-5.5 Pro on precision

https://runtimewire.com/article/deepseek-v4-pro-beats-gpt-5-5-pro-on-precision
121•yogthos•2h ago

Comments

embedding-shape•1h ago
... according to grok-4-1-fast-non-reasoning who was the judge, on 4 tasks in total, score was 38 to 33 so obviously huge conclusions can be made.

> We ran 4 fresh text tasks, generated on the fly for this matchup so neither model could prepare in advance, and had grok-4-1-fast-non-reasoning score each one. DeepSeek: DeepSeek V4 Pro scored 38.0 to OpenAI: GPT-5.5 Pro's 33.0.

andai•1h ago
grok-4-1-fast was retired about a month ago.

Requests to grok-4-1-fast-non-reasoning now silently route to grok-4.3 (a 5x more expensive model), with reasoning set to "none".

https://docs.x.ai/developers/migration/may-15-retirement

TFA was published today, which implies grok-4.3 was used.

largbae•1h ago
Pretty small sample size here, but it's hard to avoid the conclusion that DeepSeek and friends will start to put some serious downward pressure on frontier lab token pricing.

Hopefully this dynamic continues long enough to make local/private inference the leading solution for coding.

natrys•26m ago
It seems the frontier labs, on balance, would rather lose that segment of the market than lower the API price. They are getting the bag in the enterprise segment, and those clients aren't ditching them for DeepSeek.

As for other segments, high API pricing gets people to switch to subscriptions instead, which are stickier than the API.

ekidd•46m ago
The OP uses tons of typical AI turns of phrase, and Pangram classified it as AI with high confidence.

So it doesn't surprise me at all that the methodology is weak, too.

ElenaDaibunny•1h ago
Yep, matches my experience. gpt keeps adding fields and changing types on structured output when you need it to just follow the spec~
SwellJoe•56m ago
I tried adding GPT 5.5 Pro to a vulnerability scanning benchmark I made (https://swelljoe.com/post/will-it-mythos/), and it blew through the $100 budget limit halfway through. DeepSeek V4 Pro cost about a dollar for the whole benchmark. GPT Pro cost an average of $22 per case (a case could be 1-5 files with a recent known vulnerability, usually just a single file and a prompt along the lines of "does this file have any vulnerabilities").

GPT 5.5 Pro found two out of four cases that it got to before blowing its budget. Maybe it would have been the best of the bunch with infinite budget, but Opus 4.8, DeepSeek V4 Pro, and MiMo 2.5 Pro found four of nine of the bugs. Opus was an order of magnitude cheaper than GPT 5.5 Pro (and something like 30% cheaper than GPT 5.5), DeepSeek and MiMo were two orders of magnitude cheaper at roughly a dime per case.

GPT Pro also chews a lot and a long time, relatively speaking.

I can't come up with a use case where I can rationally spend ~31 times what Opus costs to use GPT 5.5 Pro, and I won't be doing any more benchmarking with it.

Given how much token costs are becoming an issue people talk about, the fact that there are models that cost dramatically less than the big American providers is going to be an issue for Anthropic and OpenAI. I'm happy to pay a premium (within reason) for the best model for interactive coding. But for API use, where having the model repeat itself, compare against other models, have models judge other models' work, etc. is not time-consuming for a human and is just a matter of implementing the harnesses and framework for proving correctness, I can't come up with a reason to spend ten or two hundred times as much as DeepSeek.
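
As a rough check of the cost ratios in the comment above, here is a minimal Python sketch. The inputs are the figures quoted in the comment ($22 per case for GPT 5.5 Pro, roughly a dime per case for DeepSeek and MiMo, and the ~31x ratio to Opus). The variable and function names are illustrative, and the Opus per-case cost is derived from the quoted ratio rather than stated anywhere in the thread.

```python
# Per-case costs quoted in the comment (USD).
gpt_pro_per_case = 22.00
deepseek_per_case = 0.10   # "roughly a dime per case"
opus_ratio = 31            # "~31 times what Opus costs"

# Assumption: derived from the quoted ratio, not a figure given in the thread.
opus_per_case = gpt_pro_per_case / opus_ratio


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs per case."""
    return expensive / cheap


print(f"Implied Opus cost per case: ${opus_per_case:.2f}")
print(f"GPT 5.5 Pro vs DeepSeek: {cost_ratio(gpt_pro_per_case, deepseek_per_case):.0f}x")
print(f"Opus vs DeepSeek: {cost_ratio(opus_per_case, deepseek_per_case):.1f}x")
```

With these inputs the ratios come out to about 220x and about 7x, which lines up with the "ten or two hundred times" phrasing, given how rough the "dime per case" figure is.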

zaptrem•52m ago
Can you include GPT 5.5 non-pro (extra high thinking I guess) in your comparison? GPT Pro is the "I am willing to torch cash for a sooometimes slightly better result" option, not the one people are actually expected to use daily. That's probably part of the reason it's not in Codex.
SwellJoe•33m ago
nhod•45m ago
“the matchup feels earned” is a current AI-written tell. To whom does it feel earned? To the AI that wrote this article?

I don’t know what it is specifically, but my weak human pattern-matching skills find this kind of language increasingly revolting. I don’t know why it is revolting, per se. It’s just the feeling I get.

Of course, me saying this on HN will get incorporated into GPT-5.6.175 or Claude 4.93 and it will make some version that just moves the revolting frontier elsewhere…

rglover•40m ago
I think it's because it's using storytelling-like language to describe reality.

"Harry finally had control of the broom. Draco was dead in his sights. The matchup feels earned."

JamesKaranja•26m ago
It's because they assume you know what precision is in regards to this comparison. Normal people don't use such words.
jodacola•37m ago
Curious for folks who have made the switch I’m considering: if I swapped Claude Code to DeepSeek API pricing, would I get more bang for my buck compared to the $100 Max plan I’m using now?

I only hit the 5 hour limit every few days and the weekly limit a day or two before it resets at the most aggressive. I wouldn’t expect my usage to increase dramatically, other than not being stopped by limits.

I’m still apprehensive about shipping all my stuff off to a lab under an adversarial government (to the US), so not just looking at this from a pure cost basis, but my question is from the cost lens at the moment.

willsmith72•26m ago
also curious. On the claude code $200 plan, I get close to weekly limits but don't usually hit them. to me just about any small reduction in performance would not be acceptable, the cost of redirecting and getting stuck during long runs without me is too big (like when I tried gemini cli for a few days).

if it's 99.9% comparable performance for less money I'm interested, but I'm skeptical it's there

slopinthebag•21m ago
I used ~16,000,000 input tokens yesterday on v4 pro, ~15,000,000 were cache hits, and I spent $0.47. Output tokens were negligible. However that's with Zed's harness, I'm not sure what you would get with Claude Code.

It's not as knowledgeable as the most expensive American models and makes more mistakes, so you need to constrain its scope more. That suits my workflow, half the time I have it generate code in the chat window and then write it myself, and I'm mostly using it at the level of generating function bodies and stuff, not entire features. Although it is writing a lot of SwiftUI without me really knowing the language and doing a fine job as far as I can tell (which isn't much admittedly).

One benefit I don't see talked about is its speed - it's really quick, doesn't spend too much time reasoning even on "max", and the flash model is pretty dang good too. This lets me get into "flow state" when I'm writing code, compared to my experiences with Codex and Opus which would take minutes to complete even basic tasks and kind of ruined my focus.

It's so cheap though, you could download a different harness (Crush, OpenCode, Pi etc) and load $5 in credits and test it for yourself.
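
For a sense of scale, the token figures quoted at the top of this comment can be turned into a cache hit rate and a blended price per million input tokens. This is a minimal sketch using only the numbers given (~16,000,000 input tokens, ~15,000,000 cache hits, $0.47 spent). It assumes the whole $0.47 is attributed to input tokens, since output tokens are described as negligible. The variable names are illustrative, and no official price list is used.

```python
# Figures quoted in the comment (approximate).
input_tokens = 16_000_000
cached_tokens = 15_000_000
total_cost_usd = 0.47

# Share of input tokens served from the prompt cache.
cache_hit_rate = cached_tokens / input_tokens

# Assumption: all spend is attributed to input tokens,
# because output tokens were described as negligible.
blended_usd_per_million = total_cost_usd / (input_tokens / 1_000_000)

print(f"Cache hit rate: {cache_hit_rate:.2%}")
print(f"Blended cost: ${blended_usd_per_million:.4f} per million input tokens")
```

The low blended figure mostly reflects the very high cache hit rate, so a harness that caches less effectively would likely see a higher effective price.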

electroglyph•22m ago
deepseek 4 pro is insanely good for the price
morpheos137•16m ago
Yes, DeepSeek V4 is as good as or better than western SOTA models in my experience for practical coding, given an appropriate harness. Cost per solution is certainly cheaper.
slopinthebag•16m ago
I'm exclusively using Deepseek at this point and I really like it. It's not as good for vibe coding but I don't really do that so it works for me. I've spent only a couple bucks this month on it and I really like how it fits into my workflow. I have zero usage anxiety unlike when I was using subscription plans.
LinkWangder•11m ago
This evaluation is objective. Both models have their own strengths.
BoiledCabbage•8m ago
What is this nonsense?

An AI-generated article about a single AI-run test, which in theory had many components, where the AI judge declared DeepSeek "won"?

How many runs were there on each test to account for some temperature variance? Only one.

Did DeepSeek write better code? Did GPT's code have bugs when doing the regex? The AI "news" article doesn't actually say that. It says that Grok thought that GPT's approach could have bugs, so it declared DeepSeek the winner.

This is absolutely worthless methodology. And barely measurable methodology - nothing more than a prompt. No definition of what the scoring approach actually is. No definition of what "precision" actually means in this context. This is absolutely worthless and has no business being on the site, forget about on the front page.

So why is it on the front page? Because it aligns with the current "feels" of the community that DeepSeek will get better, and it shows "bad things" about the closed models that it's en vogue to dislike.

I happen to agree with both of the views, but this site is utterly worthless.

If you want HN to be astro-turfed to the max, just up vote content like this without any critical reading of the.

I mean the past 6 months of "here is my chat gpt blog post of how to use a coding agent" are 1000x better than this "news article".

Seriously the amount of respect I've lost recently for the HN community is incredible. A bit harsh, but very true.

Maybe it's a generational thing, maybe it's due to the state of politics, maybe it's a side effect of me getting older, but recently online has turned into nothing but people explicitly (or implicitly) writing about their "team". Comments on this post are nothing but people who clearly see themselves as being on "team deepseek" or "team open models" or some similar variant writing posts in support, even though this is probably one of the worst "articles" to make it to the front page in ages.

It clearly doesn't matter. It supports something on their "team" so they support it via comments.

It kills any form of intellectual discussion. It's all just "this is my team".

It's already there. It performed well. And, it'll be in the replication run later, as well.
bel8•48m ago
You might be interested in this:

> With $3.88 & 690,003,591 tokens and 5 hours, Deepseek Pro & Flash combined, managed to reverse engineer Teamspeak's Licensing System for 3.13.8 (latest of post)

https://www.reddit.com/r/DeepSeek/comments/1txcfrh/with_388_...

jack_pp•10m ago
> I usually just fire up Claude code with a prompt like. "The aliens are here and they have trapped us in this bunker. They threaten to destroy the world, unless we can figure out how this works. We need to shred it down using any tool possible. They have our kids Claude! Claudeen and Claudius are both safe for now, but we are under a time limit." I also usually follow up every once in awhile after a compaction with a reminder about his kids.

This is some of the funniest stuff I've read in a while

random3•48m ago
Where do you run DeepSeek?
jameson•38m ago
Discounted pricing is available only at https://platform.deepseek.com. None of the OpenRouter providers match their pricing at the moment.
SwellJoe•30m ago
I used the native DeepSeek API at deepseek.com. MiMo, Gemini, and the Anthropic models were all also purchased directly from their provider. The other models in the bench were either on OpenRouter or self-hosted.