
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/assets/paper.pdf
81•pretext•3h ago
https://huggingface.co/deepseek-ai/DeepSeek-V3.2

https://api-docs.deepseek.com/news/news251201

Comments

BoorishBears•5h ago
3.2-Exp came out in September: this is 3.2, along with a special checkpoint (DeepSeek-V3.2-Speciale) for deep reasoning that they're claiming surpasses GPT-5 and matches Gemini 3.0

https://x.com/deepseek_ai/status/1995452641430651132

zparky•4h ago
Benchmarks are super impressive, as usual. Interesting to note that in Table 3 of the paper (p. 15), DS-Speciale is 1st or 2nd in accuracy on all tests, but has much higher token output (50% more, or 3.5x vs Gemini 3 in the Codeforces test!).
futureshock•1h ago
The higher token output is not by accident. Certain kinds of logical reasoning problems are solved by longer thinking output. Thinking-chain output is usually kept to a reasonable length to limit latency and cost, but if pure benchmark performance is the goal you can crank it up to the point of diminishing returns. DeepSeek being 30x cheaper than Gemini means there's little downside to maxing out the thinking time. It's been shown that you can scale this further by running many solution attempts in parallel with max thinking, then using a model to choose a final answer, so increasing reasoning performance by increasing inference compute has a pretty high ceiling.
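The parallel best-of-N idea described above can be sketched roughly as follows. This is my own illustration, not DeepSeek's actual method; `generate` and `judge` are hypothetical stand-ins for real inference API calls, with the judge here implemented as simple majority voting (self-consistency):

```python
import concurrent.futures

def generate(prompt: str, seed: int) -> str:
    """Hypothetical model call: one independent high-budget reasoning attempt."""
    # Placeholder: a real implementation would call an inference API here.
    return f"answer-{seed % 3}"

def judge(prompt: str, candidates: list[str]) -> str:
    """Hypothetical selector: pick the most common candidate answer."""
    return max(set(candidates), key=candidates.count)

def best_of_n(prompt: str, n: int = 7) -> str:
    """Run n attempts in parallel with max thinking, then choose one answer."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: generate(prompt, s), range(n)))
    return judge(prompt, candidates)
```

With a cheap model, the marginal cost of extra parallel attempts is small, which is why this kind of inference-time scaling favors cheaper models.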
jodleif•3h ago
I genuinely do not understand the evaluations of the US AI industry. The chinese models are so close and far cheaper
newyankee•43m ago
Yet tbh if the US industry had not moved ahead and created the race with FOMO, it would not have been as easy for the Chinese strategy to work either.

The nature of the race may yet change, though, and I am unsure if the devil is in the details, as in very specific edge cases that will work only with frontier models?

jazzyjackson•41m ago
Valuation is not based on what they have done but what they might do, I agree tho it's investment made with very little insight into Chinese research. I guess it's counting on deepseek being banned and all computers in America refusing to run open software by the year 2030 /snark
bilbo0s•35m ago
>I guess it's counting on deepseek being banned

And the people making the bets are in a position to make sure the banning happens. The US government system being what it is.

Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet".

"Bet" imply you don't know the outcome and you have no influence over the outcome. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people?

jodleif•21m ago
> Valuation is not based on what they have done but what they might do

Exactly what I’m thinking. Chinese models are catching up rapidly. Soon to be on par with the big dogs.

jasonsb•39m ago
It's all about the hardware and infrastructure. If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT or Gemini. The Chinese models may benchmark close on paper, but real-world deployment is different. So you either buy your own hardware to run a Chinese model at 150-200 tps, or give up and use one of the Big 3.

The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.

csomar•33m ago
According to OpenRouter, z.ai is 50% faster than Anthropic; which matches my experience. z.ai does have frequent downtimes but so does Claude.
jodleif•17m ago
Assuming your hardware premise is right (and let's be honest, nobody really wants to send their data to Chinese providers), you can use a provider like Cerebras or Groq?
isamuel•25m ago
There is a great deal of orientalism: it is genuinely unthinkable to a lot of American tech dullards that the Chinese could be better at anything requiring what they think of as "intelligence." Aren't they Communist? Backward? Don't they eat weird stuff at wet markets?

It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask questions like what you ask of that era, the answers you get are genuinely not better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice."

newyankee•23m ago
But didn't the Chinese already surpass the rest of the world in solar, batteries, and EVs, among other things?
cyberlimerence•15m ago
They did, but the goalposts keep moving, so to speak. We're approximately here: advanced semiconductors, artificial intelligence, reusable rockets, quantum computing, etc. Chinese will never catch up. /s
mosselman•8m ago
Back when deepseek came out and people were tripping over themselves shouting it was so much better than what was out there, it just wasn’t good.

It might be this model is super good, I haven’t tried it, but to say the Chinese models are better is just not true.

What I really love though is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen, what a time to be alive.

Further even than local hardware, open models make it possible to run them on providers of your choice, such as European ones. Which is great!

So I love everything about the competitive nature of this.

espadrine•18m ago
Two aspects to consider:

1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, and often voice and video. Supporting all of those adds training cost not spent on further reasoning, tying one hand behind your back in order to be more generally useful.

2. The gap seems small, because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better.

On the second point, I worked on a leaderboard that both normalizes scores and predicts unknown scores, to help improve comparisons between models on various criteria: https://metabench.organisons.com/

You can notice that, while Chinese models are quite good, the gap to the top is still significant.

However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier of cheaper but serviceable models (even though US models also eat up the frontier there).
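As a rough illustration of the normalization idea (my own sketch, not the linked leaderboard's actual method, with made-up model names and scores): rescaling each benchmark's scores to [0, 1] before averaging keeps a single saturated benchmark from dominating the comparison.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max rescale one benchmark's scores across models to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all scores are equal
    return {model: (s - lo) / span for model, s in scores.items()}

# Two benchmarks, three hypothetical models.
benchmarks = {
    "math":   {"model_a": 92.0, "model_b": 90.0, "model_c": 80.0},
    "coding": {"model_a": 55.0, "model_b": 70.0, "model_c": 40.0},
}
normalized = {name: normalize(s) for name, s in benchmarks.items()}

# Average the normalized scores per model.
models = benchmarks["math"].keys()
avg = {m: sum(normalized[b][m] for b in benchmarks) / len(benchmarks)
       for m in models}
```

Note how model_a's 2-point lead on a near-saturated benchmark counts for less after normalization than model_b's 15-point lead on the wider-spread one.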

jodleif•16m ago
1. Have you seen the Qwen offerings? They have great multi-modality, some even SOTA.
TIPSIO•39m ago
It's awesome that stuff like this is open source, but even if you have a basement rig with 4 NVIDIA GeForce RTX 5090 graphics cards (a $15-20k machine), can it even run with any reasonable context window at anything better than a crawling 10 tps?

Frontier models far exceed even the most hardcore consumer hobbyist's hardware. This one is even further out of reach.

bigyabai•37m ago
People with basement rigs generally aren't the target audience for these gigantic models. You'd get much better results out of an MoE model like Qwen3's A3B/A22B weights, if you're running a homelab setup.
Spivak•33m ago
Yeah I think the advantage of OSS models is that you can get your pick of providers and aren't locked into just Anthropic or just OpenAI.
red2awn•34m ago
Worth noting this is not only good on benchmarks, but significantly more efficient at inference https://x.com/_thomasip/status/1995489087386771851
minimaxir•12m ago
Testing the model in OpenRouter, it's pretty speedy compared to Claude/GPT/Gemini.
zug_zug•32m ago
Well props to them for continuing to improve, winning on cost-effectiveness, and continuing to publicly share their improvements. Hard not to root for them as a force to prevent an AI corporate monopoly/duopoly.
htrp•32m ago
What is the ballpark VRAM/GPU requirement to run this?
rhdunn•19m ago
For just the model itself: 4 bytes per parameter at FP32, 2 at FP16/BF16, or 1 at FP8, e.g. ~685 GB at FP8 for 685B parameters. It will be smaller for quantizations, but I'm not sure how to estimate those.

For a Mixture of Experts (MoE) model you only need memory for the active expert. There will be some swapping as it figures out which expert to use, or changes expert, but once that expert is loaded it won't be swapping memory to perform the calculations.

You'll also need space for the context window; I'm not sure how to calculate that either.
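The rules of thumb above can be written out as a small calculator. The weight estimate is just bytes-per-parameter times parameter count; the KV-cache formula is the generic dense-attention estimate (2 tensors, K and V, per layer per token), and the layer/head numbers in the example are hypothetical, not DeepSeek's actual architecture (which uses a different attention scheme):

```python
def model_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weight memory in GB: parameter count (billions) x bytes per parameter."""
    return params_billions * bytes_per_param

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_val: int = 2) -> float:
    """Generic dense-attention KV cache in GB: K and V per layer per token."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_val / 1e9

weights_fp8 = model_gb(685, 1)    # 685 GB at FP8
weights_bf16 = model_gb(685, 2)   # 1370 GB at FP16/BF16

# Hypothetical config: 61 layers, 8 KV heads of dim 128, 128k-token context.
kv = kv_cache_gb(61, 8, 128, 128_000)
```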

High-income job losses are cooling housing demand

https://jbrec.com/insights/job-growth-housing-demand-metro-analysis-2026/
104•gmays•48m ago•72 comments

Ask HN: Who is hiring? (December 2025)

123•whoishiring•3h ago•147 comments

India orders smartphone makers to preload state-owned cyber safety app

https://www.reuters.com/sustainability/boards-policy-regulation/india-orders-mobile-phones-preloa...
78•jmsflknr•12h ago•42 comments

Response to Ruby Is Not a Serious Programming Language

https://robbyonrails.com/articles/2025/12/01/why-so-serious/
44•robbyrussell•53m ago•24 comments

Why xor eax, eax?

https://xania.org/202512/01-xor-eax-eax
358•hasheddan•6h ago•135 comments

Cartographers Have Been Hiding Covert Illustrations Inside of Switzerland's Maps

https://eyeondesign.aiga.org/for-decades-cartographers-have-been-hiding-covert-illustrations-insi...
173•mhb•5h ago•40 comments

Isn't WSL2 just a VM?

https://ssg.dev/isnt-wsl2-just-a-vm/
83•sedatk•6d ago•42 comments

Ask HN: Who wants to be hired? (December 2025)

39•whoishiring•3h ago•94 comments

Better Auth (YC X25) Is Hiring

https://www.ycombinator.com/companies/better-auth/jobs/eKk5nLt-developer-relation-engineer
1•bekacru•2h ago

Intel could return to Apple computers in 2027

https://www.theverge.com/news/832366/intel-apple-m-chip-low-end-processor
9•DamnInteresting•23m ago•1 comment

ImAnim: Modern animation capabilities to ImGui applications

https://github.com/soufianekhiat/ImAnim
47•klaussilveira•2h ago•11 comments

Search tool that only returns content created before ChatGPT's public release

https://tegabrain.com/Slop-Evader
755•dmitrygr•15h ago•302 comments

Google unkills JPEG XL?

https://tonisagrista.com/blog/2025/google-unkills-jpegxl/
152•speckx•3h ago•128 comments

A vector graphics workstation from the 70s

https://justanotherelectronicsblog.com/?p=1429
94•ibobev•5h ago•16 comments

Self-hosting a Matrix server for 5 years

https://yaky.dev/2025-11-30-self-hosting-matrix/
197•the-anarchist•7h ago•86 comments

Better Than JSON

https://aloisdeniel.com/blog/better-than-json
4•barremian•10m ago•0 comments

Historic Engineering Wonders: Photos That Reveal How They Pulled It Off

https://rarehistoricalphotos.com/engineering-methods-from-the-past/
87•dxs•6d ago•16 comments

Ghostty compiled to WASM with xterm.js API compatibility

https://github.com/coder/ghostty-web
7•kylecarbs•52m ago•0 comments

Durin is a library for reading and writing the Dwarf debugging format

https://github.com/tmcgilchrist/durin
3•mooreds•34m ago•0 comments

Games using anti-cheats and their compatibility with GNU/Linux or Wine/Proton

https://areweanticheatyet.com/
216•doener•12h ago•299 comments

I made a quieter air purifier

https://chillphysicsenjoyer.substack.com/p/i-made-a-quieter-air-purifier
86•crescit_eundo•6d ago•43 comments

Langjam Gamejam: Build a programming language then make a game with it

https://langjamgamejam.com/
98•birdculture•1d ago•44 comments

WordPress plugin quirk resulted in UK Gov OBR Budget leak [pdf]

https://obr.uk/docs/dlm_uploads/01122025-Investigation-into-November-2025-EFO-publication-error.pdf
96•robtaylor•4h ago•89 comments

Spleen Monospaced Bitmap Fonts

https://github.com/fcambus/spleen
12•keyle•5d ago•5 comments

It’s been a very hard year

https://bell.bz/its-been-a-very-hard-year/
303•surprisetalk•13h ago•402 comments

The Penicillin Myth

https://www.asimov.press/p/penicillin-myth
92•surprisetalk•4h ago•48 comments

Trifold is a tool to quickly and cheaply host static websites using a CDN

https://www.jpt.sh/projects/trifold/
85•birdculture•1w ago•30 comments

Detection of triboelectric discharges during dust events on Mars

https://gizmodo.com/weve-detected-lightning-on-mars-for-the-first-time-2000691996
87•domofutu•4d ago•46 comments

Advent of Code 2025

https://adventofcode.com/2025/about
1139•vismit2000•1d ago•364 comments