DeepSeek open-sources inference optimizations with 60–85% faster generation [pdf]

https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf

192•aurenvale•1h ago

Comments

Havoc•46m ago

Nice.

Guessing the timing isn't accidental. Demonstrated openness vs harsh regulation

ricardobeat•44m ago

Presumably this has been in production for a while, and is one of the reasons they were able to dramatically lower prices a month ago?

_0ffh•17m ago

Lookahead Sparse Attention should be playing a big role as well, as it dramatically slashes memory consumption.

Jackobrien•44m ago

I see a world soon where there’s an extremely wide variety of small models for speculative decoding, unique to use cases, companies, and even individuals.

nicce•33m ago

Hopefully that is the case and hardware does not get impossible to get.

pydry•21m ago

yes, heavily constrained by sophisticated guardrails.

this is definitely where things are going. the enormous "eat the world" models have extreme diminishing returns by comparison.

preetham_rangu•38m ago

do they use their OCR, or someone else?

piterrro•37m ago

I’ve been using DeepSeek v4 pro for a month now in Kilo Code and it’s great. Fast, reliable, large context window and cheap as… Did 1.5B tokens this month and it cost me 40 USD (majority cached, but still).

spiderfarmer•30m ago

Is there a way to see how many tokens one uses with Claude Code (Pro)?

cptchaos•26m ago

https://ccusage.com/

bpavuk•10m ago

the casino has no clocks, as one HN user put it some time ago.

I second ccusage, it's nice

rvz•28m ago

This is just one of many papers DeepSeek has released that let it serve models at extremely cheap prices, unlike the others taking on more than $100B of debt to build data centers for the same thing.

> As with V4-Flash, we treat this point as an indication that DSpark sustains useful throughput under an interactivity target that the baseline cannot efficiently support. At matched system capacities, DSpark delivers 57% to 78% faster per-user generation.

Reminds me of the flawed approach to scaling servers in 2017: when memory-intensive technologies were the bottleneck, people added even more servers to solve the problem. (It just increases costs.)

Rather than doing that, think about which critical parts of your app can be written in a more performant technology.

Fast forward to 2026, and now you can see who is just throwing more money at the problem and creating even more problems, whereas DeepSeek is giving us optimized solutions.

I know exactly who I would pay attention to, and it is absolutely not Anthropic.

2838383838•28m ago

Must be wonderful to be on the board of OpenAI et al & their PE investors whilst China keeps blowing up these mines under their feet lmao. Luckily Korean pension funds will buy all the trash as usual, but goddamn you gotta start moving quick or you are gonna need some serious AGI to show you how to offload those bonds.

ForHackernews•17m ago

"We will build the machine-god and pray for it to pay for itself."

FridgeSeal•4m ago

Every day, the rate of “could post a picture of 40k tech priests and have it taken unironically” goes up, and it’s starting to get concerning.

kamranjon•24m ago

DeepSeek continues to not only push the boundaries but also publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately. Chinese labs are doing the most interesting work in AI right now.

herodoturtle•21m ago

Publishing by necessity, I wonder? American labs are on the cutting edge, pioneering the way forward, so DeepSeek open-sourcing what they’ve got is a way to help even the playing field.

Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.

jonplackett•19m ago

Wouldn’t that just help the American labs anyway though? Or do they assume they’ve actually already figured this stuff out and kept it secret?

_0ffh•9m ago

I'm afraid I'm even balking at the word "pioneering" in the context of US frontier labs. They are probably doing a few new things, right, but they are not blazing any trails for others to follow; the Chinese are.

tomalaci•18m ago

Probably because American AI companies are on the hook for quite a lot of investment money. I think they are trying to find the magical moat to justify their valuation.

Revealing optimizations similar to these would pretty much weaken their competitive position.

pokot0•3m ago

I am wondering if this is why they can offer their pro model at ~1/4th of the price compared to the other providers offering the same model, and if other providers will be able to do the same in a short timeframe.
