GPU-rich labs have won: What's left for the rest of us is distillation

https://inference.net/blog/what-s-left-is-distillation

48•npmipg•3h ago

Comments

madars•2h ago

The blog kept redirecting to the home page after a second, so here's an archive: https://archive.is/SE78v

ilaksh•2h ago

There is huge pressure to prove and scale radical alternative paradigms such as memory-centric compute (memristors, for example) or SNNs. That's why I'm surprised we don't hear more about very large speculative investments in these directions to dramatically multiply AI compute efficiency.

But one has to imagine that seeing so many huge datacenters go up, while being unable to do training runs of their own, is motivating a lot of researchers to try things that are really different. At least I hope so.

It seems pretty shortsighted that the funding numbers for memristor startups (for example) are so low so far.

Anyway, assuming that within the next several years more radically different AI hardware and AI architecture paradigms pay off in efficiency gains, the current situation will change. Fully human level AI will be commoditized, and training will be well within the reach of small companies.

I think we should anticipate this given the strong level of need to increase efficiency dramatically, the number of existing research programs, the amount of investment in AI overall, and the history of computation that shows numerous dramatic paradigm shifts.

So anyway "the rest of us" I think should be banding together and making much larger bets on proving and scaling radical new AI hardware paradigms.

sidewndr46•2h ago

I think a pretty good chunk of HP's history explains why memristors don't get used in a commercial capacity.

ofrzeta•2h ago

Do you remember The Machine? I had a vague memory of it but had to look it up.

michelpp•2h ago

Not sure why this is being downvoted; it's a thoughtful comment. I too see this crisis as an opportunity to push past the boundaries of current architectures. Sparse models, for example, show a lot of promise and more closely track real biological systems. The human brain has an estimated graph density of 0.0001 to 0.001. Advances in sparse computing libraries and new hardware architectures could be key to achieving this kind of efficiency.
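Graph density here is just stored connections divided by possible connections. As a rough illustration of why sparse computing libraries matter (not a model of biological networks), here is a minimal compressed sparse row (CSR) matrix-vector product in plain Python. The function names are illustrative, not from any particular library; the point is that the inner loop runs once per nonzero, so the work scales with density × rows × cols rather than rows × cols.

```python
from typing import List, Tuple


def dense_to_csr(matrix: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR: (values, column indices, row pointers)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i's nonzeros end in `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values: List[float], col_idx: List[int], row_ptr: List[int],
               x: List[float]) -> List[float]:
    """Compute y = A @ x, visiting only the stored nonzeros of A."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def density(nnz: int, n_rows: int, n_cols: int) -> float:
    """Fraction of possible connections that actually exist."""
    return nnz / (n_rows * n_cols)


A = [[0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 3.0]]
vals, cols, ptrs = dense_to_csr(A)
y = csr_matvec(vals, cols, ptrs, [1.0, 1.0, 1.0, 1.0])
d = density(len(vals), len(A), len(A[0]))
print(y, d)
```

At a density of 0.001, the same loop would do about a thousandth of the multiply-adds a dense product needs, which is the kind of saving sparse hardware and libraries try to turn into real speedups.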

lazide•2h ago

Memristors have been tried for literally decades.

If the poster's other guesses pay out at the same rate, this will likely never play out.

ilaksh•1h ago

Other technologies tried for decades before becoming huge: Neural-network AI; Electric cars; mRNA vaccines; Solar photovoltaics; LED lighting

lazide•1h ago

Ho boy, should we start listing the ten times as many things that went into the wastebasket too?

ToValueFunfetti•25m ago

If I only have to try 11 things for one of them to be LED lights or electric cars, I'd better get trying. Sure, I might have to empty a wastebasket at some point, but I'll just pay someone for that.

kelipso•1h ago

There was a bit of noise regarding spiking neural networks a few years ago but now I am not seeing it so often anymore.

thekoma•2h ago

Even in that scenario, what would stop the likes of OpenAI from throwing $50M+ a day at the new way of doing things and still outcompeting the smaller fry?

hnuser123456•1h ago

>memory-centric compute

This already exists: https://www.cerebras.ai/chip

They claim 44 GB of SRAM at 21 PB/s.

cma•43m ago

They use separate memory servers: networked memory sitting adjacent to compute that has small amounts of fast local memory.

Wafer-scale severely limits bandwidth once you go beyond SRAM, because with far less chip perimeter per unit area there is less room to hook up I/O.
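To put a rough number on the perimeter argument: for a square die of side s, edge length per unit area is 4s / s² = 4/s, so it falls as the die grows. The sketch below plugs in two assumed, approximate areas (about 814 mm² for an H100-class die and about 46,225 mm² for a 215 mm × 215 mm wafer-scale chip). It ignores packaging, interposers, and any I/O taken from the face of the die rather than its edges.

```python
import math


def perimeter_per_area(area_mm2: float) -> float:
    """Edge length (mm) available per mm^2 of silicon for a square die of the given area."""
    side = math.sqrt(area_mm2)
    return 4 * side / area_mm2  # simplifies to 4 / side, in mm^-1


# Assumed, approximate die areas for illustration only; check vendor specs before relying on them.
GPU_DIE_MM2 = 814.0        # roughly an H100-class, reticle-sized die
WAFER_SCALE_MM2 = 46225.0  # roughly a 215 mm x 215 mm wafer-scale chip

gpu = perimeter_per_area(GPU_DIE_MM2)
wafer = perimeter_per_area(WAFER_SCALE_MM2)
print(f"GPU die:     {gpu:.4f} mm of edge per mm^2")
print(f"Wafer-scale: {wafer:.4f} mm of edge per mm^2")
print(f"Ratio: {gpu / wafer:.1f}x more edge per unit area on the smaller die")
```

With these inputs the smaller die has roughly 7.5 times as much edge per mm² as the wafer-scale part, which is the geometric reason off-chip bandwidth gets scarce once the working set no longer fits in on-wafer SRAM.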

marcosdumay•1h ago

Memristors in particular just won't happen.

But memory-centric compute didn't happen because of Moore's law. (SNNs have the problem that we don't actually know how to use them.) Now that Moore's law is gone, it may have a chance, but it still takes a large amount of money thrown at the idea, and the people with money are so risk-averse that they create entirely new risks for themselves.

Forward neural networks were very lucky that a mainstream use already existed for the kind of hardware they needed.

latchkey•2h ago

Not a fan of fear based marketing: "The whole world is too big and expensive for you to participate in, so use our service instead"

I'd rather approach these things from the PoV of: "We use distillation to solve your problems today"

The last sentence kind of says it all: "If you have $30k+/mo in model spend, we'd love to chat."

42lux•2h ago

We haven't seen a proper NPU yet, and we're only at the launch of the first consumer-grade unified architectures from Nvidia and AMD. The battle of homebrew AI hasn't even started yet.

stego-tech•1h ago

Hell, we haven’t even seen actual AI yet. This is all just brute-forcing likely patterns of tokens based on a corpus of existing material, not anything brand new or particularly novel. Who would’ve guessed that giving CompSci and Mathematics researchers billions of dollars in funding and millions of GPUs in parallel without the usual constraints of government research would produce the most expensive brute-force algorithms in human history?

I still believe this is going to be an embarrassing chapter of the history of AI when we actually do create it. “Humans - with the sort of hubris only a neoliberal post-war boom period could produce - honestly thought their first serious development in computing (silicon-based microprocessors) would lead to Artificial General Intelligence and usher in a utopia of the masses. Instead they squandered their limited resources on a Fool’s Errand, ignoring more important crises that would have far greater impacts on their immediate prosperity in the naive belief they could create a Digital God from Silicon and Electricity alone.”

braooo•56m ago

Yeh. We're still barely past the first few pixels that make up the bottom tail of the S-curve for the autonomous-type AI everyone imagines.

Energy models and other substrates are going to be key, and it has nothing to do with text at all, since human intelligence existed before language. It's Newspeak to run a chatbot on what is obviously a computer and call it an intelligence like a human's. 1984-like dystopian crap.

YetAnotherNick•1h ago

DeepSeek's main training run cost $6M. qwen3-30b-a3b, which is ranked 13th, would probably cost a few hundred thousand dollars.

The GPU cost of training the final model isn't the biggest chunk of the cost, and you can probably replicate the results of models like Llama 3 very cheaply. It's the cost of experiments, researchers, and data collection that brings the overall cost one or two orders of magnitude higher.

ilaksh•1h ago

What's your source for any of that? I think the $6 million thing was identified as a lie they felt was necessary because of GPU export laws.

YetAnotherNick•19m ago

It wasn't a lie; it was a misrepresentation of the total cost. It's not hard to estimate the cost of the training run, though. It takes roughly 6 × active parameters × tokens FLOPs [1]. To get GPU-seconds, divide that by (peak FLOP/s per GPU × MFU), where MFU is around 45% on H100s for large enough models [2].

[1]: https://arxiv.org/abs/2001.08361

[2]: https://github.com/facebookresearch/lingua
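A minimal sketch of that back-of-envelope calculation. The inputs are assumptions for illustration: roughly 37B active parameters and 14.8T tokens (the figures DeepSeek reported for V3), about 989 TFLOP/s dense BF16 peak per H100, the 45% MFU quoted above, and a hypothetical $2 per GPU-hour rental price. DeepSeek actually trained on H800s with FP8 mixed precision, so treat the result as an order-of-magnitude check, not a reconstruction of their accounting.

```python
def training_flops(active_params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per active parameter per token [1]."""
    return 6.0 * active_params * tokens


def gpu_hours(flops: float, peak_flops_per_s: float, mfu: float) -> float:
    """GPU-hours needed if each GPU sustains peak_flops_per_s * mfu."""
    gpu_seconds = flops / (peak_flops_per_s * mfu)
    return gpu_seconds / 3600.0


# Assumed inputs for illustration only; substitute figures from the model card you care about.
ACTIVE_PARAMS = 37e9       # assumed active parameters per token (MoE)
TOKENS = 14.8e12           # assumed pretraining tokens
H100_PEAK = 989e12         # assumed dense BF16 peak FLOP/s per H100
MFU = 0.45                 # the ~45% figure quoted in the comment
PRICE_PER_GPU_HOUR = 2.0   # hypothetical rental price in USD

flops = training_flops(ACTIVE_PARAMS, TOKENS)
hours = gpu_hours(flops, H100_PEAK, MFU)
cost = hours * PRICE_PER_GPU_HOUR
print(f"{flops:.2e} FLOPs, {hours / 1e6:.2f}M GPU-hours, ~${cost / 1e6:.1f}M")
```

With these assumptions the script lands at roughly 2 million GPU-hours and about $4M, the same order of magnitude as the figure discussed in the thread. As the earlier comment notes, it covers only the final run, not experiments, staff, or data collection.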

muratsu•1h ago

If I'm understanding this correctly, we should see some great coding LLMs. I don't know; they could be as narrow as a single stack, e.g. the Laravel/Next.js ecosystem.
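Distillation generally means training a smaller "student" model to imitate a larger "teacher", which is one way to get a compact model specialized for a narrow domain such as a single web stack. Below is a minimal sketch of one classic form, the temperature-softened KL loss from Hinton et al. (2015), in plain Python. The function names are illustrative, and the article may instead (or additionally) mean fine-tuning a student on text the teacher generates, which this sketch does not show.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 as in Hinton et al."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Terms with p_i == 0 contribute nothing to the KL divergence.
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
same = distillation_loss(teacher, teacher)
close = distillation_loss(teacher, [3.8, 1.1, 0.4])
far = distillation_loss(teacher, [0.5, 1.0, 4.0])
print(same, close, far)
```

In a real training loop this term is usually mixed with ordinary cross-entropy on hard labels and computed per token on tensors rather than Python lists; the T² factor keeps its gradient scale comparable as the temperature changes.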

thomassmith65•1h ago

Perhaps one of these days a random compsci undergrad will come up with a DeepSeek-calibre optimization.

Just imagine his or her 'ChatGPT with 10,000x fewer propagations' Reddit post appearing on a Monday...

...and $3 trillion of Nvidia stock going down the drain by Friday.

therealpygon•1h ago

One can only hope. Maybe then they’ll sell us GPUs with 2025 quantities of memory instead of 2015 quantities.

ilaksh•1h ago

DeepSeek came up with several significant optimizations, not just one. And master's students do contribute to leading edge research all the time.

There have really been many significant innovations in hardware, model architecture, and software, allowing companies to keep up with soaring demand and expectations.

But that's always how it's been in high technology. You only really hear about the biggest shifts, but the optimizations are continuous.

thomassmith65•1h ago

True, but I chose the words 'ChatGPT' and 'optimization' for brevity. There are many more eyes on machine learning since ChatGPT came along. There could be simpler techniques yet to discover. What boggles the mind is the $4 trillion parked in Nvidia stock, which would be wasted if more efficient code reduced the need for expensive GPUs.

tudorw•1h ago

Tropical Distillation?

ripped_britches•16m ago

$50M per day is insane! Any link supporting that?

