iPhone 17 Pro Demonstrated Running a 400B LLM

https://twitter.com/anemll/status/2035901335984611412

68•anemll•1h ago

Comments

ashwinnair99•1h ago

A year ago this would have been considered impossible. The hardware is moving faster than anyone's software assumptions.

cogman10•1h ago

This isn't a hardware feat, this is a software triumph.

They didn't make special purpose hardware to run a model. They crafted a large model so that it could run on consumer hardware (a phone).

pdpi•56m ago

It's both.

We haven't had phones running laptop-grade CPUs/GPUs for that long, and that is a very real hardware feat. Likewise, nobody would've said running a 400b LLM on a low-end laptop was feasible, and that is very much a software triumph.

smallerize•37m ago

The iPhone 17 Pro launched 8 months ago with 50% more RAM and about double the inference performance of the previous iPhone Pro (also 10x prompt processing speed).

mannyv•22m ago

The software has real software engineers working on it instead of researchers.

Remember when people were arguing about whether to use mmap? What a ridiculous argument.

At some point someone will figure out how to tile the weights and the memory requirements will drop again.

snovv_crash•11m ago

The real improvement will be when the software engineers get into the training loop. Then we can have MoE that use cache-friendly expert utilisation and maybe even learned prefetching for what the next experts will be.

simopa•1h ago

It's crazy to see a 400B model running on an iPhone. But moving forward, as the information density and architectural efficiency of smaller models continue to increase, getting high-quality, real-time inference on mobile is going to become trivial.

firstbabylonian•1h ago

> SSD streaming to GPU

Is this solution based on what Apple describes in their 2023 paper 'LLM in a flash' [1]?

1: https://arxiv.org/abs/2312.11514

simonw•57m ago

Yes. I collected some details here: https://simonwillison.net/2026/Mar/18/llm-in-a-flash/

zozbot234•35m ago

A similar approach was recently featured here: https://news.ycombinator.com/item?id=47476422 Though iPhone Pro has very limited RAM (12GB total) which you still need for the active part of the model. (Unless you want to use Intel Optane wearout-resistant storage, but that was power hungry and thus unsuitable to a mobile device.)

simonw•25m ago

Yeah, this new post is a continuation of that work.

cj00•59m ago

It’s 400B but it’s mixture of experts so how many are active at any time?

simonw•58m ago

Looks like it's Qwen3.5-397B-A17B so 17B active. https://github.com/Anemll/flash-moe/tree/iOS-App

rwaksmunski•48m ago

Apple might just win the AI race without even running in it. It's all about the distribution.

raw_anon_1111•40m ago

Apple is already one of the winners of the AI race. It’s making much more profit (ie it ain’t losing money) on AI off of ChatGPT, Claude, Grok (you would be surprised at how many incels pay to make AI generated porn videos) subscriptions through the App Store.

It’s only paying Google $1 billion a year for access to Gemini for Siri

detourdog•34m ago

Apple’s entire yearly capex is a fraction of the AI spend of the persumed AI winners.

devmor•16m ago

Which is mostly insane amounts of debt leveraged entirely on the moonshot that they will find a way to turn a profit on it within the next couple years.

Apple’s bet is intelligent, the “presumed winners” are hedging our economic stability on a miracle, like a shaking gambling addict at a horse race who just withdrew his rent money.

qingcharles•16m ago

Plus all those pricey 512GB Mac Studios they are selling to YouTubers.

dzikimarian•16m ago

Because someone managed to run LLM on an iPhone at unusable speed Apple won AI race? Yeah, sure.

naikrovek•11m ago

whoa, save some disbelief for later, don't show it all at once.

causal•38m ago

Run an incredible 400B parameters on a handheld device.

0.6 t/s, wait 30 seconds to see what these billions of calculations get us:

"That is a profound observation, and you are absolutely right ..."

WarmWash•16m ago

I don't think we are ever going to win this. The general population loves being glazed way too much.

baal80spam•11m ago

> The general population loves being glazed way too much.

This is 100% correct!

intrasight•12m ago

Better than waiting 7.5 million years to have a tell you the answer is 42.

pier25•30m ago

https://xcancel.com/anemll/status/2035901335984611412

_air•20m ago

This is awesome! How far away are we from a model of this capability level running at 100 t/s? It's unclear to me if we'll see it from miniaturization first or from hardware gains

Show HN: Threadprocs – executables sharing one address space (0-copy pointers)

Brew cask audit finds apps unmanaged by homebrew

Built a free website speed test tool for anyone with a public site

The Move Your Agents Will Discover

LoCoMo AI Benchmark: 6.4% of answer key wrong, judge accepts 63% of fake answers

How do you trust a new Linux distribution?

Mark Zuckerberg Is Building an AI Agent to Help Him Be CEO

How the idea of human superiority over nature was invented

OnlyFans owner Leonid Radvinsky dies of cancer at 43

The next evolution of AI user interfaces

2D Discrete Fourier Transform fixes rainbows on manga on color eInk Kaleido 3

Do AI Users Prioritize Accuracy or Speed?

AI Safety: A Call for Emotional Integration

When Should a Manager Step In?

Applying Nyquist-Shannon Sampling to LLM Prompts

Steve Jobs Talks iBook, AirPort, and More in Newly Surfaced 1999 Video

The role of AI companies in large formalisation projects

Tinderbox City

Iran foreign ministry denies Trump's 'good talks' claim

Meditation, Language, and LLMs

DietPi released a new version v10.2

CloudHop – Free GUI to transfer files between 70 cloud services

Show HN: Story Trainer, a self-guided tool for learning story structure

What Happens If AI Makes Things Too Easy for Us?

Bullet Factory: 3D games created on a prompt

Strait of Hormuz closure hits America's generic drug prescriptions

Are Corruption and Regulation Less Burdensome in Special Economic Zones?

Remote small business in Africa is an asymmetric opportunity

Landmark trial in New Mexico to decide if Meta misled kids about safety risks

Intel Fred Can Yield Greater Performance – Fred Benchmarks on Panther Lake