Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

https://github.com/b4rtaz/distributed-llama/discussions/255

58•b4rtazz•3h ago

Comments

geerlingguy•59m ago

distributed-llama is great, I just wish it would work with more models. I've been happy with ease of setup and its ongoing maintenance compared to Exo, and performance vs llama.cpp RPC mode.

alchemist1e9•26m ago

Any pointers to what is SOTA for a cluster of hosts with CUDA GPUs that don't have enough VRAM for the full weights, but do have 10Gbit low-latency interconnects?

If that problem gets solved, even if only for a batch approach that enables parallel batch inference with high total token/s but low per-session throughput, and for bigger models, then it would be a serious game changer for large-scale, low-cost AI automation without billions in capex. My intuition says it should be possible, so perhaps someone has done it or started on it already.

echelon•41m ago

This is really impressive.

If we can get this down to a single Raspberry Pi, then we have crazy embedded toys and tools. Locally, at the edge, with no internet connection.

Kids will be growing up with toys that talk to them and remember their stories.

We're living in the sci-fi future. This was unthinkable ten years ago.

taminka•23m ago

i feel sorry for your kids if you think this shit is inspiring lol

chatgpt is literally leading ppl with higher education to have full on psychosis by feeding into their insane delusions and confirmation bias, im sure a less smart version of this is a perfect toy for a kid w/o a fully developed brain yet

literally go touch grass bro...

dingdingdang•28m ago

Very impressive numbers.. wonder how this would scale on 4 relatively modern desktop PCs, like say something akin to an i5 8th Gen Lenovo ThinkCentre, these can be had for very cheap. But like @geerlingguy indicates - we need model compatibility to go up up up! As an example it would be amazing to see something like fastsdcpu run distributed to democratize accessibility-to/practicality-of image gen models for people with limited budgets but large PC fleets ;)

rthnbgrredf•15m ago

I think it is all well and good, but the most affordable option is probably still to buy a used MacBook with 16/32 or 64 GB (depending on the budget) unified memory and install Asahi Linux for tinkering.

Graphics cards with a decent amount of memory are still massively overpriced (even used), big, noisy and draw a lot of energy.

mehdibl•26m ago

1. This is Q4

2. This remains slow

3. The context window used here is likely 8k or similar which makes it unusable for bigger input/output.

Models already work fine on phones just try https://github.com/google-ai-edge/gallery and you will see local AI running on phones fine.
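
For a rough sense of what the Q4 point means for memory, here is a back-of-the-envelope sketch. It assumes exactly 4 bits per weight (real Q4 formats also store scaling data, so the true figure is somewhat higher), takes the 30B parameter count from the model name, and assumes the weights are split evenly across the 4 boards. It ignores the KV cache and activations, which grow with context length. The function name is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # billions of parameters * bits each / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


# Assumptions: 30B parameters, 4 bits per weight, even split over 4 nodes.
total_gb = weight_memory_gb(30, 4)
per_node_gb = total_gb / 4

print(f"weights total: {total_gb} GB, per node: {per_node_gb} GB")
```

Under those assumptions the weights come to about 15 GB in total, or a bit under 4 GB per board, before any context is loaded.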
