Utah's hottest new power source is 15k feet below the ground

https://www.gatesnotes.com/utahs-hottest-new-power-source-is-below-the-ground
124•mooreds•3h ago•74 comments

How the "Kim" dump exposed North Korea's credential theft playbook

https://dti.domaintools.com/inside-the-kimsuky-leak-how-the-kim-dump-exposed-north-koreas-credent...
153•notmine1337•4h ago•20 comments

A Navajo weaving of an integrated circuit: the 555 timer

https://www.righto.com/2025/09/marilou-schultz-navajo-555-weaving.html
60•defrost•3h ago•9 comments

Shipping textures as PNGs is suboptimal

https://gamesbymason.com/blog/2025/stop-shipping-pngs/
41•ibobev•3h ago•15 comments

I'm Making a Beautiful, Aesthetic and Open-Source Platform for Learning Japanese

https://kanadojo.com
37•tentoumushi•2h ago•11 comments

C++26: Erroneous Behaviour

https://www.sandordargo.com/blog/2025/02/05/cpp26-erroneous-behaviour
12•todsacerdoti•1h ago•8 comments

Troubleshooting ZFS – Common Issues and How to Fix Them

https://klarasystems.com/articles/troubleshooting-zfs-common-issues-how-to-fix-them/
14•zdw•3d ago•0 comments

A history of metaphorical brain talk in psychiatry

https://www.nature.com/articles/s41380-025-03053-6
10•fremden•1h ago•2 comments

Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

https://github.com/b4rtaz/distributed-llama/discussions/255
277•b4rtazz•13h ago•115 comments

Over 80% of Sunscreen Performed Below Their Labelled Efficacy (2020)

https://www.consumer.org.hk/en/press-release/528-sunscreen-test
87•mgh2•4h ago•79 comments

We hacked Burger King: How auth bypass led to drive-thru audio surveillance

https://bobdahacker.com/blog/rbi-hacked-drive-thrus/
272•BobDaHacker•10h ago•148 comments

The maths you need to start understanding LLMs

https://www.gilesthomas.com/2025/09/maths-for-llms
454•gpjt•4d ago•99 comments

Oldest recorded transaction

https://avi.im/blag/2025/oldest-txn/
135•avinassh•9h ago•59 comments

What to Do with an Old iPad

http://odb.ar/blog/2025/09/05/hosting-my-blog-on-an-iPad-2.html
40•owenmakes•1d ago•27 comments

Anonymous recursive functions in Racket

https://github.com/shriram/anonymous-recursive-function
46•azhenley•2d ago•12 comments

Stop writing CLI validation. Parse it right the first time

https://hackers.pub/@hongminhee/2025/stop-writing-cli-validation-parse-it-right-the-first-time
56•dahlia•5h ago•20 comments

Using Claude Code SDK to reduce E2E test time

https://jampauchoa.substack.com/p/best-of-both-worlds-using-claude
96•jampa•6h ago•66 comments

Matmul on Blackwell: Part 2 – Using Hardware Features to Optimize Matmul

https://www.modular.com/blog/matrix-multiplication-on-nvidias-blackwell-part-2-using-hardware-fea...
7•robertvc•1d ago•0 comments

GigaByte CXL memory expansion card with up to 512GB DRAM

https://www.gigabyte.com/PC-Accessory/AI-TOP-CXL-R5X4
41•tanelpoder•5h ago•38 comments

Microsoft Azure: "Multiple international subsea cables were cut in the Red Sea"

https://azure.status.microsoft/en-gb/status
100•djfobbz•3h ago•13 comments

Why language models hallucinate

https://openai.com/index/why-language-models-hallucinate/
133•simianwords•16h ago•147 comments

Processing Piano Tutorial Videos in the Browser

https://www.heyraviteja.com/post/portfolio/piano-reader/
25•catchmeifyoucan•2d ago•6 comments

Gloria funicular derailment initial findings report (EN) [pdf]

https://www.gpiaaf.gov.pt/upload/processos/d054239.pdf
9•vascocosta•2h ago•6 comments

AI surveillance should be banned while there is still time

https://gabrielweinberg.com/p/ai-surveillance-should-be-banned
461•mustaphah•10h ago•169 comments

Baby's first type checker

https://austinhenley.com/blog/babytypechecker.html
58•alexmolas•3d ago•15 comments

Qantas is cutting executive bonuses after data breach

https://www.flightglobal.com/airlines/qantas-slashes-executive-pay-by-15-after-data-breach/164398...
39•campuscodi•2h ago•9 comments

William James at CERN (1995)

http://bactra.org/wm-james-at-cern/
13•benbreen•1d ago•0 comments

Rug pulls, forks, and open-source feudalism

https://lwn.net/SubscriberLink/1036465/e80ebbc4cee39bfb/
242•pabs3•18h ago•118 comments

Rust tool for generating random fractals

https://github.com/benjaminrall/chaos-game
4•gidellav•2h ago•0 comments

Europe enters the exascale supercomputing league with Jupiter

https://ec.europa.eu/commission/presscorner/detail/en/ip_25_2029
50•Sami_Lehtinen•4h ago•34 comments

Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

https://github.com/b4rtaz/distributed-llama/discussions/255
277•b4rtazz•13h ago

Comments

geerlingguy•10h ago
distributed-llama is great, I just wish it worked with more models. I've been happy with its ease of setup and ongoing maintenance compared to Exo, and with its performance vs llama.cpp's RPC mode.
alchemist1e9•10h ago
Any pointers to what is SOTA for a cluster of hosts with CUDA GPUs that don't have enough VRAM for the full weights, but do have 10Gbit low-latency interconnects?

If that problem gets solved, even if only with a batch approach that enables parallel batch inference, resulting in high total token/s but low per-session throughput, and for bigger models, then it would be a serious game changer for large-scale, low-cost AI automation without billions in capex. My intuition says it should be possible, so perhaps someone has done it or started on it already.

echelon•10h ago
This is really impressive.

If we can get this down to a single Raspberry Pi, then we have crazy embedded toys and tools. Locally, at the edge, with no internet connection.

Kids will be growing up with toys that talk to them and remember their stories.

We're living in the sci-fi future. This was unthinkable ten years ago.

striking•8h ago
I think it's worth remembering that there's room for thoughtful design in the way kids play. Are LLMs a useful tool for encouraging children to develop their imaginations or their visual or spatial reasoning skills? Or would these tools shape their thinking patterns to exactly mirror those encoded into the LLM?

I think there's something beautiful and important about the fact that parents shape their kids, leaving with them some of the best (and worst) aspects of themselves. Likewise with their interactions with other people.

The tech is cool. But I think we should aim to be thoughtful about how we use it.

supportengineer•7h ago
They are better off turning this shit off and playing outside, getting dirty and riding bikes.
ugh123•6h ago
What about a kid who lives in an urban area without parks?
hkt•1h ago
Campaign for parks
bongodongobob•5h ago
You can do both bro.
Aurornis•3h ago
Parent here. Kids have a lot of time and do a lot of different things. Sometimes it rains or snows, or we're home sick. Kids can (and will) do a lot of different things, and it's good to have options.
bigyabai•7h ago
> Kids will be growing up with toys that talk to them and remember their stories.

What a radical departure from the social norms of childhood. Next you'll tell me that they've got an AI toy that can change their diaper and cook Chef Boyardee.

manmal•6h ago
An LLM in my kids' toys only over my cold, dead body. This can and will go very, very wrong.
fragmede•5h ago
If a raspberry pi can do all that, imagine the toys Bill Gates' grandkids have access to!

We're at the precipice of having a real "A Young Lady's Illustrated Primer" from The Diamond Age.

dingdingdang•10h ago
Very impressive numbers. I wonder how this would scale on 4 relatively modern desktop PCs, say something akin to an 8th-gen i5 Lenovo ThinkCentre; these can be had very cheap. But as @geerlingguy indicates, we need model compatibility to go up, up, up! As an example, it would be amazing to see something like fastsdcpu run distributed, to democratize the accessibility and practicality of image-gen models for people with limited budgets but large PC fleets ;)
rthnbgrredf•10h ago
I think it is all well and good, but the most affordable option is probably still to buy a used MacBook with 16, 32, or 64 GB of unified memory (depending on the budget) and install Asahi Linux for tinkering.

Graphics cards with a decent amount of memory are still massively overpriced (even used), big, noisy, and they draw a lot of energy.

ivape•8h ago
It just came to my attention that the 2021 M1 Max with 64GB is less than $1500 used. That's 64GB of unified memory at regular laptop prices, so I think people will be well equipped with AI laptops rather soon.

Apple really is #2 and probably could be #1 in AI consumer hardware.

jeroenhd•8h ago
Apple is leagues ahead of Microsoft with the whole AI PC thing and so far it has yet to mean anything. I don't think consumers care at all about running AI, let alone running AI locally.

I'd try the whole AI thing on my work Macbook but Apple's built-in AI stuff isn't available in my language, so perhaps that's also why I haven't heard anybody mention it.

ivape•7h ago
People don’t know what they want yet, you have to show it to them. Getting the hardware out is part of it, but you are right, we’re missing the killer apps at the moment. The very need for privacy with AI will make personal hardware important no matter what.
mycall•6h ago
Two main factors are holding back the "killer app" for AI. Fix hallucinations and make agents more deterministic. Once these are in place, people will love AI when it can make them money somehow.
croes•5h ago
You can’t fix the hallucinations
herval•5h ago
How does one “fix hallucinations” on an LLM? Isn’t hallucinating pretty much all it does?
kasey_junk•2h ago
Coding agents have shown how. You filter the output against something that can tell the LLM when it's hallucinating.

The hard part is identifying those filter functions outside of the code domain.
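
As a rough illustration of that filter pattern, here is a minimal sketch; generate stands in for any LLM call and the compile check is just one possible filter function, not how any particular coding agent actually works:

    import py_compile
    import tempfile

    def passes_filter(code: str) -> tuple[bool, str]:
        # Filter for the code domain: does the output even compile?
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
        try:
            py_compile.compile(f.name, doraise=True)
            return True, ""
        except py_compile.PyCompileError as e:
            return False, str(e)

    def generate_verified(prompt: str, generate, max_attempts: int = 3) -> str:
        # Generate, check against the filter, feed failures back in.
        for _ in range(max_attempts):
            code = generate(prompt)          # hypothetical LLM call
            ok, error = passes_filter(code)
            if ok:
                return code
            prompt += f"\nYour last attempt failed verification:\n{error}"
        raise RuntimeError("no output passed the filter")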

dotancohen•2h ago
It's called RAG (retrieval-augmented generation), and it's getting very well developed for some niche use cases such as legal, medical, etc. I've been personally working on one for mental health, and please don't let anybody tell you that they're using an LLM as a mental health counselor. I've been working on it for a year and a half, and if we get it production-ready in the next year and a half I will be surprised. Keeping up with the field, I don't think anybody else is any closer than we are.
MengerSponge•3h ago
Other than that, Mrs. Lincoln, how was the Agentic AI?
dotancohen•2h ago

> People don’t know what they want yet, you have to show it to them

Henry Ford famously quipped that had he asked his customers what they wanted, they would have wanted a faster horse.
wkat4242•4h ago
M1 doesn't exactly have stellar memory bandwidth for this day and age though
Aurornis•3h ago
M1 Max with 64GB has 400GB/s memory bandwidth.

You have to get into the highest 16-core M4 Max configurations to begin pulling away from that number.

jibbers•8h ago
Get an Apple Silicon MacBook with a broken screen and it’s an even better deal.
giancarlostoro•6h ago
You don't even need Asahi; you can run Comfy on it, but I recommend the Draw Things app, it just works and holds your hand a LOT. I am able to run a few models locally, and the underlying app is open source.
mrbonner•3h ago
I used Draw Things after fighting with ComfyUI.
croes•5h ago
What about AMD Ryzen AI Max+ 395 mini PCs with up to 128GB of unified memory?
evilduck•5h ago
Their memory bandwidth is the problem. 256 GB/s is really, really slow for LLMs.

Seems like at the consumer hardware level you just have to pick your poison based on which one factor you care about most. Macs with a Max or Ultra chip can have good memory bandwidth but low compute, but also ultra-low power consumption. Discrete GPUs have great compute and bandwidth but low to middling VRAM, and high costs and power consumption. The unified-memory PCs like the Ryzen AI Max and the Nvidia DGX deliver middling compute, higher VRAM, and terrible memory bandwidth.
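
As a back-of-envelope sketch of why bandwidth is the limiting number (the bandwidth and quantization figures below are ballpark assumptions, not benchmarks): each generated token has to stream the active weights from memory at least once, so bandwidth puts a hard ceiling on tokens/s.

    # Decode ceiling: tok/s <= memory bandwidth / bytes of active weights.
    def ceiling_tok_s(active_params_billion, bytes_per_param, bandwidth_gb_s):
        return bandwidth_gb_s / (active_params_billion * bytes_per_param)

    for name, bw in [("Ryzen AI Max, ~256 GB/s", 256),
                     ("M1 Max, ~400 GB/s", 400),
                     ("RTX 4090, ~1000 GB/s", 1000)]:
        dense = ceiling_tok_s(30, 0.56, bw)  # dense 30B at ~4.5 bits/param
        moe = ceiling_tok_s(3, 0.56, bw)     # MoE with ~3B active params
        print(f"{name}: ~{dense:.0f} tok/s dense 30B, ~{moe:.0f} tok/s A3B")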

codedokode•2h ago
But for matrix multiplication, isn't compute more important, as there are N³ multiplications but just N² numbers in a matrix?

Also, I don't think power consumption is important for AI. Typically you do AI at home or in the office, where there is a lot of electricity.

evilduck•1h ago
>But for matrix multiplication, isn't compute more important, as there are N³ multiplications but just N² numbers in a matrix?

Being able to quickly calculate a dumb or unreliable result because you're VRAM starved is not very useful for most scenarios. To run capable models you need VRAM, so high VRAM and lower compute is usually more useful than the inverse (a lot of both is even better, but you need a lot of money and power for that).

Even in this post with four RPis, Qwen3 30B A3B is still an MoE model and not a dense model. It runs fast with only 3B active parameters and can be parallelized across computers, but it's much less capable than a dense 30B model running on a single GPU.

> Also I don't think power consumption is important for AI. Typically you do AI at home or in the office where there is lot of electricity.

Depends on what scale you're discussing. If you want to match the VRAM of a 512GB Mac Studio Ultra with a bunch of Nvidia GPUs like RTX 3090 cards, you're not going to be able to run that on a typical American 15 amp circuit; you'll trip a breaker halfway there.
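
A quick sanity check on that last point, with assumed round numbers (~24 GB and ~350 W per RTX 3090, one 120 V / 15 A circuit derated to 80% for continuous load):

    import math

    target_vram_gb = 512                    # VRAM parity with a 512GB Mac Studio
    cards = math.ceil(target_vram_gb / 24)  # 22 RTX 3090s at 24 GB each
    gpu_watts = cards * 350                 # ~350 W per card under load
    circuit_watts = 120 * 15 * 0.8          # one 15 A / 120 V circuit, derated

    print(f"{cards} cards, ~{gpu_watts} W of GPUs vs ~{circuit_watts:.0f} W per circuit")
    # -> 22 cards, ~7700 W of GPUs vs ~1440 W per circuit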

ekianjo•29m ago
Works very well and very fast with this Qwen3 30B A3B model.
Aurornis•3h ago
> and install Asahi Linux for tinkering.

I would recommend sticking to macOS if compatibility and performance are the goal.

Asahi is an amazing accomplishment, but running native optimized macOS software including MLX acceleration is the way to go unless you’re dead-set on using Linux and willing to deal with the tradeoffs.

j45•8h ago
Connect a GPU to it with an eGPU chassis and you're running one way or the other.
trebligdivad•3h ago
On my (single) AMD 3950X running entirely on CPU (llama -t32 -dev none), I was getting 14 tokens/s running Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf last night. That's the best I've had out of a model that doesn't feel stupid.
codedokode•2h ago
How much RAM is it using, by the way? I see 30B, but without knowing the precision it is unclear how much memory one needs.
MalikTerm•1h ago
Q4 is usually around 4.5 bits per parameter, but can be more since some layers are quantised to a higher precision. That would suggest 30 billion * 4.5 bits = 15.7GB, but the quant the GP is using is 17.3GB, and the one in the article is 19.7GB. Add around 20-50% overhead for various things, plus some percentage for each 1k tokens of context, and you're probably looking at no more than 32GB. If you're using something like llama.cpp, which can offload some of the model to the GPU, you'll still get decent performance even on a 16GB VRAM GPU.
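
The same estimate as a sketch (the 4.5 bits/param and the 20-50% overhead are the rough rules of thumb above, not measured values):

    params = 30e9                   # Qwen3 30B A3B parameter count
    bits_per_param = 4.5            # typical effective size of a Q4-ish quant

    weights_gb = params * bits_per_param / 8 / 1e9     # ~16.9 GB
    weights_gib = params * bits_per_param / 8 / 2**30  # ~15.7 GiB
    low, high = weights_gb * 1.2, weights_gb * 1.5     # plus runtime overhead

    print(f"weights: ~{weights_gb:.1f} GB (~{weights_gib:.1f} GiB)")
    print(f"with 20-50% overhead, before context: ~{low:.0f}-{high:.0f} GB")
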
trebligdivad•25m ago
Sounds close! top says my llama is using 17.7G virt, 16.6G resident with: ./build/bin/llama-cli -m /discs/fast/ai/Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf --jinja -ngl 99 --temp 0.7 --min-p 0.0 --top-p 0.80 --top-k 20 --presence-penalty 1.0 -t 32 -dev none
kosolam•8h ago
How is this technically done? How does it split the query and aggregate the results?
magicalhippo•7h ago
From the readme:

More devices mean faster performance, leveraging tensor parallelism and high-speed synchronization over Ethernet.

The maximum number of nodes is equal to the number of KV heads in the model #70.

I found this[1] article nice for an overview of the parallelism modes.

[1]: https://medium.com/@chenhao511132/parallelism-in-llm-inferen...
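
As a sketch of the node-count constraint (illustrative only; distributed-llama's actual partitioning lives in its source): tensor parallelism here hands out whole KV heads, so the node count has to be a power of two that evenly divides the head count and can't exceed it.

    def valid_node_counts(kv_heads: int) -> list[int]:
        # Powers of two that evenly divide the KV heads, at most one head per node.
        counts, n = [], 1
        while n <= kv_heads:
            if kv_heads % n == 0:
                counts.append(n)
            n *= 2
        return counts

    # Qwen3-30B-A3B reportedly has 4 KV heads (grouped-query attention),
    # which matches the 4-node maximum for this setup.
    print(valid_node_counts(4))  # [1, 2, 4]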

varispeed•8h ago
So would 40x RPi 5 get 130 token/s?
SillyUsername•8h ago
I imagine it might be limited by the number of layers, and you'll hit diminishing returns at some point due to network latency.
VHRanger•7h ago
Most likely not because of NUMA bottlenecks
reilly3000•4h ago
It has to be 2^n nodes, limited to at most one node per KV (attention) head in the model.
behnamoh•8h ago
Everything runs on a π if you quantize it enough!

I'm curious about the applications though. Do people randomly buy 4xRPi5s that they can now dedicate to running LLMs?

ryukoposting•8h ago
I'd love to hook my development tools into a fully-local LLM. The question is context window and cost. If the context window isn't big enough, it won't be helpful for me. I'm not gonna drop $500 on RPis unless I know it'll be worth the money. I could try getting my employer to pay for it, but I'll probably have a much easier time convincing them to pay for Claude or whatever.
exitb•7h ago
I think the problem is that getting multiple Raspberry Pi’s is never the cost effective way to run heavy loads.
halJordan•7h ago
This is some sort of joke right?
numpad0•7h ago
MI50 is cheaper
rs186•7h ago
$500 gives you about 6 RPi 5 8GB or 4 16GB, excluding accessories or other necessary equipment to get this working.

You'll be much better off spending that money on something else more useful.

behnamoh•7h ago
> $500

Yeah, like a Mac Mini or something with better bandwidth.

ekianjo•20m ago
Raspberry Pis going up in price makes them very unattractive, since there is a wealth of cheap, better second-hand hardware out there, such as NUCs with Celerons.
fastball•6h ago
Capability of the model itself is presumably the more important question than those other two, no?
amelius•6h ago
> I'd love to hook my development tools into a fully-local LLM.

Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

So ... using an rpi is probably not what you want.

fexelein•6h ago
I'm having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There is still real value there, but also plenty of room for improvement. I envision us all running specialized lightweight LLMs locally/on-device at some point.
dotancohen•2h ago
I'd love to hear more about what you're running, and on what hardware. Also, what is your use case? Thanks!
refulgentis•6h ago
It's a tough thing, I'm a solo dev supporting ~all at high quality. I cannot imagine using anything other than $X[1] at the leading edge. Why not have the very best?

Karpathy elides that he is an individual. We expect to find a distribution of individuals, such that a nontrivial number of them are fine with 5-10% off leading-edge performance. Why? At the least, free as in beer. At most, concerns about connectivity, IP rights, and so on.

[1] gpt-5 finally dethroned sonnet after 7 months

wkat4242•4h ago
Today's Qwen3 30B is about as good as last year's state of the art. For me that's more than good enough. Many tasks don't require the best of the best either.
dpe82•4h ago
Mind linking to "his recent talk"? There's a lot of videos of him so it's a bit difficult to find what's most recent.
amelius•1h ago
https://www.youtube.com/watch?v=LCEmiRjPEtQ
dpe82•1h ago
Ah that one. Thanks!
littlestymaar•3h ago
> Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

Interesting because he also said the future is small "cognitive core" models:

> a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.

https://xcancel.com/karpathy/status/1938626382248149433#m

In which case, a raspberry Pi sounds like what you need.

ACCount37•20m ago
It's not at all trivial to build a "small but highly capable" model. Sacrificing world knowledge is something that can be done, but only to an extent, and that isn't a silver bullet.

For an LLM, size is a virtue - the larger a model is, the more intelligent it is, all other things equal - and even aggressive distillation only gets you so far.

Maybe with significantly better post-training, a lot of distillation from a very large and very capable model, and extremely high quality synthetic data, you could fit GPT-5 Pro tier of reasoning and tool use, with severe cuts to world knowledge, into a 40B model. But not into a 4B one. And it would need some very specific training to know when to fall back to web search or knowledge databases, or delegate to a larger cloud-hosted model.

And if we had the kind of training mastery required to pull that off? I'm a bit afraid of what kind of AI we would be able to train as a frontier run.

pdntspa•6h ago
Model intelligence should be part of your equation as well, unless you love loads and loads of hidden technical debt and context-eating, unnecessarily complex abstractions
th0ma5•6h ago
How do you evaluate this except for anecdote and how do we know your experience isn't due to how you use them?
pdntspa•6h ago
You can evaluate it as anecdote. How do I know you have the level of experience necessary to spot these kinds of problems as they arise? How do I know you're not just another AI booster with financial stake poisoning the discussion?

We could go back and forth on this all day.

exe34•4h ago
You got very defensive. It was a useful question - they were asking in terms of using a local LLM, so at best they might be in the business of selling Raspberry Pis, not proprietary LLMs.
giancarlostoro•6h ago
GPT OSS 20B is smart enough, but the context window is tiny with enough files. I wonder if you can make a dumber model with a massive context window that's a middleman to GPT.
pdntspa•5h ago
Matches my experience.
giancarlostoro•4h ago
Just have it open a new context window. The other thing I wanted to try is making a LoRA, but I'm not sure how that works properly; it suggested a whole other model, but it wasn't a pleasant experience since it's not as obvious as it is with diffusion models for images.
throaway920181•5h ago
It's sad that Pis are now so overpriced. They used to be fun little tinker boards that were semi-cheap.
pseudosavant•3h ago
The Raspberry Pi Zero 2 is as fast as a Pi 3, way smaller, and only costs $13 I think.

The high-end Pis aren't $25 though.

geerlingguy•2h ago
The Pi 4 is still fine for a lot of low end use cases and starts at $35. The Pi 5 is in a harder position. I think the CM5 and Pi 500 are better showcases for it than the base model.
hhh•7h ago
I have clusters of over a thousand Raspberry Pis where generally 75% of the compute and 80% of the memory is completely unused.
Moto7451•7h ago
That’s an interesting setup. What are you doing with that sort of cluster?
estimator7292•7h ago
99.9% of enthusiast/hobbyist clusters like this are exclusively used for blinkenlights
wkat4242•4h ago
Blinkenlights are an admirable pursuit
estimator7292•2h ago
That wasn't a judgement! I filled my homelab rack server with mechanical drives so I can get clicky noises along with the blinky lights
larodi•7h ago
Is it solar powered?
CamperBob2•6h ago
Good ol' Amdahl in action.
fragmede•6h ago
That sounds awesome, do you have any pictures?
6r17•7h ago
I mean, at this point it's more of a "proof-of-work" with a shared BP; I could definitely see some home-automation hacker getting this running - hell, maybe I'll do it myself if I have some spare time and want to make something like Alexa with customized stuff. It would still need text-to-speech and speech-to-text, but that's not really the topic of this setup. Even for professional use, if it's really usable, why not just spawn Qwen on ARM if that's cheaper? There are a lot of ways to read and leverage such a benchmark.
ugh123•6h ago
I think it serves as a good test bed for methods and models. We'll see if someday they can reduce it to 3... 2... 1 Pi 5s that can match this performance.
giancarlostoro•6h ago
Sometimes you buy a Pi for one project, start on it, then buy another for a different project; before you know it, none are complete and you have ten Raspberry Pis lying around across various generations. ;)
dotancohen•2h ago
Arduino hobbyist, same issue.

Though I must admit to first noticing the trend decades before discovering Arduino when I looked at the stack of 289, 302, and 351W intake manifolds on my shelf and realised that I need the width of the 351W manifold but the fuel injection of the 302. Some things just never change.

Zenst•5h ago
Depends on the model - if you have a sparse MoE model, then you can divide it up across smaller nodes; dense 30B models I do not see flying anytime soon.

An Intel Pro B50 in a dumpster PC would do much better with this model (not enough RAM for a dense 30B, alas), get close to 20 tokens a second, and be so much cheaper.

piecerough•4h ago
"quantize enough"

though at what quality?

dotancohen•2h ago
Quantity has a quality all its own.
blululu•1h ago
For $500 you may as well spend an extra $100 and get a Mac mini with an M4 chip and 256GB of RAM, and avoid the headaches of coordinating 4 machines.
mmastrac•7h ago
Is the network the bottleneck here at all? That's impressive for a gigabit switch.
kristianp•1h ago
Does the switch use more power than the 4 pis?
tarruda•7h ago
I suspect you'd get similar numbers with a modern x86 mini PC that has 32GB of RAM.
misternintendo•6h ago
At this speed this is only suitable for time-insensitive applications.
daveed•6h ago
I mean it's a raspberry pi...
layer8•5h ago
I’d argue that chat is a time-sensitive application, and 13 tokens/s is significantly faster than I can read.
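
For scale, a rough conversion (assuming ~0.75 English words per token):

    tok_per_s = 13
    words_per_minute = tok_per_s * 0.75 * 60
    print(f"~{words_per_minute:.0f} words/minute")  # ~585 wpm vs ~200-300 wpm typical reading
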
poly2it•2h ago
Neat, but at this price scaling it's probably better to buy GPUs.
rao-v•8m ago
Nice! Cheap RK3588 boards come with 15GB of LPDDR5 RAM these days and have significantly better performance than the Pi 5 (and often are cheaper).

I get 8.2 tokens per second on a random orange pi board with Qwen3-Coder-30B-A3B at Q3_K_XL (~12.9GB). I need to try two of them in parallel ... should be significantly faster than this even at Q6.