Show HN: Local LLM on a Pi 4 controlling hardware via tool calling

2•stfurkan•1h ago

Comments

stfurkan•1h ago

Hi HN,

I spent the weekend experimenting to see if I could get a proper LLM running locally on an old Raspberry Pi 4 (4GB), and more importantly, if I could get it to interact with the physical world.

I ended up using PrismML's new Bonsai models. Because they are genuinely 1-bit (trained from scratch at 1-bit, not quantized down to 4-bit), they actually fit in the Pi's 4GB of RAM. The 4B parameter model is ~570 MB, and the 1.7B is ~240 MB.
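As a rough sanity check on those sizes, here is a small sketch that converts file size to bits per parameter. It assumes 1 MB = 10^6 bytes and that "4B" and "1.7B" are the nominal parameter counts; the function name is made up for illustration.

```python
def bits_per_param(size_mb: float, params_billion: float) -> float:
    """Average bits stored per parameter, assuming 1 MB = 10**6 bytes."""
    total_bits = size_mb * 1e6 * 8
    return total_bits / (params_billion * 1e9)

# Sizes quoted in the post; parameter counts are the nominal ones.
for name, size_mb, params_b in [("4B", 570, 4.0), ("1.7B", 240, 1.7)]:
    print(f"{name}: {bits_per_param(size_mb, params_b):.2f} bits/param")
```

Both come out a little above 1 bit per parameter, which would be consistent with 1-bit weights plus some overhead. What that overhead consists of is not stated in the post.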

I loaded them through llama.cpp's router mode. I get around 2 tok/s on the 4B model for better reasoning, and 4-5 tok/s on the 1.7B when I just need speed. I tried Gemma 4 E2B first, but it was just too slow on 4GB of RAM.

The fun part: I wired up a cheap TM1637 4-digit display to the GPIO pins. Since Bonsai supports native tool calling, I wrote a small Python proxy that injects an update_display function into requests. When the model decides to use the tool, the proxy catches the streaming call, extracts the text, and drives the display. You can tell it to "show 1453" and it physically lights up.
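A minimal sketch of what that proxy logic could look like, not the actual code from the repo. It assumes the server speaks the OpenAI-style chat completions format, where a streamed tool call arrives as `delta.tool_calls` fragments; the tool schema, the `text` parameter name, and the helper names are all hypothetical.

```python
import json

# Hypothetical tool schema; the parameter name "text" is an assumption.
UPDATE_DISPLAY_TOOL = {
    "type": "function",
    "function": {
        "name": "update_display",
        "description": "Show up to 4 characters on the TM1637 display.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}

def inject_tool(request_body: dict) -> dict:
    """Return a copy of the request with update_display added to its tools."""
    body = dict(request_body)
    tools = list(body.get("tools") or [])
    names = {(t.get("function") or {}).get("name") for t in tools}
    if "update_display" not in names:
        tools.append(UPDATE_DISPLAY_TOOL)
    body["tools"] = tools
    return body

def collect_tool_calls(chunks) -> list:
    """Stitch streamed tool-call fragments back together, keyed by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for part in delta.get("tool_calls") or []:
                slot = calls.setdefault(part.get("index", 0),
                                        {"name": "", "arguments": ""})
                fn = part.get("function") or {}
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]

def display_text_from(chunks):
    """Return the text of the first update_display call, or None."""
    for call in collect_tool_calls(chunks):
        if call["name"] == "update_display":
            return json.loads(call["arguments"]).get("text")
    return None
```

In a real proxy, `chunks` would be the parsed JSON payloads of the server-sent events, and the returned text would be handed to whatever drives the display.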

It’s definitely just a weekend project with rough edges: 7-segment displays can't render W or M, the certs are self-signed, and so on. The code and setup scripts are all in the repo.
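To illustrate the W/M limitation, here is a sketch of a 7-segment encoder, not the code from the repo. It assumes the common bit layout where segment a is bit 0 through segment g at bit 6; the glyph table is a small illustrative subset, and `encode` is a made-up name. Characters with no glyph fall back to a blank digit.

```python
# Segment bits: a=0x01, b=0x02, c=0x04, d=0x08, e=0x10, f=0x20, g=0x40.
GLYPHS = {
    "0": 0x3F, "1": 0x06, "2": 0x5B, "3": 0x4F, "4": 0x66,
    "5": 0x6D, "6": 0x7D, "7": 0x07, "8": 0x7F, "9": 0x6F,
    "A": 0x77, "C": 0x39, "E": 0x79, "F": 0x71, "H": 0x76,
    "L": 0x38, "P": 0x73, "U": 0x3E, "-": 0x40, " ": 0x00,
}

def encode(text: str, width: int = 4) -> list:
    """Map text to segment bytes, padded or truncated to the display width."""
    padded = text.upper()[:width].ljust(width)
    # Unsupported characters such as W or M become blanks (0x00).
    return [GLYPHS.get(ch, 0x00) for ch in padded]

print([hex(b) for b in encode("1453")])
```

Blanking unsupported characters is only one option; rejecting the input and reporting an error back to the model is another.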

I’m thinking about adding servos or sensors next. Would love to hear your thoughts or see if anyone else is building edge AI hardware projects!

trailheadsec•1h ago

What’s the quality of the model output at this RAM / model selection? Local models fascinate me; I run Ollama on an M1 Max MacBook Pro with 64GB of RAM, but I am a little bit inexperienced with the ins and outs. Thank you for sharing!

stfurkan•48m ago

I specifically chose PrismML's 1-bit models because their tiny size allows them to actually fit on smaller hardware like the Pi. The 1.7B model is great for basic tasks and tool triggers, while the 4B model seems reasonable for some daily tasks, though it's much slower on this setup. If you try these models on your M1 Max, I assume they'll run incredibly fast. I previously tried them on a VPS and the inference speed was really good for my experiment.
