Show HN: Local LLM on a Pi 4 controlling hardware via tool calling

3•stfurkan•4h ago

Comments

stfurkan•4h ago

Hi HN,

I spent the weekend experimenting to see if I could get a proper LLM running locally on an old Raspberry Pi 4 (4GB), and more importantly, if I could get it to interact with the physical world.

I ended up using PrismML's new Bonsai models. Because they are genuinely 1-bit (trained from scratch at 1-bit, not quantized down to 4-bit), they actually fit. The 4B parameter model is ~570 MB, and the 1.7B is ~240 MB.
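As a rough sanity check on those sizes, here is the arithmetic as a small sketch. It assumes 1 MB = 10^6 bytes and takes the nominal parameter counts (4B and 1.7B) at face value; the helper names are only illustrative.

```python
def bits_per_weight(file_mb: float, params_billion: float) -> float:
    """Average bits stored per parameter, assuming 1 MB = 10**6 bytes."""
    return file_mb * 1e6 * 8 / (params_billion * 1e9)


def ideal_size_mb(params_billion: float, bits: float = 1.0) -> float:
    """File size in MB if every parameter took exactly `bits` bits."""
    return params_billion * 1e9 * bits / 8 / 1e6


# Sizes quoted in the post: ~570 MB for the 4B model, ~240 MB for the 1.7B.
for name, size_mb, params in [("4B", 570, 4.0), ("1.7B", 240, 1.7)]:
    print(
        f"{name}: {bits_per_weight(size_mb, params):.2f} bits/weight, "
        f"pure 1-bit would be {ideal_size_mb(params):.0f} MB, "
        f"4-bit would be {ideal_size_mb(params, 4):.0f} MB"
    )
```

Both files work out to a little over 1 bit per weight. The gap above a pure 1-bit payload is presumably overhead such as scales, embeddings, and metadata, though that breakdown is a guess.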

I loaded them through llama.cpp's router mode. I get around 2 tok/s on the 4B model, which I use when I want better reasoning, and 4-5 tok/s on the 1.7B when I just need speed. I tried Gemma 4 E2B first, but it was just too slow on 4GB of RAM.

The fun part: I wired up a cheap TM1637 4-digit display to the GPIO pins. Since Bonsai supports native tool calling, I wrote a small Python proxy that injects an update_display function into requests. When the model decides to use the tool, the proxy catches the streaming call, extracts the text, and drives the display. You can tell it to "show 1453" and it physically lights up.
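For anyone curious what such a proxy has to do, here is a minimal sketch of the two core steps, not the code from the repo. It assumes the server speaks the OpenAI-style chat completions format, where streamed tool calls arrive as `delta.tool_calls` fragments. The `text` parameter, the helper names, and the `show` callback standing in for the real TM1637 driver are all hypothetical.

```python
import json
from typing import Callable, Iterable

# Hypothetical schema for the tool; the real one in the repo may differ.
UPDATE_DISPLAY_TOOL = {
    "type": "function",
    "function": {
        "name": "update_display",
        "description": "Show up to 4 characters on the TM1637 display.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}


def inject_tool(request_body: dict) -> dict:
    """Return a copy of the request with update_display added to its tools."""
    body = dict(request_body)
    tools = list(body.get("tools", []))
    names = {t.get("function", {}).get("name") for t in tools}
    if "update_display" not in names:
        tools.append(UPDATE_DISPLAY_TOOL)
    body["tools"] = tools
    return body


def collect_tool_calls(chunks: Iterable[dict]) -> list:
    """Stitch streamed tool-call fragments into (name, arguments) pairs."""
    partial = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for call in choice.get("delta", {}).get("tool_calls") or []:
                slot = partial.setdefault(
                    call.get("index", 0), {"name": "", "arguments": ""}
                )
                fn = call.get("function", {})
                slot["name"] = fn.get("name") or slot["name"]
                # The JSON arguments string arrives in pieces; concatenate them.
                slot["arguments"] += fn.get("arguments") or ""
    return [
        (partial[i]["name"], json.loads(partial[i]["arguments"] or "{}"))
        for i in sorted(partial)
    ]


def handle_stream(chunks: Iterable[dict], show: Callable[[str], None]) -> None:
    """Call `show` with the text of every update_display call in the stream."""
    for name, args in collect_tool_calls(chunks):
        if name == "update_display":
            show(str(args.get("text", ""))[:4])  # the display has 4 digits
```

In a real proxy the chunks would be parsed from the server-sent event lines of the upstream response, and `show` would wrap whatever TM1637 library drives the GPIO pins.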

It’s definitely just a weekend project with rough edges: 7-segment displays can't render W or M, the certs are self-signed, etc. The code and setup scripts are all in the repo.
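One way to soften the W/M limitation is to clean up the text before it reaches the display. This is a sketch of that idea rather than what the repo does; the `UNRENDERABLE` set (just the two letters mentioned above), the placeholder character, and the function name are assumptions.

```python
# Letters a 7-segment digit cannot draw; extend as needed (assumed set).
UNRENDERABLE = set("MW")


def fit_for_display(text: str, width: int = 4, placeholder: str = "-") -> str:
    """Uppercase, swap unrenderable letters for a placeholder, fit to width."""
    cleaned = "".join(
        placeholder if ch in UNRENDERABLE else ch for ch in text.upper()
    )
    # Truncate long input and pad short input so the result is always `width` wide.
    return cleaned[:width].ljust(width)


print(fit_for_display("1453"))    # 1453
print(fit_for_display("wow"))     # -O- (plus one trailing space)
print(fit_for_display("memory"))  # -E-O
```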

I’m thinking about adding servos or sensors next. Would love to hear your thoughts or see if anyone else is building edge AI hardware projects!

trailheadsec•3h ago

What’s the quality of the model output with this amount of RAM and this choice of model? Local models fascinate me; I run Ollama on an M1 Max MacBook Pro with 64GB of RAM, but I'm fairly inexperienced with the ins and outs. Thank you for sharing!

stfurkan•3h ago

I specifically chose PrismML's 1-bit models because their tiny size allows them to actually fit on smaller hardware like the Pi. The 1.7B model is great for basic tasks and tool triggers, while the 4B model seems reasonable for some daily tasks, though it's much slower on this setup. If you try these models on your M1 Max, I assume they'll run incredibly fast. I previously tried them on a VPS and the inference speed was really good for my experiment.
