Hi HN, I've always been fascinated by local inference, especially on small hardware like the Raspberry Pi. While researching, I found that most pre-made solutions offloaded at least one piece of the puzzle, such as speech-to-text or LLM handling, to a cloud service. So I decided to try building a tiny agent that runs entirely on a Raspberry Pi 5 (16GB or 8GB), 100% locally. It can execute tools and runs some of the best small models I could find, such as Qwen3:1.7B and Gemma3:1B.
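For context on what "executing tools" looks like with a small model: here's a minimal sketch of a single-round tool-calling loop. This is purely illustrative — the model call is stubbed out, and the JSON tool format and names like `run_agent` are my own, not the project's actual protocol:

```python
import json

# Hypothetical tool registry; a real agent's tools might control GPIO,
# timers, music playback, etc.
TOOLS = {
    "get_time": lambda args: "12:00",
    "add": lambda args: str(args["a"] + args["b"]),
}

def call_llm(prompt: str) -> str:
    # Stub standing in for local inference with a 1-2B parameter model.
    # Here we pretend the model decided to call the "add" tool.
    return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})

def run_agent(user_msg: str) -> str:
    # One round: ask the model, execute the tool it requests, return result.
    reply = call_llm(user_msg)
    request = json.loads(reply)
    tool = TOOLS[request["tool"]]
    return tool(request["args"])

print(run_agent("what is 2 plus 3?"))  # prints "5"
```

The interesting constraint is that the model's structured output has to be reliable enough at 1-2B parameters for this parse-and-dispatch step to work.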
From wake-word detection (Vosk) to transcription (faster-whisper) to the actual LLM inference, everything happens on the Pi 5 itself. It was definitely a challenge given the hardware constraints, but I learned a lot along the way.
I've added a demo and detailed everything in my blog post if you're curious: https://blog.simone.computer/an-agent-desktoy
Cheers!