So I built Sanna, an open-source, voice-first AI assistant for Android. It's inspired by what OpenClaw did for the desktop – an LLM that actually does things – but built for the device in my pocket.
What my morning looks like now: Kids in the car, I say "Hey Sanna": "Read me the new WhatsApp messages" – reads them aloud, I reply by voice. "Add milk and diapers to the shopping list" – done, stored locally. "What's on my calendar?" – spoken summary. "Play the latest Doppelgänger episode" – fetches RSS, downloads, plays. "Text my wife: picking up the kids at 4" – SMS sent, hands on the wheel.
The moments that blew my own mind: I wanted podcasts without a third-party app. So I wrote a SKILL.md – one Markdown file describing RSS feeds, downloading, and playback. No SDK, no code. Now I have a voice-controlled podcast player. One Markdown file = one new capability.
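I'm not reproducing Sanna's actual skill schema here, but to make the "one Markdown file = one capability" idea concrete, a podcast SKILL.md could plausibly look something like this (feed URL is a placeholder, and the section names are my invention):

```markdown
# Skill: Podcast Player   (hypothetical layout – the real schema may differ)

## When to use
The user asks to play, download, or list podcast episodes.

## Feeds
- Doppelgänger: https://example.com/feed.xml   <!-- placeholder URL -->

## Steps
1. Fetch the RSS feed with the HTTP tool and parse the newest <item>.
2. Download the episode's enclosure URL to local storage.
3. Play the file with the media tool and announce the episode title.
```

The point is that the LLM reads this as instructions, not code: the HTTP, storage, and media tools already exist, and the Markdown just tells the agent when and how to chain them.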
I told Sanna: "Let me know when I'm near a drugstore." It set up a recurring schedule checking GPS every few minutes, querying Google Places for pharmacies, and playing an alarm when one is nearby. I never built geofencing. The agent combined scheduler, GPS, HTTP, and alarm on its own.

I asked "What are the latest sports headlines?" with no search skill. The agent found a news site, discovered its RSS feed, fetched it, and read me a summary. It generalized from the podcast skill.
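For the curious, the core of the drugstore alert the agent improvised is just "distance to nearest result under a threshold." This is my reconstruction in TypeScript, not Sanna's code – the scheduler and Places call are assumed to exist around it:

```typescript
// Sketch of the improvised geofencing check: a scheduled job gets a GPS fix,
// asks a places API for pharmacies, and fires an alarm if one is in range.
// Only the pure distance logic is shown here.

type Place = { name: string; lat: number; lon: number };

// Haversine great-circle distance in meters between two WGS84 coordinates.
function distanceMeters(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const R = 6371000; // mean Earth radius in meters
  const rad = (d: number) => (d * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLon = rad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Returns the first place within radiusM of the current fix, if any.
function nearbyPlace(lat: number, lon: number, places: Place[], radiusM: number): Place | undefined {
  return places.find(p => distanceMeters(lat, lon, p.lat, p.lon) <= radiusM);
}
```

The recurring schedule would call `nearbyPlace` every few minutes with fresh GPS and Places results, and hand any hit to the alarm tool.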
I told Sanna to open Player FM via Accessibility Services but it couldn't tap Play. I said: "Try again – go to Downloads, tap the episode, then Play." It worked and remembered. Next time, first try. These aren't scripted. The LLM reasons about what tools to chain.
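My guess at how "it remembered" could work under the hood (again, an assumption, not Sanna's actual code): successful UI step sequences get stored per app and intent, and replayed before asking the LLM to plan from scratch.

```typescript
// Hypothetical memory for corrected Accessibility runs: once a step sequence
// succeeds, it is stored keyed by (app, intent) and replayed on the next try.

type UiStep = { action: "open" | "tap" | "scroll"; target: string };

class SkillMemory {
  private learned = new Map<string, UiStep[]>();

  private key(app: string, intent: string): string {
    return `${app}::${intent}`;
  }

  // Called after a corrected run succeeds.
  remember(app: string, intent: string, steps: UiStep[]): void {
    this.learned.set(this.key(app, intent), steps);
  }

  // Next time: replay stored steps; undefined means plan with the LLM again.
  recall(app: string, intent: string): UiStep[] | undefined {
    return this.learned.get(this.key(app, intent));
  }
}
```

The "first try next time" behavior then falls out naturally: `recall` short-circuits the planning step whenever a prior correction exists.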
The architecture: background sub-agents. The main pipeline stays free while independent LLM agents run in the background. A Scheduler sub-agent fires at set times ("Every morning at 7, check my calendar and text me a summary"). A Notification sub-agent fires on events ("When my wife texts on WhatsApp, read it aloud" – semantic evaluation, no regex). An Accessibility sub-agent controls any app's UI and learns from corrections. Plus personal memory – it remembers my family and routines.
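"Semantic evaluation, no regex" is the interesting bit of the Notification sub-agent, so here is a minimal sketch of what I mean by it. The `judge` function stands in for the real LLM call, and the prompt wording is mine, not Sanna's:

```typescript
// Instead of regex-matching sender or package names, the standing rule and
// the incoming notification are handed to the LLM, which decides whether
// the rule applies to this particular message.

type Notification = { app: string; sender: string; text: string };
type Judge = (prompt: string) => Promise<boolean>; // LLM yes/no, abstracted

// Build the question the LLM is asked for each notification.
function buildPrompt(rule: string, n: Notification): string {
  return [
    `Standing rule: "${rule}"`,
    `Notification from ${n.app}, sender ${n.sender}: "${n.text}"`,
    `Does the rule apply to this notification? Answer yes or no.`,
  ].join("\n");
}

async function shouldReadAloud(rule: string, n: Notification, judge: Judge): Promise<boolean> {
  return judge(buildPrompt(rule, n));
}
```

Because the decision is semantic, "When my wife texts on WhatsApp" matches regardless of how her name is stored or how the notification is phrased – something a regex over sender strings can't do.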
19 built-in skills as Markdown files (Gmail, Calendar, Slack, Spotify, WhatsApp, SMS, Phone, Contacts, Maps, Weather, Lists, Journal, Timer, Tasks, Notifications, Scheduler, Podcast, Headlines, Web Research). Upload new ones at runtime – no rebuild.
Tech: React Native + Kotlin, OpenAI or Claude, Picovoice wake word (on-device), OAuth PKCE (no backend). All data on-device.
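"OAuth PKCE (no backend)" deserves a word: PKCE (RFC 7636) is what lets a mobile app complete OAuth without holding a client secret on a server. A sketch of the two PKCE primitives, using Node's crypto for illustration (Sanna's actual implementation may differ):

```typescript
import { createHash, randomBytes } from "crypto";

// Base64url encoding without padding, as RFC 7636 requires.
const base64url = (buf: Buffer): string =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

// 32 random bytes -> 43-character verifier, within RFC 7636's 43..128 range.
function makeVerifier(): string {
  return base64url(randomBytes(32));
}

// S256 challenge: BASE64URL(SHA-256(ASCII(verifier))).
function makeChallenge(verifier: string): string {
  return base64url(createHash("sha256").update(verifier).digest());
}
```

The app opens the browser with the challenge in the authorization URL, then exchanges the returned code plus the original verifier directly with the token endpoint – no backend needed to keep a secret.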
Build from source or email sannabot@proton.me for a test APK.
I built this because I needed 10 more minutes in my day. Turns out an LLM with real tools finds solutions you never programmed. It's not a phone assistant – it's a second pair of hands.