I built this because I wanted a personal AI assistant that works where I already chat - WhatsApp. It handles:
- Voice notes → transcription + AI response
- Images → vision analysis + answers
- PDFs → extracts text + answers questions
- Regular text messages
The interesting parts:
- Multi-modal handling in one conversation thread
- Session management across message types
- Conversation history without a database (uses conversation context)
The LLM integration is abstracted so you can plug in whatever provider you want.
elizabeth1212•1h ago
- Voice notes → transcription + AI response - Images → vision analysis + answers - PDFs → extracts text + answers questions - Regular text messages
The interesting parts: - Multi-modal handling in one conversation thread - Session management across message types - Conversation history without a database (uses conversation context)
The LLM integration is abstracted so you can plug in whatever provider you want.