The core loop: dump accessibility tree via uiautomator → parse and filter to ~40 relevant elements → LLM returns {think, plan, action} → execute via ADB → repeat.
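Roughly, one iteration of that loop looks like the sketch below. It's illustrative only: the helper names (adb, filterElements, callLLM, executeAction), the clickable-only filter, and the action schema are assumptions, not droidclaw's actual internals.

    // One iteration of the dump → filter → decide → act loop (illustrative sketch).
    import { execSync } from "node:child_process";

    type Action =
      | { type: "tap"; x: number; y: number }
      | { type: "text"; value: string }
      | { type: "key"; code: string };

    type Decision = { think: string; plan: string; action: Action };

    // Thin wrapper around the adb CLI.
    function adb(cmd: string): string {
      return execSync(`adb ${cmd}`, { encoding: "utf8" });
    }

    // 1. Dump the accessibility tree with uiautomator and read the XML back.
    function dumpTree(): string {
      adb("shell uiautomator dump /sdcard/ui.xml");
      return adb("shell cat /sdcard/ui.xml");
    }

    // 2. Keep only clickable nodes and cap the list so the prompt stays small.
    function filterElements(xml: string): string[] {
      return [...xml.matchAll(/<node[^>]*clickable="true"[^>]*\/?>/g)]
        .map((m) => m[0])
        .slice(0, 40);
    }

    // 3. Placeholder for the provider call; the model returns {think, plan, action}.
    async function callLLM(task: string, elements: string[]): Promise<Decision> {
      throw new Error("wire up an LLM provider here");
    }

    // 4. Translate the chosen action into adb input commands.
    function executeAction(a: Action): void {
      if (a.type === "tap") adb(`shell input tap ${a.x} ${a.y}`);
      else if (a.type === "text") adb(`shell input text "${a.value.replace(/ /g, "%s")}"`); // input text rejects raw spaces
      else adb(`shell input keyevent ${a.code}`);
    }

    // One step of the loop: observe, decide, act.
    async function step(task: string): Promise<void> {
      const decision = await callLLM(task, filterElements(dumpTree()));
      executeAction(decision.action);
    }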
Some technical decisions worth noting:
- Primary input is the accessibility tree, not vision. Vision (screenshots + a multimodal model) kicks in only as a fallback when the tree comes back empty (WebViews, Flutter); see the fallback sketch after this list.
- Stuck detection: if the screen state doesn't change for 3 consecutive steps, recovery kicks in with back navigation, going home, or re-launching the app (sketched after the list).
- Two execution modes: AI-powered workflows (JSON, the LLM decides navigation at runtime) and deterministic flows (YAML, fixed sequences, no LLM calls); rough shapes for both are sketched below.
- ADB over WiFi + Tailscale for remote control (setup sketched below). The phone becomes an always-on agent you can trigger from anywhere.
- Supports Groq (free tier), OpenAI, OpenRouter, and Bedrock. Ollama support just landed for fully local inference.
- One-line install: curl -fsSL https://droidclaw.ai/install.sh | sh
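The vision fallback mentioned above could look something like this; the observe and captureScreenshotPng names and the base64 handoff are assumptions for illustration.

    // Vision fallback sketch: if the filtered tree is empty (WebView/Flutter screens),
    // capture a screenshot instead and hand it to a multimodal model.
    import { execSync } from "node:child_process";

    function captureScreenshotPng(): Buffer {
      // screencap -p writes a PNG to stdout; exec-out keeps the binary stream intact.
      return execSync("adb exec-out screencap -p", { maxBuffer: 16 * 1024 * 1024 });
    }

    function observe(filteredElements: string[]): { mode: "tree" | "vision"; payload: string } {
      if (filteredElements.length > 0) {
        return { mode: "tree", payload: filteredElements.join("\n") };
      }
      // Base64-encode the screenshot so it can be attached to a multimodal prompt.
      return { mode: "vision", payload: captureScreenshotPng().toString("base64") };
    }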
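Stuck detection can be as simple as hashing each observed screen and escalating recovery once the last three hashes match; the helpers below are a minimal sketch under that assumption, not droidclaw's implementation.

    // Stuck-detection sketch: hash the filtered tree each step, escalate recovery
    // (back → home → re-launch) once 3 consecutive screens are identical.
    import { execSync } from "node:child_process";
    import { createHash } from "node:crypto";

    const screenHashes: string[] = [];

    function isStuck(filteredTree: string): boolean {
      screenHashes.push(createHash("sha256").update(filteredTree).digest("hex"));
      const last = screenHashes.slice(-3);
      return last.length === 3 && last.every((h) => h === last[0]);
    }

    function recover(attempt: number, pkg: string): void {
      const sh = (cmd: string) => execSync(`adb shell ${cmd}`);
      if (attempt === 0) sh("input keyevent KEYCODE_BACK");
      else if (attempt === 1) sh("input keyevent KEYCODE_HOME");
      else sh(`monkey -p ${pkg} -c android.intent.category.LAUNCHER 1`); // re-launch the target app
    }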
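For the two execution modes, the data shapes might look roughly like the types below; the field names are guesses for illustration, not droidclaw's real JSON/YAML schema.

    // Hypothetical shapes of the two workflow kinds (field names are illustrative).
    type AIWorkflow = {
      name: string;
      goal: string; // natural-language goal; the LLM plans navigation at runtime
      app?: string; // optional hint for which app to drive
    };

    type DeterministicFlow = {
      name: string;
      // Fixed sequence, replayed step by step with no LLM calls.
      steps: Array<
        | { do: "launch"; package: string }
        | { do: "tap"; selector: string }
        | { do: "type"; text: string }
        | { do: "wait"; ms: number }
      >;
    };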
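The remote-control setup boils down to switching adbd to TCP and connecting over the tailnet; the sketch below shells out to standard adb commands, and the Tailscale IP is a placeholder.

    // ADB-over-WiFi + Tailscale sketch (the 100.x address is a placeholder for the
    // phone's Tailscale IP; Tailscale must be running on the phone).
    import { execSync } from "node:child_process";

    const run = (cmd: string) => execSync(cmd, { stdio: "inherit" });

    // With the phone attached via USB once, restart adbd listening on TCP port 5555.
    run("adb tcpip 5555");

    // From any machine on the same tailnet, connect using the phone's Tailscale IP.
    run("adb connect 100.64.0.42:5555");

    // Subsequent adb commands now reach the phone over the network, no cable needed.
    run("adb devices");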
Built with Bun + TypeScript. 35 example workflows are included, covering messaging, social, productivity, research, and lifestyle tasks.
GitHub: https://github.com/unitedbyai/droidclaw
Website: https://droidclaw.ai