We’ve been thinking about a core limitation in current mobile AI assistants:
Most systems (e.g., Apple Intelligence, Google Assistant–style integrations) rely on predefined schemas and coordinated APIs. Apps must explicitly implement the assistant’s specification. This limits extensibility and makes the ecosystem tightly controlled.
On the other hand, GUI-based agents (e.g., AppAgent, AutoDroid, droidrun) rely on screenshots and accessibility trees, which gives them broad power but weak capability boundaries.
So we built Mobile-MCP, an Android-native realization of the Model Context Protocol (MCP) using the Intent framework.
The key idea:
- Apps declare MCP-style capabilities (with natural-language descriptions) in their manifest.
- An LLM-based assistant can autonomously discover all exposed capabilities on-device via the PackageManager.
- The LLM selects which capability to call and generates parameters from its natural-language description.
- Invocation happens through standard Android service binding / Intents.
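To make the first step concrete, here's a rough sketch of what a manifest declaration could look like. The action name, meta-data keys, and service name below are illustrative placeholders, not the exact identifiers from our spec (see the spec link for the real format):

```xml
<!-- Hypothetical sketch: identifiers are illustrative, not the spec's exact names. -->
<service
    android:name=".NoteSearchToolService"
    android:exported="true">
    <intent-filter>
        <action android:name="org.mobilemcp.action.CAPABILITY" />
    </intent-filter>
    <!-- Natural-language description the LLM reasons over at runtime -->
    <meta-data
        android:name="org.mobilemcp.description"
        android:value="Searches the user's notes by keyword and returns matching titles." />
    <meta-data
        android:name="org.mobilemcp.params"
        android:value="query: string (keywords to search for)" />
</service>
```

Because the description is plain natural language rather than a fixed schema, the assistant doesn't need an action domain pre-registered for "note search" — it just reads the description and decides.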
Unlike Apple/Android-style coordinated integrations:
- No predefined action domains.
- No centralized schema per assistant.
- No per-assistant custom integration required.
- Tools can be dynamically added and evolve independently.
The assistant doesn’t need prior knowledge of specific apps — it discovers and reasons over capabilities at runtime.
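A minimal sketch of what runtime discovery could look like on the assistant side, assuming the hypothetical action string and meta-data key above (the real identifiers live in the spec):

```kotlin
import android.content.ComponentName
import android.content.Intent
import android.content.pm.PackageManager

// Hypothetical sketch: "org.mobilemcp.action.CAPABILITY" and
// "org.mobilemcp.description" are illustrative placeholder identifiers.
fun discoverCapabilities(pm: PackageManager): List<Pair<ComponentName, String>> {
    val probe = Intent("org.mobilemcp.action.CAPABILITY")
    return pm.queryIntentServices(probe, PackageManager.GET_META_DATA)
        .mapNotNull { resolved ->
            val si = resolved.serviceInfo ?: return@mapNotNull null
            val description = si.metaData
                ?.getString("org.mobilemcp.description") ?: return@mapNotNull null
            // Each (component, natural-language description) pair is handed to
            // the LLM, which selects a capability and generates parameters;
            // invocation then goes through standard service binding.
            ComponentName(si.packageName, si.name) to description
        }
}
```

Nothing here is assistant-specific: any LLM agent with the right permission can run the same query and see the same capability set.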
We’ve built a working prototype and released the spec and a demo:
GitHub: https://github.com/system-pclub/mobile-mcp
Spec: https://github.com/system-pclub/mobile-mcp/blob/main/spec/mobile-mcp_spec_v1.md
Demo: https://www.youtube.com/watch?v=Bc2LG3sR1NY&feature=youtu.be
Paper: https://github.com/system-pclub/mobile-mcp/blob/main/paper/mobile_mcp.pdf
Curious what people think:
Is OS-native capability broadcasting + LLM reasoning a more scalable path than fixed assistant schemas or GUI automation?
Would love feedback from folks working on mobile agents, security, MCP tooling, or Android system design.