It works by hijacking macOS iPhone Mirroring: the MCP server captures the mirrored screen, runs Apple Vision OCR to find UI elements with tap coordinates, then sends input through a Karabiner DriverKit virtual keyboard/mouse. The AI sees the screen, decides what to tap/type/swipe, and the iPhone responds.
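To make the OCR step concrete, here's a minimal sketch (not the project's actual code; the `findTapTargets` helper is just for illustration) of using Apple's Vision framework to turn a capture of the mirroring window into text labels with tap points:

```swift
import Vision
import CoreGraphics

/// Sketch only: OCR a screenshot and map each recognized string to a tap
/// point at the center of its bounding box.
func findTapTargets(in image: CGImage) throws -> [(text: String, point: CGPoint)] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let width = CGFloat(image.width)
    let height = CGFloat(image.height)

    return (request.results ?? []).compactMap { observation -> (text: String, point: CGPoint)? in
        guard let best = observation.topCandidates(1).first else { return nil }
        // Vision boxes are normalized with the origin at the bottom-left;
        // flip and scale to pixel coordinates with a top-left origin.
        let box = observation.boundingBox
        return (text: best.string,
                point: CGPoint(x: box.midX * width, y: (1 - box.midY) * height))
    }
}
```

Vision reports bounding boxes in normalized, bottom-left-origin coordinates, so they have to be flipped and scaled before being handed to whatever synthesizes the click.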
Some interesting technical rabbit holes:
- iOS elements aren't exposed via Accessibility: the mirroring window is a single opaque AXHostingView with zero children, so OCR is the only way to read the screen (a minimal probe is sketched after this list).
- Clipboard paste doesn't work programmatically. At all. We tried everything: HID Cmd+V, AX menu automation, AppleScript, CGEvents. The paste bridge lives deep in the Continuity/Handoff stack and requires a physical user gesture.
- Input goes through a Karabiner DriverKit extension that presents itself to iOS as a US ANSI keyboard, regardless of your Mac's keyboard layout.
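On the first point, a quick probe of the accessibility tree (my own sketch, not from the repo; it assumes the Accessibility permission is granted and `mirroringPid` is the iPhone Mirroring process id you already found) shows the dead end: the hosted view reports an empty children array.

```swift
import ApplicationServices

/// Sketch only: list an AX element's children. For the mirroring window the
/// content view is a lone AXHostingView and this comes back empty.
func axChildren(of element: AXUIElement) -> [AXUIElement] {
    var value: CFTypeRef?
    guard AXUIElementCopyAttributeValue(element, kAXChildrenAttribute as CFString, &value) == .success,
          let children = value as? [AXUIElement] else { return [] }
    return children
}

/// Print the role of every element reachable from `element`.
func dumpTree(_ element: AXUIElement, depth: Int = 0) {
    var role: CFTypeRef?
    AXUIElementCopyAttributeValue(element, kAXRoleAttribute as CFString, &role)
    print(String(repeating: "  ", count: depth) + ((role as? String) ?? "?"))
    for child in axChildren(of: element) {
        dumpTree(child, depth: depth + 1)
    }
}

// Hypothetical usage, given the iPhone Mirroring process id:
// dumpTree(AXUIElementCreateApplication(mirroringPid))
```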
Limitations I should be upfront about: macOS 15+ only, one phone at a time, no clipboard bridge for paste, and the phone needs to stay unlocked during a session.
It's open source (Apache 2.0): https://github.com/jfarcand/iphone-mirroir-mcp
Be sure to configure permissions properly if you use it with OpenClaw ;-)