It installs as a normal CLI tool, and you can optionally install skills for it on Claude Code, Codex, and OpenCode.
It gives agents human-like control of devices. When the agent instructs it to click a button, it runs a screenshot- and coordinate-based computer-use loop; no DOM or accessibility tree is involved.
I built it because I wanted agents to test things the way a human would: watching the screen and using the app from a user's perspective.
Would love feedback on onboarding, real use cases, and whether this fits naturally into existing agent workflows.