We wanted an easier way to record UI-based automation scripts that could:
- Be run by humans, like a lightweight RPA, or
- Be invoked by CUAs as tool calls, improving their reliability and speed.
So we open-sourced the SDK we’ve been using internally. It currently works on macOS and lets you:
- Record desktop interactions with any app that exposes accessibility info
- Record browser interactions via the Chrome extension (link on GitHub)
- Replay recordings deterministically like an RPA, or integrate the generated UI script as a callable tool for your CUA
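To make the replay-or-tool idea concrete, here is a rough Python sketch of the general pattern. It is not the SDK's API: `Step`, `LoggingDriver`, `replay`, and `as_tool` are all hypothetical names made up for illustration, and the driver only logs calls instead of touching a real UI.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Step:
    """One recorded UI action (hypothetical format, not the SDK's)."""
    action: str      # "click" or "type"
    target: str      # e.g. an accessibility label or a selector
    text: str = ""   # only used by "type"


class Driver(Protocol):
    def click(self, target: str) -> None: ...
    def type(self, target: str, text: str) -> None: ...


class LoggingDriver:
    """Stand-in driver that logs calls instead of driving a real UI."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def click(self, target: str) -> None:
        self.log.append(f"click {target}")

    def type(self, target: str, text: str) -> None:
        self.log.append(f"type {target} {text!r}")


def replay(steps: list[Step], driver: Driver) -> int:
    """Run the steps in recorded order; return how many were executed."""
    for step in steps:
        if step.action == "click":
            driver.click(step.target)
        elif step.action == "type":
            driver.type(step.target, step.text)
        else:
            raise ValueError(f"unknown action: {step.action}")
    return len(steps)


def as_tool(name: str, steps: list[Step], driver: Driver) -> dict:
    """Wrap a recording as a named callable that an agent could invoke."""
    def run() -> str:
        return f"{name}: replayed {replay(steps, driver)} steps"

    return {"name": name, "description": f"Replay the '{name}' recording", "run": run}


# The same recording can be replayed directly or exposed as a tool.
recording = [
    Step("click", "Search field"),
    Step("type", "Search field", "quarterly report"),
    Step("click", "Open"),
]
driver = LoggingDriver()
tool = as_tool("open_report", recording, driver)
result = tool["run"]()
```

The point of the sketch is that a recording is just data: a human can run `replay` on it directly, while a CUA sees a single named tool and does not have to work out each click itself.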
We’d love feedback from anyone working on UI automation, RPAs, or CUAs — especially around reliability and edge-case handling.