Tine is a GNOME extension and CLI that lets an agent (I have used Claude, but in theory any agent that can access the CLI should work) drive the desktop using AT-SPI2 accessibility trees, OCR, and visual fallbacks. The agent can work with the a11y trees, take screenshots, zoom in on a grid, click, enter text through a uinput device, and generally bumble its way around a Wayland Linux desktop.
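To make the "zoom in on a grid" idea concrete, here is a rough sketch of how repeated grid picks can narrow a screen region down to a click point. This is not Tine's actual code: the `Region` type, the `grid_cell` and `zoom` helpers, and the 3x3 default are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangle in screen pixels (hypothetical type, not from Tine)."""
    x: int
    y: int
    width: int
    height: int

    def center(self):
        # Point an agent would click once the region is small enough.
        return (self.x + self.width // 2, self.y + self.height // 2)


def grid_cell(region, rows, cols, row, col):
    """Return the sub-region for one cell of a rows x cols grid over region."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("cell index outside the grid")
    # Integer edges are computed from the full width/height so the cells
    # tile the region exactly, with no gaps or overlap from rounding.
    x0 = region.x + (region.width * col) // cols
    x1 = region.x + (region.width * (col + 1)) // cols
    y0 = region.y + (region.height * row) // rows
    y1 = region.y + (region.height * (row + 1)) // rows
    return Region(x0, y0, x1 - x0, y1 - y0)


def zoom(region, picks, rows=3, cols=3):
    """Apply a sequence of (row, col) picks, narrowing the region each time."""
    for row, col in picks:
        region = grid_cell(region, rows, cols, row, col)
    return region


if __name__ == "__main__":
    screen = Region(0, 0, 1920, 1080)
    target = zoom(screen, [(1, 1), (0, 0)])
    print(target, target.center())
```

Each pick shrinks the search area by the grid factor, so a couple of rounds on a 3x3 grid is usually enough to land on a button-sized target without needing any a11y data.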
This project would probably have been way easier on X11, but Wayland is teh future!!!111 Thanks for any thoughts and feedback. It feels good to release something here after a decade of lurking. Decade plus, but who's counting / I'm not old.
aayushkumar121•1h ago
Have you run into issues where the a11y tree is incomplete (e.g. Electron apps)? Wondering how often the grid/OCR path becomes the primary path.
tarboreus•59m ago
aayushkumar121•56m ago
Have you thought about combining weak a11y signals + OCR to build more stable refs over time, or is that too brittle in practice?
tarboreus•45m ago
I will say I have some feelings about Wayland and how hard it makes some of the stuff I do. I'm visually impaired and have a whole stack of tools. But this project has helped me port over 70-80% of those tools, and it bridges some gaps on Wayland temporarily while I get infra set up. It's also great for the many sites that Claude blocks for whatever reason (Reddit, where I am a sub mod but the a11y is terrible; AmEx; LinkedIn).