I’ve been building Atom (https://github.com/rush86999/atom), an open-source, self-hosted AI automation platform.
I built this because while tools like OpenClaw are excellent for one-off scripts and personal tasks, I found them difficult to use for complex business workflows (e.g., managing invoices or SaaS ops). The main issue was State Blindness: the agent would fire a command and assume it worked, without "seeing" if the UI or state actually updated.
I just shipped a new architecture, Canvas AI Accessibility, to solve this.
The Technical Concept: Instead of relying on token-heavy screenshots or raw HTML, I built a hidden semantic layer—essentially a "Screen Reader" for the LLM.
Hidden Visual Description: As the agent works, the system generates a structured, hidden description of the current visual state.
Episodic Memory: The agent "reads" this layer to verify its actions. Crucially, it snapshots this state into a vector database (LanceDB).
Maturity/Governance: Before an agent is promoted from "Student" to "Autonomous," it must demonstrate it can recall these past visual states to avoid repeating errors.
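To make the three pieces above concrete, here is a minimal Python sketch of the idea. It is not Atom's actual code. `Element`, `CanvasState`, `verify`, `EpisodicMemory`, and `can_promote` are hypothetical names, the in-memory list is a stand-in for a LanceDB table, the bag-of-words cosine is a stand-in for a real embedding model, and the 0.9 promotion threshold is an arbitrary assumption.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One visible UI element, roughly as a screen reader would announce it."""
    role: str
    label: str
    value: str = ""


@dataclass(frozen=True)
class CanvasState:
    """Hypothetical structured description of what is currently on screen."""
    title: str
    elements: tuple[Element, ...]

    def describe(self) -> str:
        # Compact text form the LLM can read instead of a screenshot or raw HTML.
        lines = [f"canvas: {self.title}"]
        lines += [f"{e.role} {e.label} {e.value}".strip() for e in self.elements]
        return "\n".join(lines)


def verify(state: CanvasState, role: str, label: str, expected: str) -> bool:
    """Check the described state instead of assuming the action worked."""
    return any(
        e.role == role and e.label == label and e.value == expected
        for e in state.elements
    )


def embed(text: str) -> Counter:
    # Toy bag-of-words vector (assumption); a real system would use an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class EpisodicMemory:
    """In-memory stand-in for a vector table (the post stores these in LanceDB)."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def snapshot(self, state: CanvasState, action: str, ok: bool) -> None:
        text = state.describe()
        self._rows.append(
            {"vector": embed(text), "text": text, "action": action, "ok": ok}
        )

    def recall(self, state: CanvasState, k: int = 1) -> list[dict]:
        # Return the k stored episodes most similar to the given state.
        query = embed(state.describe())
        ranked = sorted(
            self._rows, key=lambda r: cosine(query, r["vector"]), reverse=True
        )
        return ranked[:k]


def can_promote(memory: EpisodicMemory, probes, threshold: float = 0.9) -> bool:
    """Gate: share of probe states whose nearest episode has the expected outcome."""
    if not probes:
        return False
    hits = 0
    for state, expected_ok in probes:
        top = memory.recall(state, k=1)
        if top and top[0]["ok"] == expected_ok:
            hits += 1
    return hits / len(probes) >= threshold


# Usage: the agent tries to send an invoice and checks the described state.
before = CanvasState("Invoices", (Element("status", "Invoice 1", "Draft"),))
memory = EpisodicMemory()
sent = verify(before, "status", "Invoice 1", "Sent")  # False: the UI never updated
memory.snapshot(before, action="send_invoice", ok=sent)
print(memory.recall(before)[0]["ok"])
```

The point of the sketch is the control flow: act, read the structured description, record the outcome, and later look up similar states before acting again. Swapping the list for a real vector table and the toy embedding for a model changes the storage, not the shape of the loop.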
Atom vs. OpenClaw: I view them as complementary. OpenClaw is the "Hands" (great for raw execution/terminal), while Atom is the "Brain" (handling state, memory, and audit trails). Atom is built on Python/FastAPI rather than Node.js like OpenClaw, and it focuses heavily on this governance/memory layer.
Atom is self-hosted, and the repo includes the new Canvas architecture. I’d love feedback on the implementation of the hidden accessibility layer. Is anyone else using "synthetic accessibility trees" for agent grounding?