As humans, we rely on visual diffs for that. We open them, scan quickly, and catch obvious regressions. Agents are completely out of that loop.
I’m a co-founder of Argos (visual testing), and I recently shipped a CLI to expose visual diffs in a way an agent can actually use, instead of going through a UI.
Once I wired it into an agent workflow, a few interesting things happened. The agent caught obvious regressions. Sometimes it refused to approve its own PR. With a good prompt, it even fixed the issue after seeing the diff and iterating.
It’s still rough and not reliable enough to trust on its own. A lot depends on how well the agent understands the codebase. In local tests, it sometimes gets stuck in loops and burns through tokens.
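To make that loop concrete, here's a minimal sketch of the check-and-fix cycle with an attempt budget to cap the token burn. Everything here is hypothetical: `run_visual_diff` and `agent_fix` are stand-ins for the real CLI invocation and agent call, not actual Argos APIs.

```python
MAX_ATTEMPTS = 3  # budget so a stuck agent doesn't loop forever

def run_visual_diff(state):
    # Stand-in for invoking the diff CLI; returns the screenshots that changed.
    return [s for s in state["screens"] if s["changed"]]

def agent_fix(state, diffs):
    # Stand-in for the agent seeing the diff images and pushing a fix.
    for d in diffs:
        d["changed"] = False  # pretend the fix resolved the regression

def review_loop(state):
    # Check, fix, re-check; approve only when the visual diff is clean.
    for attempt in range(1, MAX_ATTEMPTS + 1):
        diffs = run_visual_diff(state)
        if not diffs:
            return f"approved after {attempt} check(s)"
        agent_fix(state, diffs)
    return "gave up: diffs still present"

state = {"screens": [{"name": "home", "changed": True},
                     {"name": "settings", "changed": False}]}
print(review_loop(state))  # → approved after 2 check(s)
```

The attempt budget is the important part in practice: without it, an agent that misdiagnoses the diff will keep re-rendering and re-prompting indefinitely.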
Giving agents “eyes” on UI changes might be an interesting feedback loop for more autonomous dev agents in the future.