Coding agents write “valid” HTML/CSS but can still ship a broken layout, a clipped dropdown, or a page at the wrong URL. A Playwright script can assert modal.isVisible() and pass even when the modal is rendered off-screen.
Essentially, coding agents need “eyes” to verify their own UI work.
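To make the off-screen case concrete: Playwright treats an element as visible when it has a non-empty bounding box and no visibility:hidden style, so a modal positioned far outside the viewport still passes. Here is a minimal sketch of the extra geometry check a script would need. visibleFraction, the box values, and the viewport size are all made up for illustration; the Box shape mirrors what locator.boundingBox() returns.

```typescript
// Hypothetical helper for illustration; not part of Playwright or frontend-visualqa.
interface Box { x: number; y: number; width: number; height: number; }
interface Viewport { width: number; height: number; }

// Fraction of the box's area that lies inside the viewport (0 to 1).
function visibleFraction(box: Box, viewport: Viewport): number {
  const area = box.width * box.height;
  if (area <= 0) return 0;
  const overlapW = Math.min(box.x + box.width, viewport.width) - Math.max(box.x, 0);
  const overlapH = Math.min(box.y + box.height, viewport.height) - Math.max(box.y, 0);
  if (overlapW <= 0 || overlapH <= 0) return 0;
  return (overlapW * overlapH) / area;
}

// Assumed example values: a 1280x720 viewport and two 400x300 modals.
const viewport: Viewport = { width: 1280, height: 720 };
const offScreenModal: Box = { x: -2000, y: 100, width: 400, height: 300 };
const centeredModal: Box = { x: 440, y: 210, width: 400, height: 300 };

console.log(visibleFraction(offScreenModal, viewport)); // nothing of it is on screen
console.log(visibleFraction(centeredModal, viewport));  // fully on screen
```

Even this only covers viewport geometry. It says nothing about occlusion by other elements or clipping by an overflow:hidden parent, which is why a rendered-pixels check is still useful.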
frontend-visualqa is a CLI and MCP server that gives Claude Code and Codex visual testing, verification, and QA for websites.
You give it a URL and natural-language claims:
frontend-visualqa verify http://localhost:8000/dashboard.html \
--claims \
'The API status indicator shows Active' \
'The monthly quota progress bar is completely filled'
# → first claim passes, second fails (label says 100% but bar is ~65% full)
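For contrast, here is roughly what a hand-written check for that one failing claim would look like. It is only a sketch: labelMatchesFill is a hypothetical helper, and it assumes the bar is a fill element inside a track whose pixel widths you have already measured (I'm not claiming the demo page is built that way).

```typescript
// Hypothetical helper for illustration; not part of frontend-visualqa.
// Compares the percentage in a label against the rendered fill ratio.
function labelMatchesFill(
  labelText: string,
  fillWidthPx: number,
  trackWidthPx: number,
  tolerancePct: number = 2,
): boolean {
  const match = labelText.match(/(\d+(?:\.\d+)?)\s*%/);
  if (match === null || trackWidthPx <= 0) return false;
  const labelPct = Number(match[1]);
  const renderedPct = (fillWidthPx / trackWidthPx) * 100;
  return Math.abs(labelPct - renderedPct) <= tolerancePct;
}

// Assumed measurements: a 400px track with a 260px fill (about 65%).
console.log(labelMatchesFill("100%", 260, 400)); // label and fill disagree
console.log(labelMatchesFill("65%", 260, 400));  // label and fill agree
```

You would need one of these per widget, and you would need to know ahead of time which widgets can drift. A natural-language claim avoids both.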
It catches visual<->DOM disagreements that selectors are blind to.
You can also test interactive flows without hardcoded data:
frontend-visualqa verify 'http://localhost:8000/booking_form.html' \
--claims 'The date on the confirmation page matches the date selected on the calendar' \
--navigation-hint "Fill out the form with example data"
# → fails: fills the form, picks a date, books the slot, and catches an off-by-one date error on the confirmation page
The visual evaluation runs on n1, a VLM by Yutori that is post-trained specifically for browser interaction with RL on live websites. It navigates pages autonomously, so when a coding agent sends it to the wrong URL, n1 sees the wrong page, self-corrects, and reports the correction. On browser-use benchmarks n1 slightly outperforms Opus 4.6 and GPT-5.4 while running 2-3x faster at 4-5x lower cost: https://yutori.com/blog/introducing-n1
How does this compare to the alternatives?
1. Playwright CLI+MCP
   - Gold standard, but blind.
   - frontend-visualqa is the visual verification layer on top.
2. OpenAI Playwright skill / Claude + Dev-Browser
   - Similar idea, but n1 is specifically trained for browser use (thus faster and cheaper), and the claim-based approach structures what to check rather than hoping the model notices everything.
   - Not locked to a TUI or IDE.
Known limitations:
- Native <select> dropdowns render as OS-level widgets outside the viewport, so n1 can't see or interact with them. Custom dropdowns work fine.
- Small visual/numeric disagreements (e.g. a red vs green status dot) are a known hard case. Improving with model updates.
Requires a Yutori API key (new accounts get free credits). DM me if you run out of credits.