Show HN: Frontend-VisualQA — give coding agents eyes to verify their own UI work

https://github.com/yutori-ai/frontend-visualqa

10•dhruvbatra•1h ago

Coding agents today are blind.

They write “valid” HTML/CSS code but can still ship a broken layout, a clipped dropdown, or a page at the wrong URL. Playwright scripts can assert modal.isVisible() without knowing the modal is rendered off-screen.
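To make the off-screen case concrete: Playwright documents an element as visible when it has a non-empty bounding box and no visibility:hidden computed style, so a modal positioned outside the viewport still counts. Below is a minimal sketch of the geometric check that a visibility assertion leaves out. The Box type and visible_area_fraction helper are hypothetical names for illustration, not part of Playwright or frontend-visualqa.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in CSS pixels, relative to the viewport's top-left corner."""
    x: float
    y: float
    width: float
    height: float


def visible_area_fraction(box: Box, viewport_width: float, viewport_height: float) -> float:
    """Return the fraction (0.0 to 1.0) of the box's area that lies inside the viewport."""
    if box.width <= 0 or box.height <= 0:
        return 0.0
    # Overlap of the box with the viewport rectangle [0, width] x [0, height].
    overlap_w = max(0.0, min(box.x + box.width, viewport_width) - max(box.x, 0.0))
    overlap_h = max(0.0, min(box.y + box.height, viewport_height) - max(box.y, 0.0))
    return (overlap_w * overlap_h) / (box.width * box.height)


# A modal with a non-empty box that sits entirely left of a 1280x720 viewport.
offscreen_modal = Box(x=-500, y=100, width=400, height=300)
print(visible_area_fraction(offscreen_modal, 1280, 720))
```

A check like this only covers one failure mode (position). It says nothing about clipping by overflow, overlapping elements, or wrong colors, which is where looking at the rendered page helps.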

Essentially, coding agents need “eyes” to verify their own UI work.

frontend-visualqa is a CLI + MCP server that lets Claude Code and Codex visually test, verify, and QA a website.

You give it a URL and natural-language claims:

  frontend-visualqa verify http://localhost:8000/dashboard.html \
  --claims \
  'The API status indicator shows Active' \
  'The monthly quota progress bar is completely filled'

  # → first claim passes, second fails (label says 100% but bar is ~65% full)

It catches visual<->DOM disagreements that selectors are blind to.
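As an illustration of that kind of disagreement, here is a rough sketch of the progress-bar case: the label text (what a selector-based assertion reads) says one thing while the rendered fill says another. This is not how frontend-visualqa works internally (it uses a VLM on the rendered page); label_matches_fill, the pixel widths, and the 5% tolerance are all assumptions made up for the example.

```python
def label_matches_fill(
    label_text: str,
    fill_width_px: float,
    track_width_px: float,
    tolerance: float = 0.05,
) -> bool:
    """Compare a percentage label such as '100%' with the rendered fill ratio of a bar."""
    if track_width_px <= 0:
        raise ValueError("track width must be positive")
    label_fraction = float(label_text.strip().rstrip("%")) / 100.0
    rendered_fraction = fill_width_px / track_width_px
    return abs(label_fraction - rendered_fraction) <= tolerance


# A text assertion on the label passes, but the bar is only 130px of a 200px track.
print(label_matches_fill("100%", fill_width_px=130, track_width_px=200))
```

A hand-written check like this has to be coded per widget, which is the gap a natural-language claim is meant to cover.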

You can also test interactive flows without hardcoded data:

  frontend-visualqa verify 'http://localhost:8000/booking_form.html' \
  --claims 'The date on the confirmation page matches the date selected on the calendar' \
  --navigation-hint "Fill out the form with example data"

  # → fails: fills the form, picks a date, books the slot, and catches an off-by-one date error on the confirmation page
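If you want to drive the CLI from a script or agent harness, a small wrapper can assemble the same invocations. This sketch uses only the subcommand and flags that appear in the two examples above (verify, --claims, --navigation-hint); build_verify_command is a hypothetical helper, and the CLI's output format and exit codes are not covered here, so check the repo before relying on them.

```python
from typing import List, Optional, Sequence


def build_verify_command(
    url: str,
    claims: Sequence[str],
    navigation_hint: Optional[str] = None,
) -> List[str]:
    """Build an argv list mirroring the `frontend-visualqa verify` examples in this post."""
    if not claims:
        raise ValueError("at least one claim is required")
    command = ["frontend-visualqa", "verify", url, "--claims", *claims]
    if navigation_hint is not None:
        command += ["--navigation-hint", navigation_hint]
    return command


command = build_verify_command(
    "http://localhost:8000/booking_form.html",
    ["The date on the confirmation page matches the date selected on the calendar"],
    navigation_hint="Fill out the form with example data",
)
print(command)
```

Passing the list to subprocess.run(command) avoids shell quoting problems with claims that contain apostrophes or quotes.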

The visual evaluation runs on n1, a VLM by Yutori that is post-trained specifically for browser interaction with RL on live websites. It navigates pages autonomously, so when a coding agent sends it to the wrong URL, n1 sees the wrong page, self-corrects, and reports the correction. On browser-use benchmarks n1 slightly outperforms Opus 4.6 and GPT-5.4 while running 2-3x faster at 4-5x lower cost: https://yutori.com/blog/introducing-n1

How does this compare to the alternatives?

1. Playwright CLI+MCP
- Gold standard, but blind.
- frontend-visualqa is the visual verification layer on top.

2. OpenAI Playwright skill / Claude + Dev-Browser
- Similar idea, but n1 is specifically trained for browser use (thus faster and cheaper), and the claim-based approach structures what to check rather than hoping the model notices everything.
- Not locked to a TUI or IDE.

Known limitations:
- Native <select> dropdowns render as OS-level widgets outside the viewport, so n1 can't see or interact with them. Custom dropdowns work fine.
- Small visual/numeric disagreements (e.g. a red vs green status dot) are a known hard case. Improving with model updates.

Requires a Yutori API key (new accounts get free credits). DM me if you run out of credits.
