Every engineering team I've been part of has had the same problem: Playwright (or similar) scripts pass, yet critical bugs still show up once real users touch the software. I don't think this is because Playwright is inherently bad. It does its job perfectly well: it tests exactly what we told it to test.
The problem is that real-world bugs happen on paths we didn't think to test. Think of a support widget covering the checkout button on small-screen phones, or a race condition when clicking through buttons in a particular order. These kinds of things need real humans to test the software, but that is too expensive and slow, and it doesn't scale, especially at the pace of software development today.
With Bytesalt, you describe what you want in plain English (for example, "Test the checkout flow on mobile") and Bytesalt fans out the work across parallel AI agents that simulate real users. Each agent explores the app through a different lens: functional QA, UX/visual, usability, accessibility, or security. Finally, an agent collects all the results and produces a final bug report.
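To make the fan-out/fan-in shape concrete, here is a rough Python sketch. It is not Bytesalt's actual code: `run_agent`, `run_test`, `build_report`, and the `Finding` type are hypothetical names invented for illustration, and the agent body is a placeholder where a real agent would drive a browser.

```python
import asyncio
from dataclasses import dataclass

# The five lenses named in the post.
LENSES = ["functional QA", "UX/visual", "usability", "accessibility", "security"]


@dataclass
class Finding:
    lens: str
    summary: str


async def run_agent(lens: str, task: str) -> list[Finding]:
    # Hypothetical stand-in: a real agent would explore the app here.
    await asyncio.sleep(0)
    return [Finding(lens, f"[{lens}] placeholder finding for: {task}")]


async def run_test(task: str) -> list[Finding]:
    # Fan out: one agent per lens, all running concurrently.
    per_lens = await asyncio.gather(*(run_agent(lens, task) for lens in LENSES))
    # Fan in: flatten the per-lens results into a single list.
    return [finding for findings in per_lens for finding in findings]


def build_report(task: str, findings: list[Finding]) -> str:
    # Final step: collect everything into one plain-text report.
    lines = [f"Report for: {task}"] + [f"- {f.summary}" for f in findings]
    return "\n".join(lines)


if __name__ == "__main__":
    task = "Test the checkout flow on mobile"
    print(build_report(task, asyncio.run(run_test(task))))
```

The point of the sketch is only the structure: independent agents run in parallel, and a single final step merges their findings into one report.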
It has a web interface for humans as well as a CLI that can be integrated with CI/CD tools or used by coding agents.
You can try it at bytesalt.com (free tier, no credit card). A simple prompt to start with: "Test https://www.craigslist.org for usability on an iPhone. Check only above the fold. Report a single issue and stop. Do not explore."
Would love to hear any feedback. What would make this useful for your workflow? What's missing?