I have these running in a CI/CD pipeline, comparing against the previous commit, with results uploaded to R2. A few problems:
- Playwright regularly fails with timeouts. It's flaky, and good luck figuring out what went wrong.
- You can do a matrix test across browsers (Chrome/Firefox/etc.) and devices (mobile/tablet/etc.), but then you need to run the tests in parallel. The bare functional minimum is a 16 GB VPS with 4 vCPUs. For my test suite, that already takes 20 minutes. If you want a larger matrix and have more pages, you'll be looking at 64 GB with a dozen or so vCPUs. That's hundreds of dollars a month...
- If you have an animation, it's a struggle to filter it out.
- As far as I know, there is no "version slider" where you can go commit by commit and see how things changed.
- Playwright takes images and videos. These consume a lot of storage: GBs of data for a few commits.
- All of the managed solutions (like BrowserStack) cost hundreds of dollars.
Overall, I think it's great, though it's a bit cumbersome to set everything up so it works flawlessly and doesn't break every now and then. You can also do full flows (signup, sign-in, do an action, etc. -> success/failure), which test more than just the UI.
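To make the matrix point concrete, here is a rough sketch of how the number of configurations, and with it the wall-clock time, grows. The browser and viewport names, the per-configuration runtime, and the worker count are all hypothetical placeholders, not measurements from the suite described above.

```python
import math
from itertools import product


def build_matrix(browsers, viewports):
    """Return every browser/viewport combination as a list of dicts."""
    return [{"browser": b, "viewport": v} for b, v in product(browsers, viewports)]


def estimated_wall_minutes(n_configs, minutes_per_config, workers):
    """Rough estimate: configs run in waves of `workers` at a time."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return math.ceil(n_configs / workers) * minutes_per_config


# Hypothetical matrix: 3 browsers times 3 viewports gives 9 configurations.
matrix = build_matrix(
    ["chromium", "firefox", "webkit"],
    ["mobile", "tablet", "desktop"],
)
```

The matrix grows multiplicatively, so adding one browser or one viewport adds a whole row or column of runs. That is why the machine size, or the wall-clock time, climbs so quickly.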
- I disagree that you need a powerful VPS to run these tests. We run our suite once a day at midnight instead of on every commit. You still get most of the benefit for much less money this way.
- We used BrowserStack initially but stopped due to flakiness. The key to getting a stable suite was to run tests against a local nginx image serving the web app and WireMock serving the API. This way you have short, predictable latency and can really isolate what you're trying to test.
Then how do you know which commit is responsible for the regression? I can see that working for a very small team where the number of changes is limited, but even then it's hard, especially with CSS, where a change in one place can affect the styles in another.
But I agree, if you have a large team or a large monorepo you probably want to know about breaking changes already at the PR stage.
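One way to square a nightly run with the "which commit?" question is to bisect the day's commits once the nightly run goes red. The sketch below assumes the commits are ordered oldest to newest, that the last one is known bad, and that a regression stays broken once introduced. `is_bad` is a hypothetical callback that checks out a commit and runs the suite.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search for the first commit where is_bad(commit) is True.

    Assumes `commits` is ordered oldest to newest, the last commit is bad,
    and every commit after the first bad one is also bad.
    """
    if not commits:
        raise ValueError("commits must not be empty")
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid + 1  # first bad commit is after mid
    return commits[lo]


# Hypothetical day of commits where the regression landed in "c".
todays_commits = ["a", "b", "c", "d", "e"]
culprit = first_bad_commit(todays_commits, lambda c: c >= "c")
```

This needs only about log2(n) suite runs for n commits, and `git bisect run` automates the same search against a real repository.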
How is it executed? Is it something built into Playwright, or is the part of the code responsible for executing it missing from what's presented?
Curiositry•4d ago
I frequently break my site in ways that aren't obvious. Right now, I use a combination of Visualping and a homebrew tests.sh that hits various endpoints and runs checks, but I have been meaning to integrate screenshotting into my test script (via Selenium or CutyCapt) rather than relying on a hosted service.
Have you found a good way of diffing the screenshots? DiffPDF works pretty well, but I haven't found a good solution for checking whether there are relevant changes automatically, rather than just has-changed, in a way that could be integrated into a script.
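One common approach to "relevant change" versus "has changed" is to ignore small per-pixel differences (anti-aliasing, compression noise) and flag the page only when enough pixels differ. A minimal sketch, assuming both screenshots are already decoded into equal-sized grids of (r, g, b) tuples; the `tolerance` and `max_fraction` defaults are arbitrary placeholders to tune per site.

```python
def changed_fraction(img_a, img_b, tolerance=16):
    """Fraction of pixels whose largest channel difference exceeds `tolerance`.

    Each image is a list of rows, each row a list of (r, g, b) tuples.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = changed = 0
    for row_a, row_b in zip(img_a, img_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if max(abs(ca - cb) for ca, cb in zip(pixel_a, pixel_b)) > tolerance:
                changed += 1
    return changed / total if total else 0.0


def is_relevant_change(img_a, img_b, tolerance=16, max_fraction=0.01):
    """True when more than `max_fraction` of the pixels changed noticeably."""
    return changed_fraction(img_a, img_b, tolerance) > max_fraction
```

In a script, the exit code can follow `is_relevant_change`, so a cron job or CI step fails only on meaningful differences. Playwright's `toHaveScreenshot` exposes the same idea through its `threshold` and `maxDiffPixelRatio` options.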
beingflo•4d ago
[0] https://playwright.dev/docs/trace-viewer-intro#opening-the-h...