I’m the creator of StageWright (and the open-source playwright-smart-reporter).
I’ve been frustrated by the "black box" nature of E2E test failures. Standard reporters tell you that a test failed, but they don't help you understand why it’s failing across 50 different runs or whether its execution time is trending toward a regression.
I built StageWright to treat test results as a performance and stability dataset.
Key Technical Features:
Historical Flakiness Detection: Playwright's built-in retries can only mark a test as flaky within a single run. StageWright tracks outcomes across runs, so a test only earns a high "Stability Grade" if it passes consistently over time.
Flamechart Step Timelines: We added a color-coded flamechart for test steps (v1.0.8). It categorizes steps into Navigation, Action, and API, making it easy to see if a 10s test is hanging on a locator or a slow backend response.
2-Sigma Anomaly Detection: The trends view uses moving averages and 2-sigma outlier detection (flagging values more than two standard deviations from the average) to catch performance regressions that might otherwise go unnoticed.
AI-Powered Failure Clustering: We batch failures and use Claude/GPT-4 to cluster similar errors. Instead of 20 separate failures, you see "1 cluster: TimeoutError on payment-submit-btn."
Virtual Scroll Performance: The UI uses virtual scrolling, so suites with 500+ tests render without freezing the browser, a common issue with the default HTML reporter.
Native Trace & Network Logs: Traces and network waterfalls are embedded directly in the report. No downloading .zip files from CI; they open instantly in an inline viewer.
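To make the cross-run flakiness and 2-sigma ideas from the list above concrete, here is a minimal TypeScript sketch. It is not StageWright's actual implementation: the names (`RunRecord`, `passRate`, `isDurationAnomaly`), the choice to compare against the plain mean of a window of previous runs, and the use of the population standard deviation are all assumptions made for illustration.

```typescript
// Hypothetical sketch: names and thresholds are illustrative assumptions,
// not StageWright's real API.

/** Outcome of one test in one CI run. */
interface RunRecord {
  passed: boolean;
  durationMs: number;
}

/** Fraction of recorded runs in which the test passed (0..1). */
function passRate(history: RunRecord[]): number {
  if (history.length === 0) return 1; // assumption: no history means no known failures
  const passes = history.filter((r) => r.passed).length;
  return passes / history.length;
}

/** Arithmetic mean of a non-empty list. */
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

/** Population standard deviation of a non-empty list. */
function stdDev(values: number[]): number {
  const m = mean(values);
  const variance = mean(values.map((v) => (v - m) * (v - m)));
  return Math.sqrt(variance);
}

/**
 * Returns true when `latestMs` lies more than `sigmas` standard deviations
 * away from the mean of the previous durations (the moving window).
 */
function isDurationAnomaly(
  previousMs: number[],
  latestMs: number,
  sigmas: number = 2,
): boolean {
  if (previousMs.length < 2) return false; // not enough history to judge
  const m = mean(previousMs);
  const sd = stdDev(previousMs);
  if (sd === 0) return latestMs !== m; // perfectly stable history: any change stands out
  return Math.abs(latestMs - m) > sigmas * sd;
}

// Usage: four earlier runs around 1s, then a 1.2s run.
const previous = [1000, 1100, 900, 1000];
const flagged = isDurationAnomaly(previous, 1200);
```

With this window the mean is 1000 ms and the standard deviation is about 71 ms, so the 1200 ms run falls outside the 2-sigma band while a 1100 ms run would not. A real trends view would slide the window forward with each new run.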
The Architecture: StageWright is built to be "Playwright-native." It hooks into the reporter API and can run locally (outputting a standalone HTML/JSON history) or via our new Starter/Pro cloud tiers. The Pro tier provides a centralized dashboard for teams, long-term history retention, and cross-project analytics.
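For readers unfamiliar with the reporter API, here is a rough sketch of what hooking into it to build a JSON history can look like. This is an illustration, not StageWright's code: `HistoryReporter`, `HistoryEntry`, and `serialize` are hypothetical names, and the two `...Like` interfaces are local stand-ins for the subset of Playwright's `TestCase` and `TestResult` types used here, so the snippet runs without Playwright installed.

```typescript
// Hypothetical sketch of a history-collecting reporter.
// In a real project you would import Reporter, TestCase, and TestResult from
// '@playwright/test/reporter' instead of declaring these local stand-ins.
interface TestCaseLike {
  title: string;
  titlePath(): string[]; // project, file, describe blocks, test title
}

interface TestResultLike {
  status: 'passed' | 'failed' | 'timedOut' | 'skipped' | 'interrupted';
  duration: number; // milliseconds
  retry: number; // 0 for the first attempt
}

/** One row of the history file (an assumed shape, for illustration only). */
interface HistoryEntry {
  test: string;
  status: string;
  durationMs: number;
  retry: number;
}

class HistoryReporter {
  private entries: HistoryEntry[] = [];

  // Playwright calls this hook when a test attempt finishes.
  onTestEnd(test: TestCaseLike, result: TestResultLike): void {
    this.entries.push({
      // Drop empty path segments and join the rest into a readable name.
      test: test.titlePath().filter(Boolean).join(' > '),
      status: result.status,
      durationMs: result.duration,
      retry: result.retry,
    });
  }

  // Playwright calls this hook once after the whole run.
  // A real reporter would likely write the JSON to disk or upload it.
  onEnd(): void {
    console.log(this.serialize());
  }

  /** Serializes the collected entries as pretty-printed JSON. */
  serialize(): string {
    return JSON.stringify({ entries: this.entries }, null, 2);
  }
}
```

To plug a class like this into Playwright, it would need to be the default export of its module and be listed under `reporter` in `playwright.config.ts`. Appending each run's entries to a persistent file is what turns a single report into a history that later analysis can work from.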
I’m currently supporting both Node.js and Python (pytest-playwright) environments.
I’d love to hear what the community thinks, especially about how you handle "test debt" in large CI pipelines. I'm here for any questions!