The problem I kept hitting: building agents is fast, but when something breaks, handing off “one failing run” is messy. You end up with screenshots, scattered logs, partial configs, requests for access to a tracing UI, and accidental secrets/PII in payloads.
What this does: it runs your agent on a case suite and generates a portable evidence pack that you can open offline and attach to a GitHub issue or ticket:
report.html (offline viewer)
compare-report.json (machine-readable summary for CI gating: none | require_approval | block)
evidence files referenced via a manifest (so you can verify completeness/integrity)
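To make the manifest check and the CI gate concrete, here is a minimal sketch of how a consumer of the pack might use them. The file name manifest.json, its {"files": [{"path", "sha256"}]} layout, the "gate" field in compare-report.json, and the exit codes are all assumptions made for illustration, not the tool's actual schema.

```python
import hashlib
import json
from pathlib import Path

# Assumed (hypothetical) layout, not the tool's real schema:
#   manifest.json       -> {"files": [{"path": "<relative path>", "sha256": "<hex>"}]}
#   compare-report.json -> {"gate": "none" | "require_approval" | "block"}
# The exit codes below are likewise an arbitrary choice for this sketch.
EXIT_CODES = {"none": 0, "require_approval": 2, "block": 1}


def verify_manifest(pack_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means complete and intact."""
    manifest = json.loads((pack_dir / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:
        target = pack_dir / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
            continue
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            problems.append(f"hash mismatch: {entry['path']}")
    return problems


def gate_exit_code(pack_dir: Path) -> int:
    """Map the gating decision to an exit code, failing closed on a bad pack."""
    if verify_manifest(pack_dir):
        return EXIT_CODES["block"]
    report = json.loads((pack_dir / "compare-report.json").read_text(encoding="utf-8"))
    # An unknown or missing decision is treated as "block".
    return EXIT_CODES.get(report.get("gate"), EXIT_CODES["block"])
```

In a CI job this could be called as sys.exit(gate_exit_code(Path("pack"))). The sketch fails closed: a missing file, a hash mismatch, or an unrecognized decision all resolve to "block".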
It’s intentionally self-hosted/local-only: no backend, no accounts, nothing leaves your environment unless you export the pack.
Redaction note: in the “production” pipeline, redaction is applied in the runner before artifacts are written (the agent is not required to support a special header). There’s also a strict mode that scans all manifest-referenced files for residual markers as a safety gate.
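As a rough illustration of the redact-before-write idea and the strict-mode scan, here is a minimal sketch. The patterns, the "[REDACTED]" placeholder, the manifest.json layout, and the reading of "residual markers" as "text that still matches a sensitive pattern" are all my assumptions for the example, not the actual implementation.

```python
import json
import re
from pathlib import Path

# Hypothetical patterns for illustration; a real setup would supply its own.
SENSITIVE_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),       # API-key-like token
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email address
]
PLACEHOLDER = "[REDACTED]"  # assumed placeholder text


def redact(text: str) -> str:
    """Replace every match of every sensitive pattern with the placeholder."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(PLACEHOLDER, text)
    return text


def write_artifact(path: Path, text: str) -> None:
    """Redact in the runner, then write, so raw payloads never reach disk."""
    path.write_text(redact(text), encoding="utf-8")


def strict_scan(pack_dir: Path) -> list[str]:
    """Return manifest-referenced paths that still match a sensitive pattern.

    Assumes manifest.json looks like {"files": [{"path": "<relative path>"}]}.
    """
    manifest = json.loads((pack_dir / "manifest.json").read_text(encoding="utf-8"))
    leaks = []
    for entry in manifest["files"]:
        text = (pack_dir / entry["path"]).read_text(encoding="utf-8", errors="replace")
        if any(p.search(text) for p in SENSITIVE_PATTERNS):
            leaks.append(entry["path"])
    return leaks
```

Because redaction happens inside write_artifact, the agent itself needs no changes. strict_scan is a second, independent pass: a non-empty result would fail the export even if a file reached the pack by some path that bypassed the runner.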
I’m not trying to replace tracing/observability tools. This is meant to be the “handoff unit” when sharing a link or granting UI access isn’t viable.
Questions for HN:
If you’ve had to share a single failing run with another engineer/vendor, what was the missing piece that caused the most back-and-forth?
What would you consider “minimum viable contents” vs a “bundle monster”?