Instead of classic mutation testing that does mechanical swaps (>→>=), spec-shaker uses an LLM to generate semantically broken implementations — realistic bugs like swallowed errors, missing side effects, off-by-one expiration boundaries, etc. It runs your test suite against each mutant and reports which mutants were killed vs survived. Survivors usually point to gaps in spec/assertions.
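To make the distinction concrete, here's a hypothetical sketch (not spec-shaker's actual code): a tiny URL-shortener save path, a "semantic" mutant of it that swallows the storage error, and the kind of assertion that kills it. All names (`shorten`, `FailingStore`, `killed_by`) are made up for illustration.

```python
class FailingStore:
    """Test double whose save() always fails, simulating a backend outage."""
    def save(self, code, url):
        raise IOError("backend down")

def shorten(url, store):
    # Original: propagate storage failures to the caller.
    code = format(abs(hash(url)) % 10**8, "08d")
    store.save(code, url)
    return code

def shorten_semantic_mutant(url, store):
    # Semantic mutant: swallows the storage error and returns a code that
    # was never persisted -- a realistic, review-passing bug, unlike a
    # mechanical operator swap.
    code = format(abs(hash(url)) % 10**8, "08d")
    try:
        store.save(code, url)
    except Exception:
        pass
    return code

def killed_by(impl):
    """A test asserting that storage failures surface 'kills' the mutant;
    a happy-path-only suite lets it survive."""
    try:
        impl("https://example.com", FailingStore())
        return False  # mutant survived: no error reached the caller
    except IOError:
        return True   # mutant killed: failure propagated

assert killed_by(shorten)
assert not killed_by(shorten_semantic_mutant)
```

A survivor like this one points at a missing assertion (failures must propagate) rather than a missing branch, which is the kind of spec gap the tool is meant to surface.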
There’s a small demo (a URL shortener) that shows how survivors guide spec + test improvements across iterations.
I’d love feedback on:

* whether “semantic mutants” are useful vs classic mutation testing
* how you’d run this in CI (budgets, sampling, scoring)