Most ESPs use frequentist models: you need a fixed sample size calculated upfront, you can't peek at results early without inflating your false-positive rate, and if your list isn't massive, you're waiting weeks for a result that often comes back inconclusive anyway. So teams either ignore the statistics entirely and pick whichever variant "looks better" after a day, or they stop testing altogether. Either way, you learn nothing.
On top of that, the testing workflow itself is painful. You duplicate your entire template, change the one thing you want to test (a subject line, a hero image, a CTA), split your audience, send, and wait. Want to test two things at once? Now you need four template versions. Three things? Eight versions. It scales terribly, and most teams just don't bother.
Liftstack fixes both problems.
Instead of duplicating templates, you define test slots at individual content positions within a single message. Subject line, hero image, CTA copy, footer; each slot gets its own variants and its own statistical analysis, all running simultaneously without the combinatorial explosion.
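To make the scaling argument concrete, here's a toy sketch (plain Python, not Liftstack's actual API) of why per-slot testing grows linearly where template duplication grows exponentially:

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One testable content position inside a single message."""
    name: str
    variants: list


# Three slots, two variants each, all inside one template.
message = [
    Slot("subject", ["Subject A", "Subject B"]),
    Slot("hero_image", ["hero_a.png", "hero_b.png"]),
    Slot("cta", ["CTA copy A", "CTA copy B"]),
]

# Slot model: one template, 6 content pieces, 3 independent analyses.
pieces_to_manage = sum(len(s.variants) for s in message)

# Duplication model: a full template copy per combination, 2 * 2 * 2 = 8.
templates_old_way = 1
for s in message:
    templates_old_way *= len(s.variants)

print(pieces_to_manage, templates_old_way)  # 6 vs 8, and the gap widens fast
```

With four slots of three variants each, the duplication approach needs 81 templates while the slot approach still manages just 12 content pieces.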
And instead of frequentist significance testing, we use Bayesian analysis. You can check results at any point without statistical penalty. You get four clear verdicts: Winner Found, No Meaningful Difference, More Data Needed, or Guardrail Violation. No more agonising over p-values with undersized samples.
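For readers curious about the mechanics, here's a minimal Beta-Binomial sketch of this style of always-valid checking. The function names, priors, and thresholds are illustrative assumptions, not Liftstack's internals, and the verdict mapping is deliberately simplified:

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1,1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if b > a:
            wins += 1
    return wins / draws


def verdict(p_b_beats_a, threshold=0.95):
    """Map a posterior probability to a simplified verdict label."""
    if p_b_beats_a >= threshold:
        return "Winner Found (B)"
    if p_b_beats_a <= 1 - threshold:
        return "Winner Found (A)"
    # A real system would use a ROPE or expected-loss check to separate
    # "No Meaningful Difference" from "More Data Needed"; collapsed here.
    return "More Data Needed"


# 120/1000 vs 160/1000 conversions: the posterior can be inspected at
# any point without the peeking penalty of a fixed-horizon test.
p = prob_b_beats_a(120, 1000, 160, 1000)
print(verdict(p))
```

The key property is that the posterior probability is a valid summary of the evidence whenever you look at it, so "check early, check often" doesn't corrupt the analysis the way repeated p-value peeks do.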
Here's what else makes it different:
* Revenue attribution, not vanity metrics. Results are expressed as "Variant B generated £14,200 additional revenue" with confidence ranges, answering the business question directly.
* A persistent snippet library that accumulates learnings across campaigns and channels. Your tenth test on a CTA style is dramatically more informative than your first.
* Guardrail monitoring that catches unsubscribe spikes, spam complaints, and bounce-rate increases before a "winning" variant tanks your list health.
* Thompson Sampling that automatically shifts traffic toward better-performing variants while still exploring.
* Multi-channel parity across email, push, SMS, in-app messages, and content cards. Same statistical rigour everywhere.
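Thompson Sampling itself is simple enough to show in a few lines. This is a generic sketch of the technique (Beta(1,1) priors, conversion counts per variant), not Liftstack's implementation:

```python
import random


def thompson_pick(stats, rng=random):
    """Pick a variant by sampling each Beta posterior and taking the best.

    stats: {variant_name: (conversions, sends)}, assuming Beta(1,1) priors.
    Better-performing variants win more draws, so they receive more
    traffic, while uncertain variants still get occasional exploration.
    """
    best, best_draw = None, -1.0
    for name, (conv, sends) in stats.items():
        draw = rng.betavariate(1 + conv, 1 + sends - conv)
        if draw > best_draw:
            best, best_draw = name, draw
    return best


rng = random.Random(42)
stats = {"A": (5, 1000), "B": (300, 1000)}
picks = [thompson_pick(stats, rng) for _ in range(200)]
print(picks.count("B"))  # B gets nearly all the traffic
```

Because each send is an independent posterior draw, traffic reallocates automatically as evidence accumulates, with no manual "promote the winner" step.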
We integrate directly with Klaviyo, Customer.io, Iterable, and Braze for now, with more ESPs planned.
I built this because most CRM testing today is theatre; it looks scientific but the methodology is so crude you'd learn more from a coin flip. Liftstack treats content testing as a proper discipline with real statistical foundations.
Happy to answer questions about the approach, the stats, or anything else.