There are tons of tools that support data testing (love a dbt test), but even after writing robust tests we still ran into unexpected breaking changes. Most of the time, the issue surfaced when a product manager showed up wondering why some obscure metric was suddenly null across the board. We didn't know the metric was being used (sometimes we didn't even know it existed), so how could we have written a test for it? A common cause was an engineering team that stopped recording a value because no one knew it was important. Even when engineering did everything right, sent out comms ("fyi, we're removing the legacy user id values"), and rolled out safely, no one expects the marketing ROI dashboard to depend on that field.
That's where Data Axolotl comes in: a CLI tool that catches unexpected breaking changes in analytics data sets, without you having to know in advance what might break.
Simply put, you point it at some data tables, run it daily, and it collects a ton of common metrics over time (column minimum, maximum, mean, row count, distinct count, and so on). If any of those metrics changes suddenly, Data Axolotl raises an alert.
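To make the idea concrete, here's a minimal sketch of that profile-and-compare loop. This is an illustration only, not Data Axolotl's actual code: the function names, metric names, and the simple relative-change threshold are all assumptions for the example.

```python
import statistics

def profile(values):
    """Collect simple per-column metrics (names here are illustrative)."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "mean": statistics.fmean(non_null) if non_null else None,
    }

def detect_changes(previous, current, threshold=0.5):
    """Flag any metric that moved by more than `threshold` (relative change),
    or that flipped between a value and None (e.g. a column going all-null)."""
    alerts = []
    for name, old in previous.items():
        new = current.get(name)
        if old is None or new is None:
            if old != new:
                alerts.append(f"{name}: {old!r} -> {new!r}")
            continue
        baseline = abs(old) or 1.0
        if abs(new - old) / baseline > threshold:
            alerts.append(f"{name}: {old!r} -> {new!r}")
    return alerts
```

A column that silently goes all-null (the PM scenario above) would trip several of these metrics at once: the distinct count collapses and min/max/mean become None, even though the row count looks healthy.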
The whole thing runs from a Python package, so you can run it locally or on your own infra via a scheduling tool like Airflow. You don't have to sign up for any new cloud services or risk leaking data to a third party. History can be stored locally with SQLite (the default) or in a remote database on your own infra.
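For a sense of how little infrastructure the local default needs, here's a sketch of a SQLite-backed metric-history store. It is similar in spirit to the default backend, but the table and column names are made up for this example and are not Data Axolotl's actual schema.

```python
import sqlite3

def record_metrics(conn, table, run_date, metrics):
    # One row per (table, date, metric); schema is illustrative only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metric_history "
        "(table_name TEXT, run_date TEXT, metric TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO metric_history VALUES (?, ?, ?, ?)",
        [(table, run_date, name, value) for name, value in metrics.items()],
    )
    conn.commit()

def latest_metrics(conn, table):
    # Fetch the most recent run's metrics for a table, as a dict.
    rows = conn.execute(
        "SELECT metric, value FROM metric_history "
        "WHERE table_name = ? AND run_date = "
        "(SELECT MAX(run_date) FROM metric_history WHERE table_name = ?)",
        (table, table),
    ).fetchall()
    return dict(rows)
```

Because the store is just a SQLite file, pointing the same code at a remote database on your own infra is mostly a matter of swapping the connection.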
This is a pretty early release, so right now it only supports Snowflake tables, but we hope to add other database types in the future.
You can try out Data Axolotl today, directly from your local machine, by installing it via pip: `pip install data-axolotl`. A full setup guide can be found in the README.