I’ve been working on a small project called Kontra and just released it.
Kontra is a data quality measurement engine. You define rules in YAML or Python, run them against Parquet, CSV, or database tables, and get back violation counts and sampled failing rows.
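To give a feel for the model (this is an illustrative sketch, not Kontra's actual API; the rule names and data here are made up), a rule boils down to a named predicate that yields a violation count plus a sample of failing rows:

```python
import polars as pl

# Illustrative sketch only -- not Kontra's API. A rule is modeled here
# as a named failing-condition predicate; evaluating it yields a
# violation count and a sample of failing rows.
df = pl.DataFrame({"id": [1, 2, None, 4], "amount": [10.0, -3.0, 5.0, 7.5]})

rules = {
    "id_not_null": pl.col("id").is_null(),
    "amount_non_negative": pl.col("amount") < 0,
}

for name, failing_if in rules.items():
    failing = df.filter(failing_if)
    print(f"{name}: {failing.height} violation(s)")
    print(failing.head(5))  # sampled failing rows
```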
The main goal was to avoid doing more work than necessary. Instead of treating every rule the same, Kontra separates execution paths: some checks can be answered from Parquet metadata alone, others are pushed down to SQL, and full in-memory scans happen only for rules that actually need them. The guarantees differ across these paths, and Kontra is explicit about those differences rather than hiding them.
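As a concrete example of the metadata-only tier: a not-null rule over Parquet can sometimes be decided from footer statistics without reading any data. Here is a sketch of that shortcut using pyarrow directly (this is the general technique, not Kontra's internals; it assumes a flat schema and writer-provided statistics):

```python
import pyarrow.parquet as pq

def null_count_from_footer(path: str, column: str) -> int | None:
    """Sum per-row-group null counts from Parquet footer statistics.

    Returns None if any row group lacks statistics, in which case an
    engine would have to fall back to SQL pushdown or a full scan.
    """
    pf = pq.ParquetFile(path)
    idx = pf.schema_arrow.get_field_index(column)  # flat schemas only
    total = 0
    for rg in range(pf.metadata.num_row_groups):
        stats = pf.metadata.row_group(rg).column(idx).statistics
        if stats is None or not stats.has_null_count:
            return None
        total += stats.null_count
    return total
```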
Under the hood it uses DuckDB for SQL pushdown on files and Polars for in-memory execution. It also supports profiling datasets, drafting starter rules from observed data, and diffing validation runs over time. Rules can carry user-defined context, and runs can be annotated after execution without affecting validation behavior.
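For the other two tiers, the division of labor between the engines mentioned above looks roughly like this (illustrative again; the file and column names are made up):

```python
import duckdb
import polars as pl

# SQL pushdown: DuckDB reads the Parquet file itself and returns only
# the aggregate, so the dataset never materializes in Python.
violations = duckdb.sql(
    "SELECT count(*) FROM 'events.parquet' WHERE user_id IS NULL"
).fetchone()[0]

# In-memory path: Polars lazily scans the file and collects the failing
# rows -- what rules needing row-level access fall back to.
failing = (
    pl.scan_parquet("events.parquet")
    .filter(pl.col("user_id").is_null())
    .collect()
)
print(violations, failing.height)
```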
It works as both a CLI and a Python library.
Happy to answer questions or hear feedback.