Key features:
- Single API that works for both pandas and PySpark DataFrames
- Minimal dependencies (won't bloat Docker images or slow builds)
- Decorator-based validation for automatic function output checks
- Tag-based filtering to run specific validations by environment, priority, or any other criteria
- Reusable expectation definitions across your codebase
The library is lightweight and integrates easily into existing CI/CD pipelines, helping you catch data quality issues before they reach production.
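To give a rough feel for the decorator-based and tag-based ideas, here is a plain-Python sketch of the general pattern. It is not the library's actual API: `Expectation`, `has_columns`, `validate_output`, and the `FakeFrame` stand-in are all hypothetical names made up for illustration, so please check the docs for the real interface. The check relies only on `.columns`, which both pandas and PySpark DataFrames expose.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Iterable, Optional


@dataclass(frozen=True)
class Expectation:
    """Hypothetical reusable check: a name, a predicate, and free-form tags."""
    name: str
    check: Callable[[Any], bool]
    tags: frozenset = frozenset()


def has_columns(*required: str, tags: Iterable[str] = ()) -> Expectation:
    # Only touches `.columns`, so the same check works on any frame-like object.
    return Expectation(
        name=f"has_columns({', '.join(required)})",
        check=lambda df: set(required) <= set(df.columns),
        tags=frozenset(tags),
    )


def validate_output(expectations, only_tags: Optional[Iterable[str]] = None):
    """Decorator that runs the selected expectations on a function's return value."""
    wanted = None if only_tags is None else set(only_tags)
    # Keep every expectation when no tags are given, else those sharing a tag.
    selected = [e for e in expectations if wanted is None or e.tags & wanted]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            failed = [e.name for e in selected if not e.check(result)]
            if failed:
                raise ValueError(f"{func.__name__} output failed: {failed}")
            return result
        return wrapper
    return decorator


class FakeFrame:
    """Stand-in for a pandas or PySpark DataFrame (illustration only)."""
    def __init__(self, columns):
        self.columns = list(columns)


# Defined once, reusable across functions.
ORDER_CHECKS = [
    has_columns("order_id", "amount", tags=["critical"]),
    has_columns("coupon", tags=["optional"]),
]


@validate_output(ORDER_CHECKS, only_tags=["critical"])
def load_orders():
    return FakeFrame(["order_id", "amount"])


orders = load_orders()  # passes: the "optional" check is filtered out by tag
```

Dropping `only_tags` would run every check, and the missing `coupon` column would then raise a `ValueError`. That is the kind of switch tag filtering gives you between, say, a fast CI run and a stricter nightly job.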
Links:
• PyPI: https://pypi.org/project/dataframe-expectations/
• GitHub: https://github.com/getyourguide/dataframe-expectations
• Docs: https://code.getyourguide.com/dataframe-expectations/
The project is still in its early stages and I'd love to hear your feedback and answer any questions!