- Unit/integration testing is often dropped because of tight deadlines and low perceived business impact.
- Tests help, but they're not a full solution for data quality.
- Many Data Engineers don't have a Software Engineering background, so common practices like testing often aren't applied.
- Creating multiple input/output tables with realistic synthetic data is complicated.
- Tooling and frameworks for testing data transformations are still pretty limited.
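
To make the input/output tables point concrete, here is a minimal sketch of a transformation test in plain Python. Everything in it is hypothetical and for illustration only: the `revenue_by_country` function, the column names, and the data are made up, and this is not Pybujia's API. Even for a simple join plus aggregation you need two hand-written input tables and one expected output table, including an edge case (an order with no matching customer).

```python
from collections import defaultdict


def revenue_by_country(orders, customers):
    """Hypothetical transformation: inner-join orders to customers,
    then sum order amounts per country."""
    country_by_customer = {c["customer_id"]: c["country"] for c in customers}
    totals = defaultdict(int)
    for order in orders:
        country = country_by_customer.get(order["customer_id"])
        if country is None:
            continue  # inner join: skip orders with no matching customer
        totals[country] += order["amount_cents"]
    # Sort by country so the output order is deterministic and easy to compare
    return [
        {"country": country, "revenue_cents": total}
        for country, total in sorted(totals.items())
    ]


def test_revenue_by_country():
    # Input table 1: customers
    customers = [
        {"customer_id": 1, "country": "ES"},
        {"customer_id": 2, "country": "DE"},
    ]
    # Input table 2: orders, including an orphan row as an edge case
    orders = [
        {"customer_id": 1, "amount_cents": 1000},
        {"customer_id": 1, "amount_cents": 250},
        {"customer_id": 2, "amount_cents": 700},
        {"customer_id": 3, "amount_cents": 999},  # no such customer
    ]
    # Expected output table
    expected = [
        {"country": "DE", "revenue_cents": 700},
        {"country": "ES", "revenue_cents": 1250},
    ]
    assert revenue_by_country(orders, customers) == expected


test_revenue_by_country()
```

With real pipelines the same shape repeats with many more tables and columns, which is where writing and maintaining the fixtures starts to hurt.
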
I'd like to hear Hacker News's perspective on this.
I also built an open-source toolkit/framework called Pybujia to help with some of these pains: https://github.com/jpgerek/pybujia/