I just released an open-source project on GitHub called "Urban Data Pipeline: A Hands-On Guide to Real-World Data Engineering", a practical build that simulates the challenges you’d encounter putting together centralized data systems in the wild. If you're into urban development, traffic insights, or air quality monitoring, this one's for you.
The project takes you through ingesting live data (like traffic, air quality, and urban zoning), cleaning and enriching it using Python, dbt, and Airflow, storing it in Snowflake, and visualizing everything through Tableau.
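To give a flavor of how the stages wire together, here's a minimal sketch of the orchestration as an Airflow DAG. The task names, the `fetch_air_quality` helper, and the API URL below are illustrative assumptions, not the repo's actual code:

```python
# Sketch of the ingest -> transform flow as an Airflow DAG.
# Task names, the fetch_air_quality helper, and the URL are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def fetch_air_quality() -> None:
    """Hypothetical extractor: pull readings from a public API."""
    import requests

    resp = requests.get("https://api.example.com/air-quality", timeout=30)
    resp.raise_for_status()
    # In the real pipeline this would be staged for loading into Snowflake.
    print(resp.json())


with DAG(
    dag_id="urban_data_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    ingest = PythonOperator(
        task_id="ingest_air_quality",
        python_callable=fetch_air_quality,
    )

    # dbt handles the cleaning/enrichment; here it's just a shell call.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt",
    )

    ingest >> transform
```

The actual DAGs in the repo cover more sources (traffic, zoning) and the Snowflake load, but the pattern is the same: extract tasks feed a dbt transformation step downstream.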
Everything’s containerized and orchestrated using Docker. You can use this pipeline to simulate how smart cities manage and analyze data or just learn how to connect multiple tools into a cohesive, modern data stack.
I started this project as a learning journey to centralize and analyze real-world urban data across Jakarta. It's been a great playground for experimenting, and it's made the modern data stack feel lighter and more approachable.
Here’s also my Medium article if you want to look into it: https://medium.com/data-science-collective/end-to-end-urban-...
Would love feedback, contributions, or thoughts on how to improve it. Cheers!