I’ve been working with BigQuery for ~5 years, mostly in large (petabyte-scale) environments. Over time we ended up spending a lot of money and engineering effort just trying to understand where costs were coming from, why, and how to optimize them.
At some point we decided to stop, leverage all our past experience, and spend a full cycle building tooling focused on cost visibility and optimization. The main goal was to regain ownership of our cost data and make it possible to understand our cost structure in under a minute, while aligning the views of engineering and FinOps at the project level. To complement this tooling, and given the recent rise of AI, we also built agents that review usage patterns and selected KPIs to surface issues and suggest concrete optimizations (always through an encoding layer, for privacy).
This ended up working well for us: we reduced monthly BigQuery spend by ~43% and made cost considerations part of normal engineering workflows instead of a separate FinOps exercise. After validating it with a few other teams facing similar problems, we decided to ship it publicly in November.
Some transparency items to consider:

- We only work with BigQuery-related information, not all of GCP.
- While we already support most sources of cost, the project is under constant development. If you use an uncommon feature it may not be covered yet, but we will implement it for you.
- The larger the company, the larger the impact, since there tends to be more technical debt. Nonetheless, we've seen significant impact at smaller startups.
- We don't need labels. Labels are great and should be maintained, but we do not depend on them to provide insights (see the sketch below).
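To make that last point concrete, here is a minimal sketch (not our actual implementation) of how spend can be attributed per user with zero labels, just from the job metadata BigQuery already exposes in INFORMATION_SCHEMA.JOBS. The project ID, region, and on-demand price per TiB below are placeholders you would swap for your own values.

```python
# Sketch: attribute 30 days of BigQuery query spend per user without labels,
# using only INFORMATION_SCHEMA.JOBS_BY_PROJECT metadata.
from google.cloud import bigquery

PROJECT_ID = "my-project"          # placeholder: your GCP project
REGION = "region-us"               # placeholder: your BigQuery region qualifier
ON_DEMAND_USD_PER_TIB = 6.25       # assumption: typical on-demand rate; check your pricing

SQL = f"""
SELECT
  user_email,
  COUNT(*)                               AS jobs,
  SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed,
  SUM(total_bytes_billed) / POW(1024, 4) * {ON_DEMAND_USD_PER_TIB} AS est_cost_usd
FROM `{PROJECT_ID}`.`{REGION}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT'  -- skip parent script jobs to avoid double counting
GROUP BY user_email
ORDER BY est_cost_usd DESC
LIMIT 20
"""

client = bigquery.Client(project=PROJECT_ID)
for row in client.query(SQL).result():
    print(f"{(row.user_email or '(unknown)'):40s} {row.jobs:8d} jobs  "
          f"{row.tib_billed:8.2f} TiB  ~${row.est_cost_usd:,.2f}")
```

Raw signals like this answer "who spent what" without a single label; the harder part, and what we focused on, is turning those usage patterns into concrete, reviewable optimizations.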
Happy to share details, lessons learned, or tradeoffs if this is interesting to others dealing with large-scale data warehouse costs. Check us out at https://www.cloudclerk.ai/