Anyway, the old laptop is about on par with the 'big' VMs I use for work to analyse really big BQ datasets. My current flow is to run the 0.001% of queries that don't fit on a single box in BigQuery, with just enough massaging that the intermediate result does fit on a box. Then I extract that to Parquet stored on the VM and do the analysis there with DuckDB from Python notebooks.
DuckDB has revolutionised not what I can do but how I can do it. All the ingredients were around before, but DuckDB brings it together and makes the ergonomics completely different. Life is so much easier with joins and things than trying to do the same in, say, pandas.
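For a flavour of what that looks like, here's a minimal sketch with made-up file and column names (not my actual pipeline) of joining two Parquet extracts straight from a notebook:

    import duckdb

    # Hypothetical Parquet extracts sitting on the VM after the BigQuery step
    con = duckdb.connect()  # in-memory DuckDB

    top_customers = con.execute("""
        SELECT o.customer_id,
               c.segment,
               SUM(o.amount) AS total_spend
        FROM 'orders.parquet'    AS o
        JOIN 'customers.parquet' AS c
          ON o.customer_id = c.customer_id
        GROUP BY o.customer_id, c.segment
        ORDER BY total_spend DESC
        LIMIT 10
    """).df()  # back to a pandas DataFrame for plotting etc.

The join, aggregation, and Parquet scanning all happen inside DuckDB; only the small result set comes back into Python.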
The role of a database is not just to deliver query performance. It needs to fit into the ecosystem, serve its role on multiple fronts, and deliver on a wide range of expectations, technical and non-technical.
While the useful dataset itself may not outpace hardware advancements, ecosystem complexity will definitely outpace any hardware or AI advances. How well a database adapts to that ecosystem will dictate the choice, not query performance. Technologies do not operate in isolation.
Back in 2012 we were just recovering from the everything-is-XML craze, we were in the middle of the NoSQL craze, and everything was web-scale, distributed-first, microservices, etc.
And now, after all that mess, we have learned to love what came before: namely, please please please just give me SQL! :D
Why do they use the geometric mean to average execution times?
I just did a quick Google and the first real result was this blog post, which has a good explanation and some helpful illustrations: https://jlmc.medium.com/understanding-three-simple-statistic...
It's the very first illustration at the top of that blog post that 'clicks' for me. Hope it helps!
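To make it concrete, here's a tiny sketch (made-up speedup numbers) of why the geometric mean behaves better for ratios like relative execution times:

    import math

    # Hypothetical speedups of system A over system B on two queries:
    # A is 10x faster on one query and 10x slower on the other.
    speedups = [10.0, 0.1]

    arithmetic = sum(speedups) / len(speedups)  # 5.05 -> "A is ~5x faster"?
    geometric = math.exp(sum(math.log(s) for s in speedups) / len(speedups))  # 1.0

    # Flip the ratios (measure B relative to A) and the arithmetic mean now
    # claims B is also ~5x faster; the geometric mean stays at 1.0.
    inverses = [1.0 / s for s in speedups]
    arithmetic_inv = sum(inverses) / len(inverses)  # 5.05 again
    geometric_inv = math.exp(sum(math.log(s) for s in inverses) / len(inverses))  # 1.0

    print(arithmetic, geometric, arithmetic_inv, geometric_inv)

That consistency under swapping which system you treat as the baseline is the usual argument for averaging normalised benchmark results with the geometric mean.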
The inverse is also useful: mean squared error is a good way to compare how similar two datasets (e.g. two images) are.
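For example (a throwaway sketch with random arrays standing in for real images):

    import numpy as np

    rng = np.random.default_rng(0)
    img_a = rng.random((64, 64))                           # a fake grayscale image
    img_b = img_a + rng.normal(scale=0.05, size=(64, 64))  # a slightly noisy copy

    mse = np.mean((img_a - img_b) ** 2)  # lower = more similar
    print(f"MSE: {mse:.5f}")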