The Lost Decade of Small Data?

https://duckdb.org/2025/05/19/the-lost-decade-of-small-data.html
50•andreasha•2d ago

Comments

drewm1980•2h ago
I mean, not everyone spent their decade on distributed computing. Some devs with a retrogrouch inclination kept writing single-threaded code in native languages on a single node. Single-core clock speed stagnated, but it was still worth buying new CPUs with more cores because they also had more cache, and all the extra cores are useful for running other people's bloated code.
fulafel•1h ago
Related in the big-data-benchmarks-on-old-laptop department: https://www.frankmcsherry.org/graph/scalability/cost/2015/01...
willvarfar•1h ago
I only retired my 2014 MBP ... last week! It started transiently not booting and then, after just a few weeks, it switched to be only transiently booting. Figured it was time. My new laptop is actually a very budget buy, and not a mac, and in many things a bit slower than the old MBP.

Anyway, the old laptop is about on par with the 'big' VMs that I use for work to analyse really big BQ datasets. My current flow is to do the kind of 0.001% queries that don't fit on a box on BigQuery and massage things with just enough prepping to make the intermediate result fit on a box. Then I extract that to Parquet stored on the VM and do the analysis on the VM using DuckDB from Python notebooks.

DuckDB has revolutionised not what I can do but how I can do it. All the ingredients were around before, but DuckDB brings it together and makes the ergonomics completely different. Life is so much easier with joins and things than trying to do the same in, say, pandas.

mediumsmart•53m ago
I am on the late 2015 version and I have an ebay body stashed for when the time comes to refurbish that small data machine.
zkmon•53m ago
A database is not only about disk size and query performance. A database reflects the company's culture, processes, workflows, collaboration, etc. It has an entire ecosystem around it: master data, business processes, transactions, distributed applications, regulatory requirements, resiliency, Ops, reports, tooling, etc.

The role of a database is not just to deliver query performance. It needs to fit into the ecosystem, serve the overall role on multiple facets, deliver on a wide range of expectations - tech and non-tech.

While the useful dataset itself may not outpace the hardware advancements, the ecosystem complexity will definitely outpace any hardware or AI advancements. Overall adaptation to the ecosystem will dictate the database choice, not query performance. Technologies will not operate in isolation.

zwnow•17m ago
No, a database reflects what you make out of it. Reports are just queries after all. I don't know what all the other stuff you named has to do with the database directly. The only purpose of databases is to store and read data, that's what it comes down to. So query performance IS one of the most important metrics.
willvarfar•11m ago
And it's very much the tech culture at large that influences the company's tech choices. Those techies chasing shiny things and trying to shoehorn them into their job - perhaps cynically to pad their CVs or perhaps generously thinking it will actually be the right thing to do - have an outsized say in how tech teams think about tech and what they imagine their job is.

Back in 2012 we were just recovering from the everything-is-XML craze and in the middle of the NoSQL craze and everything was web-scale and distribute-first micro-services etc.

And now, after all that mess, we have learned to love what came before: namely, please please please just give me SQL! :D

querez•34m ago
> The geometric mean of the timings improved from 218 to 12, a ca. 20× improvement.

Why do they use the geometric mean to average execution times?

willvarfar•13m ago
Squaring is a really good way to make the common-but-small numbers have bigger representation than the outlying-but-large numbers.

I just did a quick Google search and the first real result was this blog post, which has a good explanation and some good illustrations: https://jlmc.medium.com/understanding-three-simple-statistic...

It's the very first illustration at the top of that blog post that 'clicks' for me. Hope it helps!

The inverse is also good: mean-square-error is a good way to compare how similar two datasets (e.g. two images) are.

ayhanfuat•4m ago
It's a way of saying twice as fast and twice as slow have equal weights. If your baseline is 10 seconds, one benchmark takes 5 seconds, and another one takes 20 seconds, then the geometric mean gives you 10 seconds as the result because they cancel each other out. The arithmetic mean would treat it differently because in absolute terms a 10-second slowdown is bigger than a 5-second speedup. But that is not fair to speedups, because the absolute speedup you can reach is at most 10 seconds while a slowdown has no limit.
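
A minimal Python sketch of the 5 s / 20 s example above, for anyone who wants to check the arithmetic. The function names and the `timings` list are made up for illustration and are not taken from the DuckDB post; the geometric mean is computed as the exponential of the mean of the logs, which is one common way to do it.

```python
import math

def geometric_mean(values):
    """Geometric mean via the mean of logarithms (all values must be > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    """Plain average, for comparison."""
    return sum(values) / len(values)

# Hypothetical timings from the comment: baseline is 10 s,
# one benchmark got 2x faster (5 s), another got 2x slower (20 s).
baseline = 10.0
timings = [5.0, 20.0]

geo = geometric_mean(timings)   # sqrt(5 * 20) = 10: the 2x changes cancel out
ari = arithmetic_mean(timings)  # (5 + 20) / 2 = 12.5: the slowdown dominates

# The same thing expressed as ratios to the baseline.
ratios = [t / baseline for t in timings]  # [0.5, 2.0]
geo_ratio = geometric_mean(ratios)        # 1.0, i.e. no net change

print(f"geometric mean:  {geo:.2f} s")
print(f"arithmetic mean: {ari:.2f} s")
print(f"geometric mean of ratios: {geo_ratio:.2f}")
```

The arithmetic mean reports a net slowdown (12.5 s against a 10 s baseline) even though one benchmark halved and the other doubled, which is the asymmetry the comment describes.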
