frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Apache Iceberg vs. Databricks – benchmarked

https://olake.io/iceberg/databricks-vs-iceberg/
9•Cappybara12•2mo ago

Comments

Cappybara12•2mo ago
For every other data engineer or someone in higher hierarchy down the road comes to a choiuce of Apache Iceberg or Databricks Delta Lake, so we went ahead and benchmarked both systems. Just sharing our experience here.

TL;DR Both formats have their perks: Apache Iceberg offers an open, flexible architecture with surprisingly fast query performance in some cases, while Databricks Delta Lake provides a tightly managed, all-in-one experience where most of the operational overhead is handled for you.

Setup & Methodology

We used the TPC-H 1 TB dataset which is a dataset of about 8.66 billion rows across 8 tables to compare the two stacks end-to-end: ingestion and analytics.

For the Iceberg setup:

We ingested data from PostgreSQL into Apache Iceberg tables on S3, orchestrated through OLake’s high-throughput CDC pipeline using AWS Glue as catalog and EMR Spark for query.. Ingestion used 32 parallel threads with chunked, resumable snapshots, ensuring high throughput. On the query side, we tuned Spark similarly to Databricks (raised shuffle partitions to 128 and disabled vectorised reads due to Arrow buffer issues).

For the Databricks Delta Lake setup: Data was loaded via the JDBC connector from PostgreSQL into Delta tables in 200k-row batches. Databricks’ managed runtime automatically applied file compaction and optimized writes. Queries were run using the same 22 TPC-H analytics queries for a fair comparison.

This setup made sure we were comparing both ingestion performance and analytical query performance under realistic, production-style workloads.

What We Found

We used OLake to ingest to Iceberg and was about 2x faster - 12 hours vs 25.7 hours on Databricks thanks to parallel chunked ingestion.

Iceberg ran the full TPC-H suite 18% faster than Databricks.

Cost: Infra cost was 61% lower on Iceberg + OLake (around $21.95 vs $50.71 for the same run).

here are the overall result and our ideology on this-

Databricks still wins on ease-of-use: you just click and go. Cluster setup, Spark tuning, and governance are all handled automatically. That’s great for teams that want a managed ecosystem and don’t want to deal with infrastructure.

But if your team is comfortable managing a Glue/AWS stack and handling a bit more complexity, Iceberg + OLake’s open architecture wins on pure numbers faster at scale, lower cost, and full engine flexibility (Spark, Trino, Flink) without vendor lock-in.

read our article to know more on our steps followed and the overall benchmarks and the numbers around it curious to know what you people think ofcourse these are numbers but it largely depends on your experience too of how you adopted in your org

techandtry•2mo ago
Interesting write-up. What I’ve seen in practice is that benchmarks like these usually highlight the best-case behavior of Iceberg vs the default behavior of Delta. The real question ends up being how much manual tuning a team is willing to take on. Iceberg can absolutely outperform Delta if you’re intentional about ingestion parallelism or eve catalog setup.

But I’m also curious about the long-term operational side - how does the maintenance overhead evolve after a few months of production load?that maybe someone that have transitioned can answer

Scientists reverse Alzheimer's in mice and restore memory (2025)

https://www.sciencedaily.com/releases/2025/12/251224032354.htm
1•walterbell•1m ago•0 comments

Compiling Prolog to Forth [pdf]

https://vfxforth.com/flag/jfar/vol4/no4/article4.pdf
1•todsacerdoti•2m ago•0 comments

Show HN: Cymatica – an experimental, meditative audiovisual app

https://apps.apple.com/us/app/cymatica-sounds-visualizer/id6748863721
1•_august•4m ago•0 comments

GitBlack: Tracing America's Foundation

https://gitblack.vercel.app/
1•martialg•4m ago•0 comments

Horizon-LM: A RAM-Centric Architecture for LLM Training

https://arxiv.org/abs/2602.04816
1•chrsw•4m ago•0 comments

We just ordered shawarma and fries from Cursor [video]

https://www.youtube.com/shorts/WALQOiugbWc
1•jeffreyjin•5m ago•1 comments

Correctio

https://rhetoric.byu.edu/Figures/C/correctio.htm
1•grantpitt•5m ago•0 comments

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

https://chillphysicsenjoyer.substack.com/p/trying-to-make-an-automated-ecologist
1•crescit_eundo•9m ago•0 comments

Watch Ukraine's Minigun-Firing, Drone-Hunting Turboprop in Action

https://www.twz.com/air/watch-ukraines-minigun-firing-drone-hunting-turboprop-in-action
1•breve•10m ago•0 comments

Free Trial: AI Interviewer

https://ai-interviewer.nuvoice.ai/
1•sijain2•10m ago•0 comments

FDA Intends to Take Action Against Non-FDA-Approved GLP-1 Drugs

https://www.fda.gov/news-events/press-announcements/fda-intends-take-action-against-non-fda-appro...
6•randycupertino•12m ago•1 comments

Supernote e-ink devices for writing like paper

https://supernote.eu/choose-your-product/
3•janandonly•14m ago•0 comments

We are QA Engineers now

https://serce.me/posts/2026-02-05-we-are-qa-engineers-now
1•SerCe•14m ago•0 comments

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

https://arxiv.org/abs/2602.01465
2•NBenkovich•14m ago•0 comments

Adversarial Reasoning: Multiagent World Models for Closing the Simulation Gap

https://www.latent.space/p/adversarial-reasoning
1•swyx•15m ago•0 comments

Show HN: Poddley.com – Follow people, not podcasts

https://poddley.com/guests/ana-kasparian/episodes
1•onesandofgrain•23m ago•0 comments

Layoffs Surge 118% in January – The Highest Since 2009

https://www.cnbc.com/2026/02/05/layoff-and-hiring-announcements-hit-their-worst-january-levels-si...
7•karakoram•23m ago•0 comments

Papyrus 114: Homer's Iliad

https://p114.homemade.systems/
1•mwenge•23m ago•1 comments

DicePit – Real-time multiplayer Knucklebones in the browser

https://dicepit.pages.dev/
1•r1z4•23m ago•1 comments

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

https://arxiv.org/abs/2601.14340
2•PaulHoule•25m ago•0 comments

Show HN: AI Agent Tool That Keeps You in the Loop

https://github.com/dshearer/misatay
2•dshearer•26m ago•0 comments

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

https://drmowinckels.io/blog/2026/sitrep-functions/
1•todsacerdoti•27m ago•0 comments

Achieving Ultra-Fast AI Chat Widgets

https://www.cjroth.com/blog/2026-02-06-chat-widgets
1•thoughtfulchris•28m ago•0 comments

Show HN: Runtime Fence – Kill switch for AI agents

https://github.com/RunTimeAdmin/ai-agent-killswitch
1•ccie14019•31m ago•1 comments

Researchers surprised by the brain benefits of cannabis usage in adults over 40

https://nypost.com/2026/02/07/health/cannabis-may-benefit-aging-brains-study-finds/
2•SirLJ•33m ago•0 comments

Peter Thiel warns the Antichrist, apocalypse linked to the 'end of modernity'

https://fortune.com/2026/02/04/peter-thiel-antichrist-greta-thunberg-end-of-modernity-billionaires/
4•randycupertino•33m ago•2 comments

USS Preble Used Helios Laser to Zap Four Drones in Expanding Testing

https://www.twz.com/sea/uss-preble-used-helios-laser-to-zap-four-drones-in-expanding-testing
3•breve•39m ago•0 comments

Show HN: Animated beach scene, made with CSS

https://ahmed-machine.github.io/beach-scene/
1•ahmedoo•39m ago•0 comments

An update on unredacting select Epstein files – DBC12.pdf liberated

https://neosmart.net/blog/efta00400459-has-been-cracked-dbc12-pdf-liberated/
3•ks2048•39m ago•0 comments

Was going to share my work

1•hiddenarchitect•43m ago•0 comments