Apache Iceberg vs. Databricks – benchmarked

https://olake.io/iceberg/databricks-vs-iceberg/
9•Cappybara12•1h ago

Comments

Cappybara12•1h ago
Sooner or later, every data engineer (or someone higher up the hierarchy) has to choose between Apache Iceberg and Databricks Delta Lake, so we went ahead and benchmarked both systems. Just sharing our experience here.

TL;DR Both formats have their perks: Apache Iceberg offers an open, flexible architecture with surprisingly fast query performance in some cases, while Databricks Delta Lake provides a tightly managed, all-in-one experience where most of the operational overhead is handled for you.

Setup & Methodology

We used the TPC-H 1 TB dataset (about 8.66 billion rows across 8 tables) to compare the two stacks end-to-end: ingestion and analytics.

For the Iceberg setup:

We ingested data from PostgreSQL into Apache Iceberg tables on S3 through OLake’s high-throughput CDC pipeline, using AWS Glue as the catalog and EMR Spark as the query engine. Ingestion used 32 parallel threads with chunked, resumable snapshots to keep throughput high. On the query side, we tuned Spark similarly to Databricks (raised shuffle partitions to 128 and disabled vectorised reads due to Arrow buffer issues).
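As a rough illustration of the Spark tuning described above, here is how those settings might be collected and turned into `spark-submit` arguments. This is a sketch, not the configuration the benchmark actually used: the catalog name `glue`, the warehouse path, and the helper `to_spark_submit_args` are made up for the example, and using the session property `spark.sql.iceberg.vectorization.enabled` to turn off vectorised reads is an assumption about how that step was done.

```python
# Sketch only: catalog name "glue" and the warehouse path are placeholders.
ICEBERG_SPARK_CONF = {
    # Register an Iceberg catalog backed by AWS Glue, storing data on S3.
    "spark.sql.catalog.glue": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue.warehouse": "s3://example-bucket/warehouse",  # placeholder
    # Tuning mentioned in the post: 128 shuffle partitions, no vectorised reads.
    "spark.sql.shuffle.partitions": "128",
    "spark.sql.iceberg.vectorization.enabled": "false",  # assumed property choice
}


def to_spark_submit_args(conf):
    """Flatten a settings dict into ['--conf', 'key=value', ...] (hypothetical helper)."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_spark_submit_args(ICEBERG_SPARK_CONF)))
```

Keeping the settings in one dict makes it easier to apply the same tuning to every query run, which matters when the goal is a like-for-like comparison.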

For the Databricks Delta Lake setup: Data was loaded via the JDBC connector from PostgreSQL into Delta tables in 200k-row batches. Databricks’ managed runtime automatically applied file compaction and optimized writes. Queries were run using the same 22 TPC-H analytics queries for a fair comparison.
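To make the 200k-row batching concrete, here is a minimal sketch of the general pattern in plain Python. It does not reproduce the Databricks JDBC connector; `batched`, `load_in_batches`, and the `write_batch` callback are hypothetical names standing in for whatever reads the source rows and appends each batch to the target table.

```python
from itertools import islice


def batched(rows, batch_size=200_000):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(rows, write_batch, batch_size=200_000):
    """Pass each batch to write_batch (a stand-in for a table append); return rows written."""
    total = 0
    for batch in batched(rows, batch_size):
        write_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    sizes = []
    total = load_in_batches(range(450_000), lambda b: sizes.append(len(b)))
    print(total, sizes)
```

In this form the batches are written one after another, so the batch size mainly controls memory use per write rather than parallelism.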

This setup made sure we were comparing both ingestion performance and analytical query performance under realistic, production-style workloads.

What We Found

Ingestion: Loading into Iceberg with OLake was about 2x faster, at 12 hours vs 25.7 hours on Databricks, thanks to parallel chunked ingestion.
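For readers unfamiliar with the idea, parallel chunked ingestion with resumable snapshots can be sketched roughly as follows. This is not OLake's implementation: the functions `make_chunks` and `snapshot`, the integer key range, and the in-memory `done` set are all assumptions made for illustration. Only the default of 32 workers comes from the post.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chunks(min_id, max_id, chunk_size):
    """Split the inclusive key range [min_id, max_id] into (start, end) chunks."""
    return [
        (start, min(start + chunk_size - 1, max_id))
        for start in range(min_id, max_id + 1, chunk_size)
    ]


def snapshot(chunks, copy_chunk, done, workers=32):
    """Copy every chunk not yet in `done`, using a pool of worker threads.

    `done` plays the role of a checkpoint: a chunk is added only after
    copy_chunk returned for it, so a rerun after a failure skips recorded work.
    """
    pending = [c for c in chunks if c not in done]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order and re-raises the first error.
        for chunk, _ in zip(pending, pool.map(copy_chunk, pending)):
            done.add(chunk)
    return done


if __name__ == "__main__":
    copied = []
    finished = snapshot(make_chunks(1, 10, 4), copied.append, done={(1, 4)}, workers=4)
    print(sorted(copied), sorted(finished))
```

A real pipeline would persist the checkpoint somewhere durable and have `copy_chunk` run a bounded query such as a primary-key range scan. The point of the sketch is only that independent chunks can be copied concurrently and skipped on restart.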

Iceberg ran the full TPC-H suite 18% faster than Databricks.

Cost: Infra cost was 61% lower on Iceberg + OLake (around $21.95 vs $50.71 for the same run).
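The ratios can be rechecked from the figures quoted above with a few lines of arithmetic. The helper names `speedup` and `percent_lower` are just for this example. Note that the dollar amounts as quoted work out to a reduction of roughly 57%, a little under the stated 61%, so the headline percentage may rest on costs not itemized here.

```python
def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


def percent_lower(baseline, candidate):
    """Percentage by which the candidate is lower than the baseline."""
    return (baseline - candidate) / baseline * 100


# Figures as quoted in the post.
ingest_speedup = speedup(25.7, 12.0)        # hours: Databricks vs Iceberg + OLake
cost_saving = percent_lower(50.71, 21.95)   # USD: Databricks vs Iceberg + OLake

print(f"ingestion speedup: {ingest_speedup:.2f}x")  # 2.14x
print(f"cost reduction: {cost_saving:.1f}%")        # 56.7%
```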

Here are the overall results and our take on them:

Databricks still wins on ease-of-use: you just click and go. Cluster setup, Spark tuning, and governance are all handled automatically. That’s great for teams that want a managed ecosystem and don’t want to deal with infrastructure.

But if your team is comfortable managing a Glue/AWS stack and handling a bit more complexity, Iceberg + OLake’s open architecture wins on pure numbers: faster at scale, lower cost, and full engine flexibility (Spark, Trino, Flink) without vendor lock-in.

Read our article for the steps we followed and the full benchmark numbers. Curious to know what you all think. Of course, these are just numbers; a lot depends on your own experience and how you adopted these tools in your org.

Sunrise Robotics

https://sunriserobotics.co/
1•otobrglez•6m ago•0 comments

Saab invests in space technology company Pythom

https://www.saab.com/newsroom/press-releases/2025/saab-invests-in-space-technology-company-pythom
1•madspindel•8m ago•0 comments

When high availability brings downtime

https://medium.com/learnings-from-the-paas/when-high-availability-brings-downtime-7a6261b0ef1c
1•todsacerdoti•8m ago•0 comments

Tell HN: Cursor exposes side projects to your employer

1•throwawaybbbbbb•8m ago•0 comments

Show HN: I built a simple website directory and it just passed 400 submissions

https://www.showmysites.com/
1•toutoulliou•9m ago•0 comments

IC3PEAK – Dead but Pretty [video]

https://www.youtube.com/watch?v=qCljI3cIObU
1•consumer451•11m ago•1 comments

A native, static binary with SQLite support in C#

https://pileofhacks.dev/post/a-native-static-binary-with-sqlite-support-in-c/
1•colonCapitalDee•14m ago•0 comments

Asymptotically optimal approximate Hadamard matrices

https://arxiv.org/abs/2511.14653
2•mathfan•24m ago•0 comments

Gov. Spencer Cox announces major nuclear energy hub in Utah

https://www.deseret.com/utah/2025/11/17/gov-cox-announces-site-for-utah-nuclear-power-plant/
1•mpweiher•24m ago•0 comments

The Convergence

https://rodgercuddington.substack.com/p/the-convergence
2•freespirt•32m ago•0 comments

Ask HN: What was it like for you to be sunned?

1•suckow•32m ago•0 comments

Nuclear power will receive most money from DOE loans

https://www.cnbc.com/2025/11/10/nuclear-power-energy-department-chris-wright-loan-westinghouse-ai...
2•mpweiher•36m ago•0 comments

How to disable Cloudflare proxying when you can't access the dashboard

https://www.coryzue.com/writing/cloudflare-dns/
2•czue•45m ago•0 comments

A free AI tool to generate custom reviews (any tone/length) in seconds

https://www.reviewsgenerator.org/
1•YarkYao•50m ago•1 comments

Exploring the Limits of Large Language Models as Quant Traders

https://nof1.ai/blog/TechPost1
16•rzk•54m ago•3 comments

Why do some industries naturally collapse into duopolies?

https://capital-folly.ghost.io/ghost/#/site
1•d_e_solomon•55m ago•1 comments

Pipedream to Be Acquired by Workday

https://pipedream.com/blog/pipedream-to-be-acquired-by-workday/
1•todsacerdoti•56m ago•0 comments

Kroyer Films

https://en.wikipedia.org/wiki/Kroyer_Films
1•exvi•56m ago•0 comments

Show HN: Slopper: Private AI Replies

https://play.google.com/store/apps/details?id=com.indrek.slopper&hl=en_US
1•indest•57m ago•0 comments

Show HN: Codesprint – A LeetCode Typing Trainer

https://github.com/cwklurks/codesprint
2•cwkcwk•57m ago•1 comments

A 450 KB static site generator based on Markdown and Lua

https://log.schemescape.com/posts/static-site-generators/smallest-static-site-generator.html
1•birdculture•58m ago•0 comments

Show HN: SimplyToast – A simple Linux startup and background process manager

https://github.com/toast1599/SimplyToast
1•toast1599•1h ago•0 comments

Ultra-Processed Foods and Human Health

https://www.thelancet.com/series-do/ultra-processed-food
4•sdeframond•1h ago•0 comments

Zoox is now welcoming its first public riders in San Francisco

https://zoox.com/journal/zoox-robotaxi-in-san-francisco
2•ChrisArchitect•1h ago•0 comments

Paola Piseddu Nagni: UNA Solerte Front Officer

https://paolapiseddunagni1.substack.com/p/paola-piseddu-nagni-una-solerte-front
1•PaolaNagni•1h ago•0 comments

Linus Torvalds, Creator of Linux and Git, in Conversation with Dirk Hohndel [video]

https://www.youtube.com/watch?v=tWx769t1JKg
2•blufish•1h ago•0 comments

YouTube Transcript Getter – Get YouTube Transcripts at Scale

https://apify.com/johnvc/youtubetranscripts
1•johncole•1h ago•0 comments

What nicotine does to your brain

https://economist.com/science-and-technology/2025/09/12/what-nicotine-does-to-your-brain
2•runeks•1h ago•0 comments

Show HN: Codebox, a Provider of Remote Workspaces

https://github.com/davidebianchi03/codebox
1•davidebianchi03•1h ago•0 comments