101x Airbyte, 11x Estuary, Postgres to Iceberg

5•pkhodiyar•5h ago
Hi HN, we've been developing OLake, an open-source connector specifically designed for replicating data from PostgreSQL into Apache Iceberg. We recently ran some detailed benchmarks comparing its performance and cost against several popular data movement tools: Fivetran, Debezium (using the memiiso setup), Estuary, and Airbyte.

We wanted to share the results, as they show OLake performing very competitively, often exceeding the speed of both open-source and commercial alternatives, while offering the cost advantages of a self-hosted open-source solution.

The benchmarks covered both full initial loads and Change Data Capture (CDC) on a large dataset (billions of rows for full load, tens of millions of changes for CDC) over a 24-hour window.

Link to the full Postgres benchmark: https://olake.io/docs/connectors/postgres/benchmarks

For full loads, OLake achieved throughput of around 46,262 rows/sec, processing over 4 billion rows in 24 hours.

This was essentially on par with Fivetran (46,395 RPS) and significantly faster than Debezium (14,839 RPS - 3.1x slower), Estuary (3,982 RPS - 11.6x slower on a smaller processed dataset), and Airbyte (457 RPS - 101x slower before it failed the long test).
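
For anyone who wants to check the ratios, here is a minimal sketch that recomputes the slowdown factors from the rows/sec figures quoted above. The dictionary and function names are just illustrative, not part of OLake or the benchmark tooling.

```python
# Full-load throughput (rows/sec) as quoted in this post.
full_load_rps = {
    "OLake": 46_262,
    "Fivetran": 46_395,
    "Debezium": 14_839,
    "Estuary": 3_982,
    "Airbyte": 457,
}


def slowdown_vs(baseline: str, rps: dict) -> dict:
    """Return how many times slower each tool is than the baseline tool."""
    base = rps[baseline]
    return {tool: round(base / value, 1) for tool, value in rps.items()}


slowdowns = slowdown_vs("OLake", full_load_rps)
for tool, factor in slowdowns.items():
    print(f"{tool}: {factor}x slower than OLake")
```

A factor of 1.0 means the tool is effectively on par with OLake, which is the case for Fivetran here.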

The most striking results were in CDC performance.

For processing 50 million changes, OLake completed the task in 22.5 minutes at 36,982 rows/sec. Fivetran took 31 minutes (1.4x slower), Debezium took 60 minutes (2.7x slower), Estuary took 4.5 hours (12x slower), and Airbyte took 23 hours (63x slower).
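
The same kind of sanity check works for the CDC numbers. This is a rough sketch using only the figures quoted above; the variable names are illustrative. Note that the durations in the post are rounded, so ratios derived from them can differ slightly from the quoted multipliers (the 23-hour Airbyte figure works out to about 61x rather than 63x).

```python
CHANGES = 50_000_000   # CDC changes processed in the test
OLAKE_RPS = 36_982     # OLake CDC throughput quoted above (rows/sec)
OLAKE_MINUTES = 22.5   # OLake CDC duration quoted above

# Duration implied by the throughput figure, in minutes.
implied_olake_minutes = CHANGES / OLAKE_RPS / 60

# CDC durations quoted above, converted to minutes.
cdc_minutes = {
    "Fivetran": 31,
    "Debezium": 60,
    "Estuary": 4.5 * 60,
    "Airbyte": 23 * 60,
}

# How many times longer each tool took than OLake.
cdc_ratios = {
    tool: round(minutes / OLAKE_MINUTES, 1)
    for tool, minutes in cdc_minutes.items()
}

print(f"OLake implied duration: {implied_olake_minutes:.1f} min")
for tool, ratio in cdc_ratios.items():
    print(f"{tool}: {ratio}x longer than OLake")
```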

This indicates OLake delivers significantly lower latency for propagating changes from PostgreSQL to Iceberg.

On the cost side, OLake is open source and self-hosted, so the cost is simply the infrastructure. Running the benchmarks on a substantial VM (64 vCPUs, 128 GiB memory) for 24 hours cost less than $75.

Comparing this to the vendor list prices for the data synced in the tests: Fivetran's full load cost $7,446 ($1.86/M rows), Estuary's full load cost $4,462 ($12.97/M rows), Airbyte Cloud's partial full load cost $5,560 ($438.8/M rows).

For CDC, Fivetran cost $2,257 ($45.14/M rows), Estuary cost $22.72 ($0.45/M rows), and Airbyte Cloud cost $148.95 ($2.98/M rows).
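
The per-million-row prices follow directly from the totals. A minimal sketch, assuming the CDC costs apply to the 50 million changes in the test and using only the dollar figures quoted above (names are illustrative):

```python
CDC_ROWS_MILLIONS = 50  # 50 million changes in the CDC test

# Total vendor list-price cost for the CDC test, in USD.
cdc_cost_usd = {"Fivetran": 2257.00, "Estuary": 22.72, "Airbyte Cloud": 148.95}

# Cost per million rows for CDC.
cdc_cost_per_million = {
    vendor: round(cost / CDC_ROWS_MILLIONS, 2)
    for vendor, cost in cdc_cost_usd.items()
}

# Full-load test: (total cost in USD, quoted cost per million rows).
full_load = {
    "Fivetran": (7446, 1.86),
    "Estuary": (4462, 12.97),
    "Airbyte Cloud": (5560, 438.8),
}

# Rows (in millions) each vendor was billed for, implied by the two figures.
implied_rows_millions = {
    vendor: round(total / per_million, 1)
    for vendor, (total, per_million) in full_load.items()
}

print(cdc_cost_per_million)
print(implied_rows_millions)
```

The implied row counts make the caveat visible: the full-load costs cover very different amounts of data, since Estuary and Airbyte processed far fewer rows than Fivetran in the 24-hour window.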

While Estuary shows a low per-row cost for CDC in this specific test, the overall picture strongly favors the predictable, infra-based cost of self-hosted OLake, especially for large-scale replication.

In summary, these benchmarks suggest OLake can match or exceed the speed of leading proprietary tools for PostgreSQL to Iceberg replication, offers superior CDC latency compared to all tested alternatives, and provides a significantly lower and more predictable cost structure due to being open source and self-hosted.

You can find more details on the benchmarks and the tool itself in our documentation.

Happy to discuss the results and our approach.
