I appreciate that they call this out. It means that if you're using Postgres primarily to store data where losing a few rows is acceptable (like time series data), then ClickHouse is the clear and obvious choice.
But if transactionality is key for you (say, a financial application, or an application's primary data store), then Postgres still makes more sense.
ClickHouse defaults to: "fsync_after_insert = false".
A more fair comparison might at least do "SET synchronous_commit TO OFF" on the PostgreSQL side. (But for this UPDATE benchmark, I think the results would still largely be the same.)
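For reference, leveling the durability settings would look roughly like this on each side (just a sketch; the ClickHouse table name is a placeholder, and fsync_after_insert is a per-table MergeTree setting):

    -- PostgreSQL: relax durability for the session running the benchmark
    SET synchronous_commit TO OFF;

    -- ClickHouse: alternatively, force fsync on insert for the benchmark table,
    -- which makes the comparison to Postgres' default durability more symmetric
    ALTER TABLE bench_table MODIFY SETTING fsync_after_insert = 1;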
Personally, I architect systems to use Postgres for the write-heavy parts of a workload, and ClickHouse for the write-once parts (time series, analytics, logs, etc). ClickHouse is also the best tool I've ever found for compressing enormous datasets, and is very useful even as a simple data warehouse.
If the data needs to be there, use a fully transactional database.
Prior to July 2025 (ClickHouse v25.7), ClickHouse did not support UPDATE statements. At all. Like most columnar analytics databases. We spent a lot of effort designing and implementing a way to support high-performance, SQL-standard UPDATE statements. This test was pretty much just trying to see how good a job we had done, by comparing ourselves to the gold standard, Postgres. (If you're curious, we also wrote in depth about how we built the UPDATE support: https://clickhouse.com/blog/updates-in-clickhouse-2-sql-styl...)
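For anyone who hasn't followed the change, the shape of it is roughly this (table and column names made up for illustration):

    -- Pre-25.7: the only way to change rows was an asynchronous mutation
    ALTER TABLE visits UPDATE status = 'archived' WHERE visit_date < '2024-01-01';

    -- 25.7+: SQL-standard UPDATE statement
    UPDATE visits SET status = 'archived' WHERE visit_date < '2024-01-01';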
We have some updates to the post in progress; we originally, deliberately, used cold runs for both ClickHouse & Postgres, because we wanted to look at the "raw" update speed of the engine rather than the variability of cache hits. But TL;DR: when you run a more "real world" test where caches are warm and Postgres is getting a very high cache-hit ratio, its point updates are consistently ~2ms, while ClickHouse is around ~6ms (bulk updates are still many multiples faster in ClickHouse even with the cache in play).
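If you want to sanity-check the cache-hit side of that when reproducing, Postgres exposes it in pg_stat_database (the database name below is a placeholder):

    -- Rough buffer cache hit ratio for the benchmark database
    SELECT datname,
           round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
    FROM pg_stat_database
    WHERE datname = 'bench';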
Did you run any tests with the new transaction system in ClickHouse? It would be super interesting to see how it affected the batch updates.
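(By "the new transaction system" I mean the experimental transactions feature; a minimal sketch is below. As I understand it, it has to be enabled in the server config, relies on ClickHouse Keeper, and is still fairly limited, so whether and how it interacts with the new UPDATE path is exactly what I'm curious about.)

    -- Experimental feature; needs to be enabled server-side
    BEGIN TRANSACTION;
    INSERT INTO events VALUES (now(), 'pending');
    -- ... a batch of UPDATEs would go here, if/where supported
    COMMIT;   -- or ROLLBACK;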
A test that would show PG's strengths over ClickHouse for OLTP would be a stress test with a long-running set of updates.
ClickHouse maintains updates as uncompacted patches merged in the background, which is how you would do it with a columnar store. But if you have an update-heavy workload, these patches would accumulate and your query performance would start to suffer. PG on the other hand completes all update work inline, and wouldn't get degrading performance under update-heavy regimes.
This is just a fundamental artifact of OLAP vs OLTP, maybe OLAP can be optimized to the point where it doesn't really matter for most workloads, but a theoretical edge remains with row-based stores and updates.
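A rough way to watch that kind of backlog (these system tables cover the classic mutation path; I'm not certain how the newer patch parts are surfaced):

    -- Mutations that haven't finished applying yet
    SELECT database, table, count() AS pending_mutations
    FROM system.mutations
    WHERE NOT is_done
    GROUP BY database, table;

    -- Active part counts as a crude proxy for merge pressure
    SELECT database, table, count() AS active_parts
    FROM system.parts
    WHERE active
    GROUP BY database, table
    ORDER BY active_parts DESC;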
I'd wager the overhead in Postgres is probably lighter (though it's also light in ClickHouse), so you're right, this would be an interesting test. We actually had something like that planned, to run concurrent UPDATEs and SELECTs at different volumes for a period of time, to see how they each cope. Will definitely do it!
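On the Postgres side, something as simple as pgbench with a custom script generates that kind of sustained update pressure (table, column and database names below are placeholders):

    -- update_stress.sql
    -- run with: pgbench -n -f update_stress.sql -c 32 -T 600 -P 10 bench
    \set id random(1, 10000000)
    UPDATE events SET status = 'seen', updated_at = now() WHERE id = :id;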
As long as you can get away with Postgres, stay with Postgres. I'm sure this update support is a step forward, just like version-merging is much better than cancelling rows, but it still comes with a ton of downsides.
Unrelated to updating data, the CH defaults drive me insane; the null join behavior alone made me seriously consider ripping CH out of our infrastructure (after wasting too long trying to figure out why my query "wasn't working").
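(For anyone hitting the same thing: I'm fairly sure this is the join_use_nulls setting. By default ClickHouse fills non-matching columns of an outer join with the type's default value, e.g. 0 or an empty string, instead of NULL. Table names below are made up.)

    -- Opt in to SQL-standard NULLs for non-matching rows of the outer join
    SELECT l.id, r.value
    FROM left_table AS l
    LEFT JOIN right_table AS r ON l.id = r.id
    SETTINGS join_use_nulls = 1;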
Lastly I'll say, if CH does what you need and you are comfortable learning all the ins and outs, then it can do some really cool things. But it's important to remember it's NOT a normal RDBMS, nor can you use it like one. I almost wish they didn't use SQL as the query language; then people would think about it differently, myself included.
Either way, CH shouldn't be the store of truth when you need record level fidelity.
We’ve been doing our best to address and clarify these differences, whether through product features like this one or by publishing content to educate users. For example: https://clickhouse.com/blog/postgres-to-clickhouse-data-mode... https://www.youtube.com/watch?v=9ipwqfuBEbc.
From what we’ve observed, the learning curve typically ranges from a few weeks for smaller to medium migrations to 1–2 months for larger ones moving real-time OLAP workloads from Postgres to ClickHouse. Still, customers are making the switch and finding value — hundreds (or more) are using both technologies together to scale their real-time applications: Postgres for low-latency, high-throughput transactions and ClickHouse for blazing-fast (100x faster) analytics.
We’re actively working to bridge the gap between the two systems, with features like faster UPDATEs, enhanced JOINs and more. That’s why I’m not sure your comment is fully generalizable — the differences largely stem from the distinct workloads they support, and we’re making steady progress in narrowing that gap.
- Sai from the ClickHouse team here.
Sharing a few links for reference: https://clickhouse.com/docs/integrations/clickpipes/postgres https://github.com/PeerDB-io/peerdb https://clickhouse.com/cloud/clickpipes/postgres-cdc-connect... https://clickhouse.com/blog/clickhouse-acquires-peerdb-to-bo...
Here is a short demo/talk we did at our annual conference, Open House, that covers this reference architecture: https://clickhouse.com/videos/postgres-and-clickhouse-the-de...
If you need updates then perhaps ClickHouse isn't the perfect choice. Something like ScyllaDB might be a better compromise if you want performant updates with (some sort of) consistency guarantees. If you need stronger guarantees you will need a "proper" database, but then you're unlikely to get the performance. AKA tradeoffs, or no free lunch.
CH has been my favorite database since I discovered PostgreSQL 20 years ago. My viewpoint is: don't use Postgres unless you can't use CH.
For general mutable data, ClickHouse is trying super hard to get much better & doing amazing engineering. But it feels like it'll be a long time before the fortress of Postgres for OLTP is breached. https://about.gitlab.com/blog/two-sizes-fit-most-postgresql-... https://news.ycombinator.com/item?id=44895954
The top submission is the end of a 4 part series. Part two is really nice on the details of how ClickHouse has focused on speeding updates: recommend a read! https://clickhouse.com/blog/updates-in-clickhouse-2-sql-styl...
The omission feels… odd.
Basically, every proposed use case for CH is based on event sourcing some data, from Postgres or logs or whatever. The implication is that the data either already exist as a "source of truth" in some primary ACID database, or at least there is an archive of raw data files, or maybe (as with logs and metrics) the risk of data loss isn't that big of a deal.
But what if you actually want to store the data in a single place? CH doesn't really offer peace of mind here. Its entire architecture is based on best-effort management of data. One of ClickHouse's best features is that it can store the data in cloud storage, to allow separation of data and compute at an incredible price point. But it can lose data.
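(For concreteness, pointing a table at S3-backed storage is roughly this; 's3_tiered' is a placeholder for a policy that has to be defined in the server's storage configuration:)

    CREATE TABLE events
    (
        ts DateTime,
        payload String
    )
    ENGINE = MergeTree
    ORDER BY ts
    SETTINGS storage_policy = 's3_tiered';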
So if you have, say, 30TB of data that is very columnar and cannot be efficiently queried in Postgres, you cannot simply store it in CH alone. You'd have to pay quite a lot of money to have it safely guarded by (let's say) Postgres, even if that copy isn't the main source of queries. If you have heavy ingest rates, you're going to have to pay for more expensive SSD storage, too.
There are columnar databases that are ACID and focus on consistency, like TimescaleDB. But they tend to be cloud databases. For example, you can self-host Timescale, but you don't get access to the tiered cloud storage layer. So when self-hosting, you need to run expensive SSDs again, with no separation of compute and data.
If CH had a better consistency story, or maybe a clustering story that ensures redundancy, I would be really inclined to use it as a primary store.
I figured out after 40 minutes that we were not making any progress: for each user we were probably scanning 4 million users. After I added an index on the relevant field, it finished quickly.
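Something like this, assuming it was on the Postgres side (names made up):

    -- Build the missing index without blocking writes
    CREATE INDEX CONCURRENTLY idx_users_external_id ON users (external_id);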