TL;DR: NVMe-backed Postgres + built-in CDC into ClickHouse + pg_clickhouse so you can keep your app Postgres-first while running analytics in ClickHouse.
Try it (private preview): https://clickhouse.com/cloud/postgres Blog w/ live demo: https://clickhouse.com/blog/postgres-managed-by-clickhouse
Problem
Across many fast-growing companies using Postgres, performance and scalability commonly emerge as challenges as they grow. This is for both transactional and analytical workloads. On the OLTP side, common issues include slower ingestion (especially updates, upserts), slower vacuums, long-running transactions incurring WAL spikes, among others. In most cases, these problems stem from limited disk IOPS and suboptimal disk latency. Without the need to provision or cap IOPS, Postgres could do far more than it does today.
On the analytics side, many limitations stem from the fact that Postgres was designed primarily for OLTP and lacks several features that analytical databases have developed over time, for example vectorized execution, support for a wide variety of ingest formats, etc. We’re increasingly seeing a common pattern where many companies like GitLab, Ramp, Cloudflare etc. complement Postgres with ClickHouse to offload analytics. This architecture enables teams to adopt two purpose-built open-source databases.
That said, if you’re running a Postgres based application, adopting ClickHouse isn’t straightforward. You typically end up building a CDC pipeline, handling backfills, and dealing with schema changes and updating your application code to be aware of a second database for analytics.
Solution
On the OLTP side, we believe that NVMe-based Postgres is the right fit and can drastically improve performance. NVMe storage is physically colocated with compute, enabling significantly lower disk latency and higher IOPS than network-attached storage, which requires a network round trip for disk access. This benefits disk-throttled workloads and can significantly (up to 10x) speed up operations incl. updates, upserts, vacuums, checkpointing, etc. We are working on a detailed blog examining how WAL fsyncs, buffer reads, and checkpoints dominate on slow I/O and are significantly reduced on NVMe. Stay tuned!
On the OLAP side, the Postgres service includes native CDC to ClickHouse and unified query capabilities through pg_clickhouse. Today, CDC is powered by ClickPipes/PeerDB under the hood, which is based on logical replication. We are working to make this faster and easier by supporting logical replication v2 for streaming in-progress transactions, a new logical decoding plugin to address existing limitations of logical replication, working toward sub-second replication, and more.
Every Postgres comes packaged with the pg_clickhouse extension, which reduces the effort required to add ClickHouse-powered analytics to a Postgres application. It allows you to query ClickHouse directly from Postgres, enabling Postgres for both transactions and analytics. pg_clickhouse supports comprehensive query pushdown for analytics, and we plan to continuously expand this further (https://news.ycombinator.com/item?id=46249462).
Vision
To sum it up - Our vision is to provide a unified data stack that combines Postgres for transactions with ClickHouse for analytics, giving you best-in-class performance and scalability on an open-source foundation.
Get Started
We are actively working with users to onboard them to the Postgres service. Since this is a private preview, it is currently free of cost.If you’re interested, please sign up here. https://clickhouse.com/cloud/postgres
We’d love to hear your feedback on our thesis and anything else that comes to mind, it would be super helpful to us as we build this out!
scottmas•54m ago
Will pricing likely just be a percent markup over the (excellent) Ubicloud prices they have listed? (https://www.ubicloud.com/docs/about/pricing)
saisrirampur•42m ago