Show HN: PgDog – Scale Postgres without changing the app

109•levkk•4h ago

Hey HN! Lev and Justin here, authors of PgDog (https://pgdog.dev/), a connection pooler, load balancer and database sharder for PostgreSQL. If you build apps with a lot of traffic, you know the first thing to break is the database. We are solving this with a network proxy that works without requiring application code changes or database migrations.

Our post from last year: https://news.ycombinator.com/item?id=44099187

The most important update: we are in production. Sharding is used a lot, with direct-to-shard queries (one shard per query) working pretty much all the time. Cross-shard (or multi-database) queries are still a work in progress, but we are making headway.

Aggregate functions like count(), min(), max(), avg(), stddev() and variance() are working, without refactoring the app. PgDog calculates the aggregate in-transit, while transparently rewriting queries to fetch any missing info. For example, multi-database average calculation requires a total count of rows to calculate the original sum. PgDog will add count() to the query, if it’s not there already, and remove it from the rows sent to the app.

Sorting and grouping works, including DISTINCT, if the columns(s) are referenced in the result. Over 10 data types are supported, like, timestamp(tz), all integers, varchar, etc.

Cross-shard writes, including schema changes (CREATE/DROP/ALTER), are now atomic and synchronized between all shards with two-phase commit. PgDog keeps track of the transaction state internally and will rollback the transaction if the first phase fails. You don’t need to monkeypatch your ORM to use this: PgDog will intercept the COMMIT statement and execute PREPARE TRANSACTION and COMMIT PREPARED instead.

Omnisharded tables, a.k.a replicated or mirrored (identical on all shards), support atomic reads and writes. That’s important because most databases can’t be completely sharded and will have some common data on all databases that has to be kept in-sync.

Multi-tuple inserts, e.g., INSERT INTO table_x VALUES ($1, $2), ($3, $4), are split by our query rewriter and distributed to their respective shards automatically. They are used by ORMs like Prisma, Sequelize, and others, so those now work without code changes too.

Sharding keys can be mutated. PgDog will intercept and rewrite the update statement into 3 queries, SELECT, INSERT, and DELETE, moving the row between shards. If you’re using Citus (for everyone else, Citus is a Postgres extension for sharding databases), this might be worth a look.

If you’re like us and prefer integers to UUIDs for your primary keys, we built a cross-shard unique sequence, directly inside PgDog. It uses the system clock (and a couple other inputs), can be called like a Postgres function, and will automatically inject values into queries, so ORMs like ActiveRecord will continue to work out of the box. It’s monotonically increasing, just like a real Postgres sequence, and can generate up to 4 million numbers per second with a range of 69.73 years, so no need to migrate to UUIDv7 just yet.

    INSERT INTO my_table (id, created_at) VALUES (pgdog.unique_id(), now());

Resharding is now built-in. We can move gigabytes of tables per second, by parallelizing logical replication streams across replicas. This is really cool! Last time we tried this at Instacart, it took over two weeks to move 10 TB between two machines. Now, we can do this in just a few hours, in big part thanks to the work of the core team that added support for logical replication slots to streaming replicas in Postgres 16.

Sharding hardly works without a good load balancer. PgDog can monitor replicas and move write traffic to a promoted primary during a failover. This works with managed Postgres, like RDS (incl. Aurora), Azure Pg, GCP Cloud SQL, etc., because it just polls each instance with “SELECT pg_is_in_recovery()”. Primary election is not supported yet, so if you’re self-hosting with Patroni, you should keep it around for now, but you don’t need to run HAProxy in front of the DBs anymore.

The load balancer is getting pretty smart and can handle edge cases like SELECT FOR UPDATE and CTEs with INSERT/UPDATE statements, but if you still prefer to handle your read/write separation in code, you can do that too with manual routing. This works by giving PgDog a hint at runtime: a connection parameter (-c pgdog.role=primary), SET statement, or a query comment. If you have multiple connection pools in your app, you can replace them with just one connection to PgDog instead. For multi-threaded Python/Ruby/Go apps, this helps by reducing memory usage, I/O and context switching overhead.

Speaking of connection pooling, PgDog can automatically rollback unfinished transactions and drain and re-sync partially sent queries, all in an effort to preserve connections to the database. If you’ve seen Postgres go to 100% CPU because of a connection storm caused by an application crash, this might be for you. Draining connections works by receiving and discarding rows from abandoned queries and sending the Sync message via the Postgres wire protocol, which clears the query context and returns the connection to a normal state.

PgDog is open source and welcomes contributions and feedback in any form. As always, all features are configurable and can be turned off/on, so should you choose to give it a try, you can do so at your own pace. Our docs (https://docs.pgdog.dev) should help too.

Thanks for reading and happy hacking!

Comments

mijoharas•3h ago

Happy pgdog user here, I can recommend it from a user perspective as a connection pooler to anyone checking this out (we're also running tests and positive about sharding, but haven't run it in prod yet, so I can't 100% vouch for it on that, but that's where we're headed.)

@Lev, how is the 2pc coming along? I think it was pretty new when I last checked, and I haven't looked into it much since then. Is it feeling pretty solid now?

levkk•2h ago

It feels better now, but we still need to add crash protection - in case PgDog itself crashes, we need to restore in-progress 2pc transaction records from a durable medium. We will add this very soon.

cpursley•2h ago

Looks great - I'd love to include it in https://postgresisenough.dev (just put in a PR: https://github.com/agoodway/postgresisenough?tab=readme-ov-f...)

verdverm•2h ago

Why don't you just do it yourself if you maintain a curated resource list?

cpursley•1h ago

Wanted to give them chance to write it up as they like

aram99•2h ago

nebezb•2h ago

While the lift to add to your database is low, I don’t think you’re at a point you can outsource the work.

But all the better if they do!

pbreit•1h ago

How well does PG work with 10-20 million (financial) records per day? Basic stuff: a few writes per, some reads, generating some analytics, etc.

octoclaw•2h ago

The cross-shard aggregate rewriting is really nice. Transparently injecting count() for average calculations sounds straightforward but there are so many edge cases once you add GROUP BY, HAVING, subqueries, etc.

Curious about latency overhead for the common case. On a direct-to-shard read where no rewriting happens, what's the added latency from going through PgDog vs connecting to Postgres directly? Sub-millisecond?

levkk•2h ago

Subms typically, yeah. We measured the average latency between nodes in the same AZ (e.g., AWS availability zone) to be less than one ms, so you need to account for one extra hop and processing time by PgDog, which is typically fast.

That being said if you don't currently use a connection pooler, you will notice some latency when adding one. It's usually table stakes for Postgres at scale since you need one anyway, but it can be surprising. This especially affects "chatty" apps - the ones that send 10+ queries to service one API request, and makes bugs like N+1s considerably worse.

TLDR: not a free lunch, but generally acceptable at scale.

noleary•2h ago

> If you build apps with a lot of traffic, you know the first thing to break is the database.

Just out of curiosity, what kinds of high-traffic apps have been most interested in using PgDog? I see you guys have Coinbase and Ramp logos on your homepage -- seems like fintech is a fit?

levkk•1h ago

We have all kinds, it's not specific to any particular sector. That's kind of the beauty for building for Postgres - everyone uses it in some capacity!

My general advice is, once you see more than 100 connections on your database, you should consider adding a connection pooler. If your primary load exceeds 30% (CPU util), consider adding read replicas. This also applies if you want some kind of workload isolation between databases, e.g. slow/expensive analytics queries can be pushed to a replica. Vertically scaling primaries is also a fine choice, just keep that vertical limit in mind.

Once you're a couple instance types away from the largest machine your cloud provider has, start thinking about sharding.

mystifyingpoi•1h ago

> If your primary load exceeds 30% (CPU util), consider adding read replicas.

I'm not an expert, but isn't this excessive? In theory you could triple the load and still have slack. I'd actually try to scale down, not up.

CuriouslyC•34m ago

Load is highly bursty. You can autoscale application services quickly, but scaling your database up is a slower thing.

Eikon•39m ago

> Vertically scaling primaries is also a fine choice, just keep that vertical limit in mind.

In practice, outside of extremely marginal use cases, you’ll have a hard time exceeding such a limit. Merklemap [0] hosts a 100B-row, 40TB Postgres database on low-tier commodity hardware and things are more than fine; there are very few use cases where it would be an actual practical limit.

[0] https://www.merklemap.com/

jackfischer•1h ago

Congrats guys! Curious how the read write splitting is reliable in practice due to replication lag. Do you need to run the underlying cluster with synchronous replication?

levkk•1h ago

Not really, replication lag is generally an accepted trade-off. Sync replication is rarely worth it, since you take a 30% performance hit on commits and add more single points of failure.

We will add some replication lag-based routing soon. It will prioritize replicas with the lowest lag to maximize the chance of the query succeeding and remove replicas from the load balancer entirely if they have fallen far behind. Incidentally, removing query load helps them catch up, so this could be used as a "self-healing" mechanism.

jackfischer•1h ago

It sounds like this is one of the few places that might be a leaky abstraction in that queries _might_ fail and the failure might effectively be silent?

levkk•1h ago

It can be silent, but usually it's loud and confusing because people do something like this (Rails example):

    user = User.create(email: "test@test.com")
    SendWelcomeEmail.perform_later(user.id)

And the job code fetches the row like so:

    user = User.find(id)

This blows up because `find` throws an error if the record isn't there. Job queues typically use replicas for reads. This is a common gotcha: code that runs async expects the data to be there after creation.

There can be others, of course, especially in fintech where you have an atomic ledger, but people are usually pretty conscious about this and send those type of queries to the primary.

In general though, I completely agree, this is leaky and an unsolved problem. You can have performance or accuracy, but not both, and most solutions skew towards performance and make applications handle the lack of accuracy.

jackfischer•56m ago

Makes sense, appreciate it

I_am_tiberius•1h ago

I really hope to use the sharding feature one day.

codegeek•1h ago

Stupid question but does this shard the database as well or do we shard manually and then setup the configuration accordingly ?

levkk•1h ago

It shards it as well. We handle schema sync, moving table data (in parallel), setting up logical replication, and application traffic cutover. The zero-downtime resharding is currently WIP, working on the PR as we speak: https://github.com/pgdogdev/pgdog/pull/784.

codegeek•1h ago

Incredible. I am really interested in trying pgdog for our B2B SAAS product. Will do some testing.

saisrirampur•1h ago

Great progress, guys! It’s impressive to see all the enhancements - more types, more aggregate functions, cross-node DML, resharding, and reliability-focused connection pooling and more. Very cool! These were really hard problems and took multiple years to build at Citus. Kudos to the shipping velocity.

cuu508•44m ago

Some HTTP proxies can do retries -- if a connection to one backend fails, it is retried on a different backend. Can PgDog (or PgBouncer, or any other tool) do something similar -- if there's a "database server shutting down" error or a connection reset, retry it on another backend?

levkk•9m ago

Not currently, but we can add this. One thing we have to be careful of is to not retry requests that are executing inside transactions, but otherwise this would be a great feature.

Show HN: PgDog – Scale Postgres without changing the app

Show HN: Sowbot – open-hardware agricultural robot (ROS2, RTK GPS)

Show HN: Shibuya – A High-Performance WAF in Rust with eBPF and ML Engine

Show HN: AI Timeline – 171 LLMs from Transformer (2017) to GPT-5.3 (2026)

Show HN: CIA World Factbook Archive (1990–2025), searchable and exportable

Show HN: BVisor – An Embedded Bash Sandbox, 2ms Boot, Written in Zig

Show HN: AgentDbg - local-first debugger for AI agents (timeline, loops, etc.)

Show HN: Unlock the best engineering knowledge in papers for your coding agent

Show HN: What I've learned from shipping 25 mobile apps

Show HN: Free ecommerce platform for link-in-bio people

Show HN: A geometric analysis of Chopin's Prelude No. 4 using 3D topology

Show HN: Mato – a Multi-Agent Terminal Office workspace (tmux-like)

Show HN: 3D Mahjong, Built in CSS

Show HN: Agent Multiplexer – manage Claude Code via tmux

Show HN: TTSLab – A voice AI agent and TTS lab running in the browser via WebGPU

Show HN: EloPhanto – A self-evolving AI agent that builds its own tools

Show HN: Self-hosted lightweight file sharing app. (folderhost)

Show HN: I built an iOS app to WebRTC into my Mac terminal from the toilet

Show HN: Local-First Linux MicroVMs for macOS

Show HN: Slipshow, a multi-paradigm presentation tool

Show HN: SkillScan – Free API to detect malicious AI agent skill files

Show HN: Keep your eyes healthy with 20 20 20 rule reminder using bash

Show HN: Implementing ping from the Ethernet layer (ARP,IPv4,ICMP in user space)

Show HN: Rendering 18,000 videos in real-time with Python

Show HN: Agentic programming needs new processes

Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU

Show HN: Visual Tailwind CSS Style Guide – Single HTML file, no build step

Show HN: TLA+ Workbench skill for coding agents (compat. with Vercel skills CLI)

Show HN: Monolith e-commerce platform for serverless

Show HN: Iron-Wolf – Wolfenstein 3D source port in Rust

Show HN: PgDog – Scale Postgres without changing the app

Comments

Show HN: PgDog – Scale Postgres without changing the app

Show HN: Sowbot – open-hardware agricultural robot (ROS2, RTK GPS)

Show HN: Shibuya – A High-Performance WAF in Rust with eBPF and ML Engine

Show HN: AI Timeline – 171 LLMs from Transformer (2017) to GPT-5.3 (2026)

Show HN: CIA World Factbook Archive (1990–2025), searchable and exportable

Show HN: BVisor – An Embedded Bash Sandbox, 2ms Boot, Written in Zig

Show HN: AgentDbg - local-first debugger for AI agents (timeline, loops, etc.)

Show HN: Unlock the best engineering knowledge in papers for your coding agent

Show HN: What I've learned from shipping 25 mobile apps

Show HN: Free ecommerce platform for link-in-bio people

Show HN: A geometric analysis of Chopin's Prelude No. 4 using 3D topology

Show HN: Mato – a Multi-Agent Terminal Office workspace (tmux-like)

Show HN: 3D Mahjong, Built in CSS

Show HN: Agent Multiplexer – manage Claude Code via tmux

Show HN: TTSLab – A voice AI agent and TTS lab running in the browser via WebGPU

Show HN: EloPhanto – A self-evolving AI agent that builds its own tools

Show HN: Self-hosted lightweight file sharing app. (folderhost)

Show HN: I built an iOS app to WebRTC into my Mac terminal from the toilet

Show HN: Local-First Linux MicroVMs for macOS

Show HN: Slipshow, a multi-paradigm presentation tool

Show HN: SkillScan – Free API to detect malicious AI agent skill files

Show HN: Keep your eyes healthy with 20 20 20 rule reminder using bash

Show HN: Implementing ping from the Ethernet layer (ARP,IPv4,ICMP in user space)

Show HN: Rendering 18,000 videos in real-time with Python

Show HN: Agentic programming needs new processes

Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU

Show HN: Visual Tailwind CSS Style Guide – Single HTML file, no build step

Show HN: TLA+ Workbench skill for coding agents (compat. with Vercel skills CLI)

Show HN: Monolith e-commerce platform for serverless

Show HN: Iron-Wolf – Wolfenstein 3D source port in Rust