You _could_ make that ACID, but it's not going to be faster than a single machine.
Also had an issue with it: it seems to cache authentication when doing passthrough. I'd changed the role's password, but it kept using the old one, which was no bueno ;).
PgDog seems to make more sense when you really care about a few databases that need massive scale, rather than as a simple proxy in front of Postgres. I'll keep following the development though. It is much needed in this space; Postgres can use all the investment it can get to move past the single-machine scale that it currently excels at.
You could also build a watcher sidecar that watches for changes to pgdog_users.toml and has PgDog refresh itself then too with this combination. We thought about that but prefer to control the reloads for our needs.
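For anyone who does want the watcher, here is a minimal sketch of the idea, assuming a simple polling loop is good enough. `ConfigWatcher`, `watch_forever`, and the `on_change` callback are hypothetical names; the callback stands in for whatever reload mechanism your deployment actually uses, which this sketch does not assume anything about.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


class ConfigWatcher:
    """Detects content changes in a file by comparing SHA-256 digests."""

    def __init__(self, path: str, on_change: Callable[[], None]) -> None:
        self.path = Path(path)
        self.on_change = on_change  # hypothetical hook: trigger your reload here
        self._last = self._digest()

    def _digest(self) -> Optional[str]:
        try:
            return hashlib.sha256(self.path.read_bytes()).hexdigest()
        except FileNotFoundError:
            return None  # file missing, e.g. in the middle of being replaced

    def poll(self) -> bool:
        """Check once; call on_change and return True if the content changed."""
        current = self._digest()
        if current is None or current == self._last:
            return False
        self._last = current
        self.on_change()
        return True


def watch_forever(watcher: ConfigWatcher, interval_s: float = 2.0) -> None:
    """Poll until the process is stopped."""
    while True:
        watcher.poll()
        time.sleep(interval_s)
```

Hashing the content rather than checking the mtime means a rewrite with identical contents does not trigger a reload. Usage would be something like `watch_forever(ConfigWatcher("pgdog_users.toml", my_reload))`, with `my_reload` supplied by you.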
The same old processes vs. threads debate, plus having the ability to scale the coordinator past a single machine. So, if you're OLTP, definitely consider PgDog. OLAP - Citus still wins because of its advanced query engine. We'll get there.
Wrt. the pooler, how do you compare with pgbouncer?
I'm interested because I have a postgres instance, low-traffic but still like ... tens of r(eads)ps. I was not running anything close to the machine limits but still added pgbouncer to improve performance and didn't see a noticeable difference. I was stress-testing the machine obv., I'm not talking about the 10 rps, lol.
For context, my numbers were something like 10k rps +/- 1k vanilla postgres and like 9k rps +/- 1k with pgbouncer in front of it. So ... slightly slower but big error bars so I wouldn't say for sure. I ended up not using pgbouncer as the benefit was immaterial.
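To put a number on "big error bars": a quick sketch, assuming the ±1k figures are independent standard errors (my assumption, the measurements above don't say what the ± means). The function name is made up for illustration.

```python
import math


def difference_in_sigmas(mean_a: float, err_a: float,
                         mean_b: float, err_b: float) -> float:
    """Difference of two means in units of their combined uncertainty."""
    combined = math.hypot(err_a, err_b)  # sqrt(err_a**2 + err_b**2)
    return (mean_a - mean_b) / combined


# vanilla Postgres vs. pgbouncer in front, requests per second
z = difference_in_sigmas(10_000, 1_000, 9_000, 1_000)
print(f"{z:.2f}")  # prints 0.71
```

A gap of about 0.7 combined sigmas is well inside the noise, so under that assumption the data doesn't show pgbouncer being slower or faster.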
Also yeah, in case you want to check it out, it's the db that backs this project: https://httpstate.com.
This solves the thousands-of-clients case for reads in a way that is transparent to the clients.
Yes, it's required at large scale, especially if you want to distribute reads or shard to a particular geographical area.
Is there a binary I can run directly?
Then again, sharding on a single host probably isn't very useful anyway - but it might work with docker in swarm mode?
Expanding on that a bit, mongo drivers even have a shared specification of the state machine for monitoring topology changes[1] and algorithm for selecting the server to send an operation to[2] (along with various declarative test cases that the drivers use to validate them alongside the specs in the repo). I think people sometimes underestimate how important the client-side work is to this sort of experience; for all of the faults mongo has had over the years, the amount of investment that they put into the client libraries is something I've never seen anywhere else (although having spent several years working on some of these libraries, my take is likely very biased).
[1]: https://github.com/mongodb/specifications/blob/master/source...
[2]: https://github.com/mongodb/specifications/blob/master/source...
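To give a flavor of what the selection side involves, here is a heavily simplified sketch of the latency-window step: keep the servers whose average round-trip time is within a threshold of the fastest one (the drivers' `localThresholdMS` option defaults to 15 ms), then pick one of them. It leaves out read preference, tag sets, staleness checks, and everything else that runs before this step, and the random pick is my simplification. The `Server` type and function names are made up for illustration and are not taken from any driver.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    address: str
    avg_rtt_ms: float  # moving average of measured round-trip time


def latency_window(servers: List[Server],
                   local_threshold_ms: float = 15.0) -> List[Server]:
    """Keep servers within local_threshold_ms of the fastest server's RTT."""
    if not servers:
        return []
    fastest = min(s.avg_rtt_ms for s in servers)
    return [s for s in servers if s.avg_rtt_ms <= fastest + local_threshold_ms]


def select_server(servers: List[Server],
                  local_threshold_ms: float = 15.0) -> Optional[Server]:
    """Pick a random server from the latency window, or None if empty."""
    window = latency_window(servers, local_threshold_ms)
    return random.choice(window) if window else None


candidates = [Server("a:27017", 5.0), Server("b:27017", 18.0), Server("c:27017", 40.0)]
in_window = [s.address for s in latency_window(candidates)]
```

With these made-up RTTs the cutoff is 5 + 15 = 20 ms, so `a` and `b` stay in the window and `c` is dropped.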
Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us. We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.
For both MySQL and PostgreSQL you will need to use some kind of logical upgrades if you want no downtime.
We sharded over 20 TB that we know about.
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.
If you’re already sharding by tenant for other reasons, OK… But I see CDC to a true OLAP system as more scalable.
PostgreSQL still needs real columnar tables in core, hopefully one day.
This is for DBs that are ~1-1.5TB but don't have a huge amount of churn/QPS.
Effectively what is described here https://www.pgedge.com/blog/always-online-or-bust-zero-downt...
If you use something like CloudNativePG, they automate parts of the process with CLI tools and declarative syntax. Otherwise you take the time to figure it out by hand. You do an exercise on your staging DB, and if all goes well you do the same procedure in prod.
htrp•1h ago
Still trying to figure out how this works technically. Is the performance gain really just from the rewrite in Rust?