If you’ve ever administered Postgres at scale, multiply it by 10. That’s what dealing with multi-master is like. It’s a nightmare.
Similarly maddening, the appalling lack of normalization that is simply taken for granted. “It’s faster, bro.” No, no, it is not. Especially not at the hundreds of millions or billions of rows scale. If you store something low-cardinality like a status column, with an average length of perhaps 7 characters, that’s 8 bytes (1 byte overhead assumed, but it could be 2). Multiply that by 2 billion rows, and you’re wasting 16 GB. Disk is cheap, but a. Memory isn’t b. Don’t be lazy. There’s a right way to use an RDBMS, and a wrong way. If you want a KV store, use a damn KV store.
Finally, I’d be remiss if I failed to point out that Prisma is an unbelievably immature organization who launched without the ability to do JOINS [0]. They are forever dead to me for that. This isn’t “move fast and break things,” it’s “move fast despite having zero clue what we’re doing but convince JS devs that we do.”
with a little pain, if I had to pick an extreme, I'd pick extreme normalization with materialized views that are queried (e.g. no joins), rather than joining all of the time.
This is a safety feature. If my code expects columns A, B, and C, but the migration to add C hasn't run yet and I'm doing something that would otherwise `SELECT `, my query should fail. If the ORM _actually_ does `SELECT ` I'll get back two columns instead of three and things can get spooky and bad real fast (unless the ORM manually validates the shape of the query response every time, which will come with a real runtime cost). If there are columns that the ORM doesn't know about, you could end up with _far more_ data being returned from the database, which could just as easily cause plenty of spooky issues—not the least of which being "overwhelming your network by flooding the client connections with data the application doesn't even know exists".
This isn't said enough.
I added about 4 indexes and halved the resources overnight. But Prisma, SELECT *, graphql and what other resume building shit people implemented was the bane of my existence, typically engineers did this believing it would be faster. I remember 1 engineer had a standing ovation in slack for his refactor which was supposedly going to save us $$$$$ except our DB CPU went up 30% because he decided to validate every company every second in every session. In his defense, he added 1 line of code that caused it, and it was obscured through prisma and graphql to an inefficient query.
FWIW; I love CockroachDB but the price is directly linked to how much your software engineers shit on the database.
But I'm no fan of Prisma either. Drizzle has its own pain points (i.e. sequential numbers for its auto-generated migrations means annoying merge conflicts if multiple people iterate on the schema at the same time), but it's much better than Prisma at sticking close to the metal and allowing good query performance and table design.
I think the indexes they suspect were used are unused but Motion didn't realize CockroachDB was doing zigzag joins on other indexes to accomplish the same thing, leaving the indexes that would be obviously used as genuinely not used.
It's a great feature but CRDB's optimizer would prefer a zig zag join over a covering index, getting around this required indexes be written in a way to persuade the optimizer to not plan for a zig zag join.
I already knew about prisma from the infamous https://github.com/prisma/prisma/discussions/19748
..........I'm sorry, what? That seems........absurd.
edit: Might as well throw in: I can't stand ORMs, I don't get why people use it, please just write the SQL.
- Using `JOIN`s (with correlated subqueries and JSON) has been around for a while now via a `relationLoadStrategy` setting.
- Prisma has a Rust service that does query execution & result aggregation, but this is automatically managed behind the scenes. All you do is run `npx prisma generate` and then run your application.
- They are in the process of removing the Rust layer.
The JOIN setting and the removing of the middleware service are going to be defaults soon, they're just in preview.
The removal of the rust service is available in preview for Postgres as of 6.7.[1]
Rewriting significant parts of a complex codebase used by millions is hard, and pushing it to defaults requires prolonged testing periods when the worst case is "major data corruption".
[1]: https://www.prisma.io/blog/try-the-new-rust-free-version-of-...
Harder than just doing joins.
Funny relevant story: we got an OOM from a query that we used Prisma for. I looked into it - it’s was a simple select distinct. Turns out (I believe it was changed like a year ago, but I’m not positive), event distincts were done in memory! I can’t fathom the decision making there…
This is one of those situations where I can't tell if they're operating on some kind of deep insight that is way above my experience and I just don't understand it, or if they just made really bad decisions. I just don't get it, it feels so wrong.
This is answered at the very top of the link on the post you replied to. In no unclear language, no less. Direct link here: https://github.com/prisma/prisma/discussions/19748#discussio...
> I want to elaborate a bit on the tradeoffs of this decision. The reason Prisma uses this strategy is because in a lot of real-world applications with large datasets, DB-level JOINs can become quite expensive...
> The total cost of executing a complex join is often higher than executing multiple simpler queries. This is why the Prisma query engine currently defaults to multiple simple queries in order to optimise overall throughput of the system.
> But Prisma is designed towards generalized best practices, and in the "real world" with huge tables and hundreds of fields, single queries are not the best approach...
> All that being said, there are of course scenarios where JOINs are a lot more performance than sending individual queries. We know this and that's why we are currently working on enabling JOINs in Prisma Client queries as well You can follow the development on the roadmap.
Though this isn't a complete answer still. Part of it is that Prisma was, at its start, a GraphQL-centric ORM. This comes with its own performance pitfalls, and decomposing joins into separate subqueries with aggregation helped avoid them.
I used to agree until I started using a good ORM. Entity Framework on .NET is amazing.
I disagree. It is probably one of the less terrible ORMs, but it is far from amazing. The object-relational impedance mismatch will always dominate for anything that isn't trivial business. EF works great until you need different views of the model. It does support some kind of view mapping technique, but it's so much boilerplate I fail to see the point.
Dapper + SqlConnection is goldilocks once you get into the nasty edges. Being able to query a result set that always exactly matches your view models is pretty amazing. The idea of the program automagically upgrading and migrating the schemas is something that was interesting to me until I saw what you could accomplish with Visual Studio's SQL Compare tool & RedGate's equivalent. I feel a lot more comfortable running manual schema migrations when working with hosted SQL providers.
You can easily project or use views with SQL then projected onto objects. It's very convenient with `.FromSql`:
https://learn.microsoft.com/en-us/ef/core/querying/sql-queri...
Can you call .Select(entity => SomeSmallerModel() { Name = entity.Name }) or something like that to select what you need? If I am understanding your issue correctly.
I also agree that its one of the least worst but there are still things that annoy me.
I removed a $lookup (the mongodb JOIN equivalent) and replaced it with, as Prisma does, two table lookups and an in-memory join
p90 response times dropped from 35 seconds to 1.2 seconds
As someone who has primarily worked with Postgres for relational concerns, I’ve envied the apparent robustness of the MySQL scaling solutions.
There's a handful of companies that have scaled Citus past 1PB for production usage, but the examples I'm aware of all had more engineering to avoid capability or architecture limitations than one might like. I'd love to see someone come back with a fresh approach that covered more use cases effectively.
Disclaimer: former Citus employee
I've sworn off ORMs entirely. My application is a Postgres application first and foremost. I use PG-specific features extensively. Why would I sacrifice all the power that Postgres offers me just for some conveniences in Python, or Ruby, or whatever?
Nah. Just write the good code for your database.
When I need something PG specific I have options like writing raw SQL queries.
Having most of my data layer in C# is fantastic for productivity and in most cases the performance compared to SQL is negligible.
We will absolutely share our findings when that migration happens!
Also, query planner maturity is a big deal. It's hard to get Spanner to use the indexes you want.
It's interesting that people typically think of columnstores for strict BI / analytics. But there are so many App / user-facing workloads that actually need it.
ps: we're working on pg_mooncake v0.2. create a columnstore in Postgres that's always consistent with your OLTP tables.
It might help for this workload.
If you want to fully embrace the vibe tables are difficult.
Even before LLMs, I was at a certain company that preferred MongoDB so we didn’t need migrations.
Sometimes you don’t care about data structure and you just want to toss something up there and worry about it later.
Postgres is the best answer if you have a solid team and you know what you’re doing.
If you want to ride solo and get something done fast, Firebase and its NoSQL cousins might be easier .
I don't think I've once seen a migrating away from Postgres article.
https://www.youtube.com/watch?v=DNHMYp8M40k
I'm trying to avoid editorializing in my above summary, for fear of mischaracterizing their opinions or the current state of postgres. Their use of postgres was 10 years ago, they were using postgres for a high-availability use case -- so they (and I) don't think "postgres bad, cockroach good." But like Bryan Cantrill says, "No one cares about your workload like you do." So benchmark! Don't make technical decisions via "vibes!"
Aside. Job section says not 9-5. What does that mean? Long hours? Or not 9-5 attitude?
When I first stepped into a DBA role with CockroachDB I was confused why indexes we obviously need were in unused indexes. It wasn't until I did an explain on the queries I learned the planner was doing zig-zag joins instead.
I did a double take at this. At the onset of the article, the fact they're using a distributed database and the mention of a "mid 6 figure" DB bill made me assume they have some obscenely large database that's far beyond what a single node could do. They don't detail the Postgres setup that replaced it, so I assume it's a pretty standard single primary and a 100 million row table is well within the abilities of that—I have a 150 million row table happily plugging along on a 2vCPU+16GB instance. Apples and oranges, perhaps, but people shouldn't underestimate what a single modern server can do.
That sounds insane for a crud app with one million users.
What am I missing?
The largest table was over 100 million rows. Some migrations were painful, however. At that time, some of them would lock the whole table and we'd need to run them overnight. Fortunately, this was for an internal app so we could do that.
Secondly, we did most of these things using SQL, Bash scripts, cron jobs and some I/O logic built directly into the application code. They were robust enough to handle some extremely mission critical systems (a failure could bring down a US primary market and if it's bad enough, you hear it on the news).
hobs•3h ago
NegativeLatency•2h ago
Currently the vendor lock-in or requirements for installing plugins make it hard to do with cloud sql providers. Especially hard since by the time it's a problem you're probably at enough scale to make switching db/vendors hard or impossible.
hobs•2h ago
bastawhiz•2h ago
https://www.postgresql.org/docs/current/indexes-expressional...
panzi•1h ago
renhanxue•1h ago
I can't think of any case at all, no matter how contrived, where you'd want to use the non-B versions of the JSON aggregate functions though.
paulryanrogers•37m ago