Database-per-tenant is great for read-heavy apps - most tenants are small and their tables don't hold many records, so even very complex joins stay fast.
The main problem is that release times can increase substantially: you have to migrate lots of individual databases one by one, and sometimes schema/data drift occurs between DBs. Then your release stops midway and you have to figure out why the feature works in some tenants and breaks in others... So it's not a free launch; the complexity just shifts to a different place.
At least with Single Point of Success database design you either move or don’t.
Most systems have a minimum amount of traffic they'll let you pay for, and most object stores are not set up to stream small updates in a transactionally consistent way - there's a ton of complexity "in the small", as it were.
Some cloud databases don't charge you per database but simply for usage, so in that case usage = customer revenue, and cost should scale as you do.
On the other hand, I could imagine lazy migrations as users or tenants sign in as a nice way to smooth out the increased effort of migrating, if you can engineer for it.
I guess per-tenant you would handle similarly but build some tooling to monitor the state of migrations. It might even be easier in some ways, as you might be more able to take table locks when migrating per-tenant.
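A lazy per-tenant migration is easy to sketch with SQLite's `user_version` pragma: track the schema version inside each tenant's file and run any pending migrations the first time that tenant's database is touched. This is an illustrative sketch, not any library's actual API; the `open_tenant` helper and the migration list are made up.

```python
import sqlite3

# Ordered migrations; applying MIGRATIONS[i] brings a DB to schema version i + 1.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def open_tenant(path):
    """Open one tenant's DB, applying any migrations it hasn't seen yet."""
    conn = sqlite3.connect(path)
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for i, ddl in enumerate(MIGRATIONS[version:], start=version):
        conn.execute(ddl)
        # PRAGMA values can't be bound as parameters, hence the f-string.
        conn.execute(f"PRAGMA user_version = {i + 1}")
    conn.commit()
    return conn
```

A tenant that signs in after months away simply pays its migration cost on first open; preemptively migrating important tenants is just calling the same helper ahead of time.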
For more important customers, you could always preemptively run their database schema updates for them so they never see the update screen.
I really like the free lunch / free launch pun here, intentional or not.
Do you store the DB too, or rebuild it from your records if the client loses their DB?
Why not use Postgres with row level security instead?
Far better than database per tenant.
The "better" isn't clear here; why is it better?
Database per tenant - barring configuration errors in a connection string or something along those lines - means you won't ever accidentally leak another customer's data.
With row-level security, as good as it is, there is always a chance of a leak happening, due to bugs, user error or the like.
Database per tenant also makes migrations safer and allows easier versioning of an application. For example, if you run a SaaS you may put your V2 application into permanent maintenance mode and allow existing customers to use it in perpetuity, while upgrading customers and new customers are put on the V3 platform. This is infinitely easier in practice when the database is per tenant.
That's not correct. It's hard isolation. In effect, RLS automatically applies a WHERE clause, preventing developers from inadvertently accessing the wrong data.
If you don't set the Postgres session variable, then no data is returned.
All the hard isolation without the schema sync pain.
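The "no variable set, no rows returned" behavior can be mimicked outside Postgres, which helps illustrate it. This is not Postgres RLS itself, just a SQLite analogue where a per-connection temp table plays the role of the session variable and a view plays the role of the policy; all table and view names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'acme', 10.0), (2, 'globex', 20.0);

    -- Per-connection "session variable": temp tables are private to a connection.
    CREATE TEMP TABLE session (tenant_id TEXT);

    -- The "policy": the view only yields rows matching the session's tenant.
    CREATE TEMP VIEW my_orders AS
        SELECT o.* FROM orders o JOIN session s ON o.tenant_id = s.tenant_id;
""")

# No tenant set yet: the join against the empty session table returns nothing.
unscoped = conn.execute("SELECT * FROM my_orders").fetchall()

# "Set" the tenant, and only that tenant's rows become visible.
conn.execute("INSERT INTO session VALUES ('acme')")
scoped = conn.execute("SELECT * FROM my_orders").fetchall()
```

In real Postgres RLS the policy would compare `tenant_id` to `current_setting('app.tenant_id')`, and the fail-closed behavior is the same: forget to set it and queries return nothing rather than everything.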
Many of our clients eventually want to host our app on-premises, so moving it to an on-premises environment is quite easy with the database-per-tenant approach. Just copy the database as is.
What I like about having everything in one db, until it grows too big, is that I can do cross-user analytics/reporting. I also had it happen, like it was mentioned in the article, that I needed to share some data between users. Having everything in a single database made this a much simpler problem than needing to move data between databases.
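For comparison, cross-tenant reporting in the per-tenant model means fanning the same query out over every database file and reducing the results yourself. A minimal sketch, assuming each tenant file has an `orders` table (the schema and function name are made up):

```python
import glob
import sqlite3

def revenue_per_tenant(db_paths):
    """Map: run the same query against each tenant DB. Reduce: collect into one dict."""
    totals = {}
    for path in db_paths:
        conn = sqlite3.connect(path)
        row = conn.execute("SELECT COALESCE(SUM(total), 0) FROM orders").fetchone()
        totals[path] = row[0]
        conn.close()
    return totals
```

It works, but every report becomes a small distributed-systems problem (partial failures, tenants mid-migration), which is exactly the simplicity the single shared database buys you.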
You'll more easily pool connections to the same DB, of course, but the difference might not be so stark.
There could be merit to "open and close right after" though, for sure.
Chances are, the popular databases stay in cache and are quick to open anyway; and the unpopular ones are rarely accessed so delay is ok. But you'd also be able to monitor for disk activity/latency on a system level and add more disks if you need more throughput; possibly disks attached to other machines, if you also need more cpu/ram to go with it. Should be relatively simple to partition the low use databases, because they're low use.
It'd probably work better if tenant == user and more specifically tenant == single-user-session.
Postgres with row-level security is a much better solution than database-per-tenant for strong isolation.
https://www.crunchydata.com/blog/row-level-security-for-tena...
This problem also happens with sharded databases - which is why most companies have a structured approach of deploying schema changes, observing that they have applied everywhere, and only then deploying the feature that relies on the schema changes.
Postgres row level access control gives all the benefits of strong isolation with none of the pain of schemas getting out of sync.
1. Identify top-N tenants
2. Separate the DB for these tenants
The top-N could be based on mix of IOPS, importance (revenue wise), etc.
The data model should be designed in such a way that the rows pertaining to each tenant can be extracted.
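Under that assumption (a `tenant_id` column on every table), peeling a top-N tenant out of the shared database is essentially copy-then-delete per table. A hedged sketch with a single made-up `orders` table; a real version would loop over all tables inside one transaction:

```python
import sqlite3

def extract_tenant(shared, tenant_db_path, tenant_id):
    """Copy one tenant's rows into a dedicated DB, then remove them from the shared one."""
    dest = sqlite3.connect(tenant_db_path)
    dest.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, tenant_id TEXT, total REAL)")
    rows = shared.execute(
        "SELECT id, tenant_id, total FROM orders WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()
    dest.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    dest.commit()
    shared.execute("DELETE FROM orders WHERE tenant_id = ?", (tenant_id,))
    shared.commit()
    return dest
```

The nice property of this hybrid is that the application code stays identical; only the routing layer needs to know which tenants live in their own file.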
I think the lack of gems/libraries/patterns is proof of this. Just because you can doesn't mean you should.
Not saying there's no reason to ever do it; just proceed with caution and know for a fact you need DB per tenant.
> I think lack of gems/libraries/patterns is proof of this
This would effectively disqualify any new pattern from emerging. There have been many new patterns that have challenged the consensus that ended up becoming dominant.
Mostly as a fun experiment, but also from the realization that every place I worked at in the past (small/medium-sized B2B startups) would greatly benefit from such architecture.
Yes, there are massive trade-offs to this approach, and the concerns raised in the comment section are valid. This doesn't mean database-per-tenant is never worth it. There's a sweet spot for it, and if it fits your business/application, I personally would consider it a technical advantage over competitors.
My goal with FeebDB is to eliminate or reduce the common pain points of database-per-tenant, including:
- a single writer per database;
- improved connection management across all tenants (e.g. keep at most 1000 DB connections open, similar to an LRU cache);
- on-demand migration (all shards are migrated at application startup, but if a shard that hasn't migrated yet receives a request, it will first perform the migration and then serve the request);
- on-demand backups and replication (e.g. the library knows which shards were updated in the last X minutes, so it can trigger Litestream or similar on demand);
- enumeration of databases (performing map/reduce/filter operations across multiple DBs);
- clustered deployment with "pinned" tenants (for now I'm assuming the IOPS of a single beefy server is enough for all use cases, but once that's no longer sufficient you can have "shards of shards").
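The connection-cap idea in the list above can be sketched as a small LRU of open handles. This is not FeebDB's actual implementation, just an illustration; the class name and the capacity are made up.

```python
import sqlite3
from collections import OrderedDict

class ConnectionPool:
    """Keep at most `capacity` tenant connections open; evict the least recently used."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.conns = OrderedDict()  # tenant DB path -> open sqlite3.Connection

    def get(self, path):
        if path in self.conns:
            self.conns.move_to_end(path)  # mark as most recently used
        else:
            if len(self.conns) >= self.capacity:
                _, oldest = self.conns.popitem(last=False)  # evict LRU entry
                oldest.close()
            self.conns[path] = sqlite3.connect(path)
        return self.conns[path]
```

Because SQLite connections are cheap to reopen, evicting a cold tenant costs almost nothing; the cache mostly exists to bound file descriptors and page-cache memory.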
Because essentially what's happening with database-per-user is that we keep nothing in memory for users who are inactive, or only active on another instance, while every user shares essentially the same schema (or maybe a couple of versions of a schema).
This is starting to smell like the JSON situation with Mongo. Where Postgres is now twice as fast as Mongo at the thing Mongo is supposed to be an expert in.
The benefits of prefetching and page-level abstractions are certainly reduced when every tenant is guaranteed to live in a different file.
I would agree that you should have to fight an uphill battle when arguing for a DB-per-customer architecture. There needs to be a special reason not to simply add a customer/tenant ID column to each table.
I think one good argument for placing a tenant per database file is that it makes total destruction of their information possible without any impact to other tenants. If you don't require immediate destruction of the tenant's information (i.e., can wait for background cleanup activities), then I would begin to push back.
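In the per-file model, total destruction really is just unlinking the tenant's file, plus whatever WAL/journal sidecar files SQLite may have created next to it. A small sketch (the helper name is made up):

```python
import os
import sqlite3

def destroy_tenant(db_path):
    """Irreversibly remove a tenant's database file and SQLite sidecar files."""
    for suffix in ("", "-wal", "-shm", "-journal"):
        try:
            os.remove(db_path + suffix)
        except FileNotFoundError:
            pass  # that sidecar was never created; nothing to do
```

Contrast with the shared-database case, where "delete tenant X" means touching every table, every index, and possibly old backups, with no hard guarantee the pages are actually gone.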
That is all.