More databases should be single-threaded

https://blog.konsti.xyz/p/8c8a399f-8cfe-47dd-9278-9527105d07dc/

52•lawrencechen•1mo ago

Comments

qouteall•1mo ago

In real-world business requirements it often need to read some data then touch other data based on previous read result.

It violates the "every transaction can only be in one shard" constraint.

For a specific business requirement it's possible to design clever sharding to make transaction fit into one shard. However new business requirements can emerge and invalidate it.

"Every transaction can only be in one shard" only works for simple business logics.

rowanG077•1mo ago

in my experience most backends I have worked on people don't use the facilities of their database. They indeed simply hit the database two or more times. But that doesn't mean it's not possible to do better if you actually put more care in your queries. Most of the time multiple transactions can be eliminated. So I don't agree this is a business requirement complexity problem. It's a "it works so it's good enough" problem, or a "lazy developer" problem depending on how you want to frame it.

formerly_proven•1mo ago

This (along with n+1) is somewhat encouraged in business applications due to the prevalence of the repository pattern.

SoftTalker•1mo ago

Give each business or customer its own schema and you almost never need sharding.

n2d4•1mo ago

Yes, but you could also flip it the other way around — make the business or customer your sharding key, and you'll only need to manage one schema!

n2d4•1mo ago

I talk about these problems in the "How hard can sharding be?" section of the article — long story short, not all business requirements can be dealt with easily, but surprisingly many can if you choose a smart sharding key.

You can also still do optimistic concurrency across shards! That covers most of the remaining ones. Anything that requires anything more complex — sagas, 2PC, etc. — is relatively rare, and at scale, a traditional SQL OLTP will also struggle with those.

qouteall•1mo ago

Thanks for reply.

So in my understanding:

- The transactions that only touch one shard is simple

- The transactions that read multiple shards but only write shard can use simple optimistic concurrency control

- The transactions that writes (and reads) multiple shards stay complex. Can be avoided by designing a smart sharding key. (hard to do if business requirement is complex)

n2d4•1mo ago

That's right!

If you anticipate you will encounter the third type a lot, and you don't anticipate that you will need to shard either way, what I'm talking about here makes no sense for you.

qouteall•1mo ago

The optimistic concurrency control that reads multiple shards cannot use simple CAS. It probably needs to do something like two-phase committing

hinkley•1mo ago

Business people have a nasty habit of identifying two independent pieces of data you have and finding ideas to combine them to do something new. They aren’t happy until every piece of data is copied with every other piece and then they still aren’t happy because now everything is horrible because everything is coupled to everything.

kgeist•1mo ago

From what I understand, the complexity stays there, it's just moved from the DB layer to the app layer (now I have to decide how to shard data, how to reshard, how to synchronize data across shards, how to run queries across shards without wildly inconsistent results), so as I developer I have more headaches now than before, when most of that was taken care of by the DB. I don't see why it's an improvement.

The author also mentions B2B and I'm not sure how it's going to work. I understand B2C where you can just say "1 user=1 single-threaded shard" because most user data is isolated/independent from other users. But with B2B, we have accounts ranging from 100 users per organization to 200k users per organization. Something tells me making a 200k account single-threaded isn't a good idea. On the other hand, artificially sharding inside an organization will lead to much more complex queries overall too, because usually a lot of business rules require joining different users' data within 1 org.

n2d4•1mo ago

It's a different kind of complexity. Essentially, your app layer needs shift from:

    - transaction serializability
    - atomicity
    - deadlocks (generally locks)
    - occ (unless you do VERY long tx, like a user checkout flow)
    - retries
    - scale, infrastructure, parameter tuning

towards thinking about

    - separating data into shards
    - sharding keys
    - cross-shard transactions

which can be sometimes easier, sometimes harder. I think there are a surprising amount of problems where it's much easier to think about sharding than about race conditions!

> But with B2B, we have accounts ranging from 100 users per organization to 200k users per organization.

You'd be surprised at how much traffic a single core (or machine) can handle — 200k users is absolutely within reach. At some point you'll need even more granular sharding (eg. per user within organization), but at that point, you would need sharding anyways (no matter your DB).

bawolff•1mo ago

If you have to think about cross-shard transactions then you have to think about all the things on your first list too, as they are complexities related to transaction. I fail to see how that could possibly be simpler.

n2d4•1mo ago

Cross-shard transactions are only a tiny fraction of transactions — if the complexities of dealing with that is constrained to some transactions instead of all of them, you're saving yourself a lot of headaches.

Actually, I'd argue a lot of apps can do entirely without cross-shard transactions! (eg. sharding by B2B orgs)

whizzter•1mo ago

Yeah, mgmt (and more than anything, query tools) is gonna be a PITA.

But looking at it in a different way, say building something like Google Sheets.

One could place user-mgmt in one single-threaded database (Even at 200k users you probably don't have too many concurrently modifying administrators) whilst "documents" gets their own database. I'm prototyping one such "document" centric tool and the per-document DB thinking has come up, debugging users problems could be as "simple" as cloning a SQLite file.

Now on the other hand if it's some ERP/CRM/etc system with tons of linked data that naturally won't fly.

Tool for the job.

ltbarcly3•1mo ago

Wow that's a dumb take. The whole point of ACID is that you can get roughly the same result but have a system that can serve more than 1 user at a time.

codeslinger•1mo ago

Disclaimer: ex-AWS here.

This article ends up making a compelling case for DynamoDB. It has the properties he describes wanting. Many, many systems inside of Amazon are built with DDB as the primary datastore. I don't know of any OSS commensurate to DDB, but it would be quite interesting for one to appear.

> "Every transaction can only be in one shard" only works for simple business logics.

You'd be quite surprised at what you can get out of this model. Check out the later chapters of the DynamoDB Book [1] for some neat examples.

[1] https://dynamodbbook.com/

bboreham•1mo ago

The famous OSS database patterned after DynamoDB is https://cassandra.apache.org/

(Wondering if you never heard of it or if you don’t consider it commensurate).

raggi•1mo ago

We are doing this, and it’s terrible. Having done both at scale this one is worse.

bawolff•1mo ago

That's great if your shards are truly independent of each other, but if not then you just invented a custom transaction layer living in your application code, which sounds way way worse than the original problem.

And quite frankly, i think it is incredibly rare for the shards to both be fine grained and independent in typical oltp DB usecase.

lucyjojo•1mo ago

this is so very very right.

i don't know how it works now but datomic used to have a single-threaded writer (transactor?, if i remember right, been a long time) and offers serialization by default.

https://docs.datomic.com/datomic-overview.html

fmjrey•1mo ago

Sharding of data and compute is precisely what makes Rama [0] able to handle Internet scale topologies to create materialized views (PStates). Only one topology can write to a PState, and each PState has its own partitioning.

And yes, a developer needs to handle the added complexity of querying across partitions, but the language makes that easy.

Effectively Rama has fully deconstructed the database, not just its log, tables, and indexes, but also its query engine. It then gives the developer all the needed primitives and composition logic to handle any use case and schema.

Putting data into database silos and handling the compute separately is the schizophrenia that made everything more complicated: monolith were split into microservices, and databases into logs and nosql stores, each running in separate clusters. The way forward is to have one cluster for both data and compute, and make partitioning a first class construct of the architecture.

[0] https://redplanetlabs.com/

somat•1mo ago

This is needlessly pedantic but postgres is not multi threaded, Or at least historically it was not I don't really know if threading is currently being added or if the cloudbased "postgres-buts" have threading. but postgres standard is a pretty good example of the traditional forking unix server, And as someone deeply suspicious of threads in general, just because we could define a shared memory execution modal didn't mean that we should have done it, I think shared memory introduces too many footguns. Sticking with a multi-process model is probably the correct choice for postgress.

Needlessly pedantic because the distinction between multi process with explicit shared memory(shm) vs multi threaded with implicit shared memory probably does not really matter all that much.

SectorC: A C Compiler in 512 bytes

Brookhaven Lab's RHIC concludes 25-year run with final collisions

The F Word

Software factories and the agentic moment

I write games in C (yes, C)

Speed up responses with fast mode

Hoot: Scheme on WebAssembly

Stories from 25 Years of Software Development

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

First Proof

The Waymo World Model

Al Lowe on model trains, funny deaths and working with Disney

Show HN: A luma dependent chroma compression algorithm (image compression)

Vocal Guide – belt sing without killing yourself

Start all of your commands with a comma (2009)

Reinforcement Learning from Human Feedback

Selection Rather Than Prediction

We mourn our craft

Coding agents have replaced every framework I used

The AI boom is causing shortages everywhere else

France's homegrown open source online office suite

72M Points of Interest

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

A Fresh Look at IBM 3270 Information Display System

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Where did all the starships go?

History and Timeline of the Proco Rat Pedal (2021)

Learning from context is harder than we thought

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

SectorC: A C Compiler in 512 bytes

Brookhaven Lab's RHIC concludes 25-year run with final collisions

The F Word

Software factories and the agentic moment

I write games in C (yes, C)

Speed up responses with fast mode

Hoot: Scheme on WebAssembly

Stories from 25 Years of Software Development

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

First Proof

The Waymo World Model

Al Lowe on model trains, funny deaths and working with Disney

Show HN: A luma dependent chroma compression algorithm (image compression)

Vocal Guide – belt sing without killing yourself

Start all of your commands with a comma (2009)

Reinforcement Learning from Human Feedback

Selection Rather Than Prediction

We mourn our craft

Coding agents have replaced every framework I used

The AI boom is causing shortages everywhere else

France's homegrown open source online office suite

72M Points of Interest

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

A Fresh Look at IBM 3270 Information Display System

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Where did all the starships go?

History and Timeline of the Proco Rat Pedal (2021)

Learning from context is harder than we thought

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

More databases should be single-threaded

Comments