Well, this is from Kyle Kingsbury, the Chuck Norris of transactional guarantees. AWS has to reply or clarify, even if it only seems to apply to Multi-AZ clusters. Those are one of the two possibilities for RDS with Postgres: Multi-AZ deployments can have either one or two standby DB instances, and this is about the two-standby case. [1]
They make no such promises in their documentation. Their 5,494-page manual on RDS hardly mentions isolation or "serializable" except in the documentation of parameters for the different engines.
Nothing on global read consistency for Multi-AZ clusters, because why should they... :-) They talk about semi-synchronous replication, so the writer only waits for one standby to confirm the log record, but the two readers can be on different snapshots?
[1] - "New Amazon RDS for MySQL & PostgreSQL Multi-AZ Deployment Option: Improved Write Performance & Faster Failover" - https://aws.amazon.com/blogs/aws/amazon-rds-multi-az-db-clus...
[2] - "Amazon RDS Multi-AZ with two readable standbys: Under the hood" - https://aws.amazon.com/blogs/database/amazon-rds-multi-az-wi...
Well, as a user, I wish they would mention it though. If I migrate to RDS with multi-AZ after coming from plain Postgres (which documents snapshot isolation as a feature), I would probably want to know how the two differ.
> Amazon RDS for PostgreSQL multi-AZ clusters violate Snapshot Isolation
The way I've threaded this needle, after several frustrating attempts, is to have a policy of titling all reports "Jepsen: <system> <version>". HN is of course welcome to choose their own link text if they prefer a more descriptive, or colorful, phrase. :-)
The fact that the thread is high on HN, plus the GP comment is high in the thread, plus that the audience knows how interesting Jepsen reports get, should be enough to convey the needful.
The Jepsen writeups will surely stand the test of time. Thank you!
You may not be part of that world now, but you can be some day.
EDIT: forgot to say, I had to read 6 or 7 books on Bayesian statistics before I understood the most basic concepts. A few years later I wrote a compiler for a statistical programming language.
I somewhat feel that there was a generation that had it easier, because they were pioneers in a new field, allowing them to become experts quickly, while improving year-on-year, being paid well in the process, and having great network and exposure.
Of course, it can be done, but we should at least acknowledge that sometimes the industry is unforgiving and simply doesn't have on-ramps except for the privileged few.
I don't think so. I've been doing this for nearly 35 years now, and there's always been a lot to learn. Each layer of abstraction developed makes it easier to iterate toward a new outcome faster or with more confidence, but hides away complexity that you might eventually need to know. In a lot of ways it's easier these days, because there's so much information available at your fingertips when you need it, presented in a multitude of different formats. I learned my first programming language by reading a QBasic textbook while trying to debug a text-based adventure game that crashed at a critical moment. I had no Internet, no BBS, nobody to help, except my Dad, who was a solo RPG programmer who had learned on the job after being promoted from sweeping floors in a warehouse.
Essentially: The configuration claims "Snapshot Isolation", which means every transaction looks like it operates on a consistent snapshot of the entire database at its starting timestamp. All transactions starting after a transaction commits will see the changes made by that transaction. Jepsen finds that the snapshot a transaction sees doesn't always contain everything that was committed before its starting timestamp: transactions A and B can both commit their changes, and then transactions C and D can start, with C seeing only the change made by A and D seeing only the change made by B.
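Here's a minimal SQL sketch of that shape (the "long fork"), just to make it concrete. The table and values are made up for illustration; they're not the report's actual workload, which appends integers to lists:

-- Hypothetical two-row table; any two independent keys will do.
CREATE TABLE kv (k int PRIMARY KEY, v int NOT NULL);
INSERT INTO kv VALUES (1, 0), (2, 0);

-- Transaction A commits a write to key 1, transaction B to key 2.
BEGIN; UPDATE kv SET v = 1 WHERE k = 1; COMMIT;   -- A
BEGIN; UPDATE kv SET v = 1 WHERE k = 2; COMMIT;   -- B

-- Transaction C, started after both commits (one session), observes:
BEGIN;
SELECT v FROM kv WHERE k = 1;   -- returns 1: sees A's write
SELECT v FROM kv WHERE k = 2;   -- returns 0: misses B's write
COMMIT;

-- Transaction D, started after both commits (another session), observes:
BEGIN;
SELECT v FROM kv WHERE k = 1;   -- returns 0: misses A's write
SELECT v FROM kv WHERE k = 2;   -- returns 1: sees B's write
COMMIT;

Whichever order A and B actually committed in, a single consistent snapshot taken after both commits can miss at most the later of the two, so C and D seeing opposite halves is impossible under snapshot isolation.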
Why? Because it has variables and a graph?
What sort of education background do you have?
I’ve repeatedly used ChatGPT and Claude to help me understand papers and to cut through the verbiage to the underlying concepts.
For this particular one, the graph under "Results" is the most approachable portion, I think. (Don't skip the top two sections, though … and they're so short.) In the graph, each line is a transaction, and read them left-to-right.
Hopefully I get this right, though if I do not, I'm sure someone will correct me. Our database is a set of ordered lists of integers. Something like,
CREATE TABLE test (
  id int primary key,
  -- (but specifically, this next column holds a list of ints, e.g.,
  -- a value might be '1,8,11'; the list of ints is a comma-separated
  -- string.)
  list text not null
);
The first transaction: a 89 9
This is shorthand; it means "(a)ppend to list #89 the integer 9" (in SQL, crudely this is perhaps something like UPDATE test SET list = CONCAT(list, ',9') WHERE id = 89;
… though we'd need to handle the case where the list doesn't exist yet, turning it into an `INSERT … ON CONFLICT … DO UPDATE …`, so it would get gnarlier; there's a rough sketch of that further down.[2]); the next:
r 90 nil # read list 90; the result is nil
r 89 [4 9] # read list 89; the result is [4, 9]
r 90 nil # read list 90; the result is (still) nil
I assume you can `SELECT` ;) That should provide sufficient syntax for one to understand the remainder. The arrows indicate the dependencies; if you click "read-write dependencies"[1], that page explains it.
Our first transaction appends 9 to list 89. Our second transaction reads that same list and sees that same 9, so it must have started after the first transaction committed. The remaining arrows form similar dependencies, and once you take them all into account, they form a cycle; that should feel problematic. Snapshot isolation does not permit such a cycle, so we've observed a contradiction: this system cannot be obeying snapshot isolation. (This is what "To understand why this cycle is illegal…" gets at; it is fairly straightforward. T₁ is the first row in the graph, T₂ the second, and so forth. But it is only straightforward once you've understood the graph, I think.)
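As for the gnarlier upsert form mentioned above, a rough sketch of what it might look like is below; this is my guess at the shape, not the harness's exact SQL (footnote [2] links to the real thing):

-- "Append 9 to list 89," handling the case where row 89 doesn't exist yet.
-- Sketch only; the actual test code is in [2] and may differ.
INSERT INTO test (id, list)
VALUES (89, '9')
ON CONFLICT (id)
DO UPDATE SET list = CONCAT(test.list, ',9');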
> This is in such a thick academic style that it is difficult to follow what the problem actually might be and how it would impact someone.
I think a lot of this is because it is written with precision, and that precision requires a lot of academic terminology.
Some of it is just syntax peculiar to Jepsen, which I think comes from Clojure, which I think most of us (myself included) are just not familiar with. Hence why I used SQL and comma-separated lists in my commentary above; that is likely more widely understood. It's a bit rough when you first approach it, but once you get the notation, the payoff is worth it, I guess.
More generally, I think once you grasp the graph syntax & simple operations used here, it becomes easier to read other posts, since they're mostly graphs of transactions that, taken together, make no logical sense at all. Yet they happened!
> This style of writing serves mostly to remind me that I am not a part of the world that writes like this, which makes me a little sad.
I think Jepsen posts, with a little effort, are approachable. This post is a good starter post; normally I'd say Jepsen posts tend to inject faults into the system, as we're testing if the guarantees of the system hold up under stress. This one has no fault injection, though, so it's a bit simpler.
Beware, though: if you learn to read these, you'll never trust a database again.
[1]: https://jepsen.io/consistency/dependencies
[2]: I think this is it? https://github.com/jepsen-io/postgres/blob/225203dd64ad5e5e4... — but this is pushing the limits of my own understanding.
I chuckled, but (while I don't have links to offer) I could have sworn that some of them actually passed, and a handful of others took the report to heart and fixed the bugs. I similarly recall a product showing up to its Show HN or Launch HN with a Jepsen report in hand, and I was especially in awe of the maturity of that (assuming, of course, I'm not hallucinating such a thing).
Am I correct in understanding that AWS is either doing something with the cluster configuration or has added some patches that introduce this behavior?
Specifically here: https://youtu.be/fLqJXTOhUg4?t=434
I also understand there are lots of ways to do Postgres replication in general, with varying results. For instance, here's Bin Wang's report on Patroni: https://www.binwang.me/2024-12-02-PostgreSQL-High-Availabili...
Due to the way PostgreSQL does snapshotting, I don't believe this implies such a read might obtain a nonsense value due to only a portion of the bytes in a multi-byte column type having been updated yet.
It seems like a race condition that becomes eventually consistent. Or did anyone read this as if the later transaction(s) of a "long fork" might never complete under normal circumstances?
I was intuitively wondering the same, but I'm having trouble reasoning about how the post's example with transactions 1, 2, 3, 4 exhibits this behavior. In the example, is transaction 2 the only read-only transaction and therefore the only transaction to read from the read replica? I.e., do transactions 1, 3, and 4 use the primary while transaction 2 uses the read replica?
I was worried I had made the wrong move upgrading major versions, but it looks like this is not that. This is not a regression, but just a feature request or longstanding bug.
I understand the mission of the Jepsen project but presenting results in this format is misleading and will only sow confusion.
Transaction isolation involves a ton of tradeoffs, and the tradeoffs chosen here may be fine for most use cases. The issues can be easily avoided by doing any critical transactional work against the primary read-write node only, which would be the only typical way in which transactional work would be done against a Postgres cluster of this sort.
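If you go that route, one cheap way to confirm a given connection really is pointed at the read-write primary is standard PostgreSQL, nothing RDS-specific, assuming the readable standbys run as ordinary hot standbys:

-- Should return false on the primary (read-write) node and true on a
-- standby/reader, where writes would be rejected anyway.
SELECT pg_is_in_recovery();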
Lucky them, there is an automated suite[1] to verify the correct behavior :-D
Using the words "bug" and "guarantee" is throwing casual readers off the mark?
Unfortunately reliability is not that high on the priority list here. Keep up the good work!
Multi-AZ instances are a long-standing feature of RDS where the primary DB is synchronously replicated to a secondary DB in another AZ. On failure of the primary, RDS fails over to the secondary.
Multi-AZ clusters have two secondaries, and transactions are synchronously replicated to at least one of them. This is more robust than Multi-AZ instances if a secondary fails or is degraded. It also allows read-only access to the secondaries.
Multi-AZ clusters no doubt have more "magic" under the hood, as it's not a vanilla Postgres feature as far as I'm aware. I imagine this is why it's failing the Jepsen test.
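For reference, the nearest vanilla-Postgres analogue of "replicate synchronously to at least one of two standbys" is quorum-based synchronous replication. Purely illustrative; the standby names are made up, RDS doesn't expose this knob, and whatever they actually do under the hood may look nothing like it:

-- Quorum commit: wait for any one of two named standbys to confirm.
-- Standby names here are hypothetical.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_az2, standby_az3)';
SELECT pg_reload_conf();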
So if a snapshot isolation violation is happening inside Multi-AZ instances, can it happen with a single-region, multiple-read-replica kind of setup as well? But it might be more easily observable in Multi-AZ setups because the lag is high?
I’m not well versed in RDS but I believe that clustered is the only way to use it.
I was quite surprised to read that Stripe used MongoDB in the early days and still does today, and I can't imagine the sheer nightmares they must have faced using it for all these years.
Literally the only job ad I've seen talking about MongoDB was a job ad for MongoDB itself.
https://web.archive.org/web/20150312112552/http://blog.found...
Now members of the original FoundationDB team have started Antithesis (https://antithesis.com/) to make it easier for other systems to adopt this sort of testing.