frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Pro-democracy HK tycoon Jimmy Lai convicted in national security trial

https://www.bbc.com/news/articles/cp844kjj37vo
87•onemoresoop•33m ago•21 comments

Carrier Landing in Top Gun for the NES

https://relaxing.run/blag/posts/top-gun-landing/
194•todsacerdoti•2h ago•70 comments

$50 PlanetScale Metal Is GA for Postgres

https://planetscale.com/blog/50-dollar-planetscale-metal-is-ga-for-postgres
63•ksec•58m ago•24 comments

P-computers can solve spin-glass problems faster than quantum systems

https://news.ucsb.edu/2025/022239/new-ucsb-research-shows-p-computers-can-solve-spin-glass-proble...
24•magoghm•1w ago•1 comments

Thousands of U.S. farmers have Parkinson's. They blame a deadly pesticide

https://www.mlive.com/news/2025/12/thousands-of-us-farmers-have-parkinsons-they-blame-a-deadly-pe...
198•bikenaga•2h ago•136 comments

Avoid UUIDv4 Primary Keys

https://andyatkinson.com/avoid-uuid-version-4-primary-keys
201•pil0u•7h ago•208 comments

It seems that OpenAI is scraping [certificate transparency] logs

https://benjojo.co.uk/u/benjojo/h/Gxy2qrCkn1Y327Y6D3
83•pavel_lishin•3h ago•56 comments

Speech and Language Processing (3rd ed. draft)

https://web.stanford.edu/~jurafsky/slp3/
30•atomicnature•1w ago•6 comments

Adafruit: Arduino’s Rules Are ‘Incompatible With Open Source’

https://thenewstack.io/adafruit-arduinos-rules-are-incompatible-with-open-source/
357•MilnerRoute•22h ago•194 comments

DNA Learning Center: Mechanism of Replication 3D Animation

https://dnalc.cshl.edu/resources/3d/04-mechanism-of-replication-advanced.html
60•timschmidt•1w ago•15 comments

Roomba maker goes bankrupt, Chinese owner emerges

https://news.bloomberglaw.com/bankruptcy-law/robot-vacuum-roomba-maker-files-for-bankruptcy-after...
483•nreece•16h ago•565 comments

Unscii

http://viznut.fi/unscii/
256•Levitating•13h ago•33 comments

If AI replaces workers, should it also pay taxes?

https://english.elpais.com/technology/2025-11-30/if-ai-replaces-workers-should-it-also-pay-taxes....
374•PaulHoule•16h ago•603 comments

Invader: Where to Spot the 8-Bit Street Art in London

https://londonist.com/london/art-and-photography/invader-where-to-spot-the-8-bit-street-art-in-lo...
50•zeristor•1w ago•17 comments

Optery (YC W22) Hiring CISO, Release Manager, Tech Lead (Node), Full Stack Eng

https://www.optery.com/careers/
1•beyondd•5h ago

Arborium: Tree-sitter code highlighting with Native and WASM targets

https://arborium.bearcove.eu/
178•zdw•13h ago•31 comments

Ask HN: What Are You Working On? (December 2025)

349•david927•1d ago•1131 comments

Samsung may end SATA SSD production soon

https://www.techradar.com/computing/storage-backup/looking-for-a-cheap-ssd-dont-wait-samsung-coul...
42•Krontab•2h ago•22 comments

Ask HN: Is building a calm, non-gamified learning app a mistake?

19•hussein-khalil•1h ago•30 comments

SoundCloud has banned VPN access

https://old.reddit.com/r/SoundCloudMusic/comments/1pltd19/soundcloud_just_banned_vpn_access/
180•empressplay•14h ago•131 comments

AI agents are starting to eat SaaS

https://martinalderson.com/posts/ai-agents-are-starting-to-eat-saas/
285•jnord•17h ago•285 comments

$5 whale listening hydrophone making workshop

https://exclav.es/2025/08/03/dinacon-2025-passive-acoustic-listening/
80•gsf_emergency_6•4d ago•27 comments

We Put Flock Under Surveillance: Go Make Them Behave Differently [video]

https://www.youtube.com/watch?v=W420BOqga_s
36•huvarda•2h ago•6 comments

John Varley has died

http://floggingbabel.blogspot.com/2025/12/john-varley-1947-2025.html
141•decimalenough•14h ago•54 comments

The Whole App is a Blob

https://drobinin.com/posts/the-whole-app-is-a-blob/
124•valzevul•12h ago•72 comments

The Java Ring: A Wearable Computer (1998)

https://www.nngroup.com/articles/javaring-wearable-computer/
35•cromulent•5d ago•32 comments

Common Rust Lifetime Misconceptions

https://github.com/pretzelhammer/rust-blog/blob/master/posts/common-rust-lifetime-misconceptions.md
87•CafeRacer•11h ago•41 comments

Show HN: I wrote a book – Debugging TypeScript Applications (in beta)

https://pragprog.com/titles/aodjs/debugging-typescript-applications/
44•ozornin•1w ago•17 comments

The Problem of Teaching Physics in Latin America (1963)

https://calteches.library.caltech.edu/46/2/LatinAmerica.htm
80•rramadass•20h ago•76 comments

The Google Search homepage adds a 'plus' menu

https://9to5google.com/2025/12/15/google-search-plus-menu/
3•xnx•7m ago•0 comments
Open in hackernews

Avoid UUIDv4 Primary Keys

https://andyatkinson.com/avoid-uuid-version-4-primary-keys
200•pil0u•7h ago

Comments

jwr•6h ago
"if you use PostgreSQL"

(in the scientific reporting world this would be the perennial "in mice")

hyperpape•6h ago
The thing is, none of us are mice, but many of us use Postgres.

It would be the equivalent of "if you're a middle-aged man" or "you're an American".

P.S. I think some of the considerations may be true for any system that uses B-Tree indexes, but several will be Postgres specific.

orthoxerox•6h ago
It's not just Postgres or even OLTP. For example, if you have an Iceberg table with SCD2 records, you need to regularly locate and update existing records. The more recent a record is, the more likely it is to be updated.

If you use UUIDv7, you can partition your table by the key prefix. Then the bulk of your data can be efficiently skipped when applying updates.

kijin•6h ago
The space requirement and index fragmentation issue is nearly the same no matter what kind of relational database you use. Math is math.

Just the other day I delivered significant performance gains to a client by converting ~150 million UUIDv4 PKs to good old BIGINT. They were using a fairly recent version of MariaDB.

zelphirkalt•5h ago
If they can live with making keys only in one place, then sure, this can work. If however they need something that is very highly likely unique, across machines, without the need to sync, then using a big integer is no good.

if they can live with MariaDB, OK, but I wouldn't choose that in the first place these days. Likely Postgres will also perform better in most scenarios.

kijin•5h ago
Yeah, they had relatively simple requirements so BIGINT was a quick optimization. MariaDB can guarantee uniqueness of auto-incrementing integers across a cluster of several servers, but that's about the limit.

Had the requirements been different, UUIDv7 would have worked well, too, because fragmentation is the biggest problem here.

splix•5h ago
I think the author means all dbs that fit a single server. Because in distributed dbs you often want to spread the load evenly over multiple servers.
benterix•6h ago
The article sums up some valid arguments against UUIDv4 as PKs but the solution the author provides on how to obfuscate integers is probably not something I'd use in production. UUIDv7 still seems like a reasonable compromise for small-to-medium databases.
mort96•6h ago
I tend to avoid UUIDv7 and use UUIDv4 because I don't want to leak the creation times of everything.

Now this doesn't work if you actually have enough data that the randomness of the UUIDv4 keys is a practical database performance issue, but I think you really have to think long and hard about every single use of identifiers in your application before concluding that v7 is the solution. Maybe v7 works well for some things (e.g identifiers for resources where creation times are visible to all with access to the resource) but not others (such as users or orgs which are publicly visible but without publicly visible creation times).

cdmckay•5h ago
Out of curiosity, why is it an issue if you leak creation time?
robertlagrant•5h ago
Depends on the data. If you use a primary key in data about a person that shouldn't include their age (e.g. to remove age-based discrimination) then you are leaking an imperfect proxy to their age.
lwhi•5h ago
So the UUID could be used as an imperfect indicator of a records created time?
benterix•5h ago
UUIDv7 but not UUIDv4.
lwhi•5h ago
I suppose timing attacks become an issue too.
wongarsu•3h ago
UUIDv7 still have a lot of random bits. Most attacks around creating lots of ids are foiled by that
kreetx•5h ago
E.g, if your service users have timestamp as part of the key and this data is visible to other users, you would know when that account was created. This could be an issue.
mort96•5h ago
Well you're leaking user data. I'm sure you can imagine situations where "the defendant created an account on this site on this date" could come up. And the user could have created that account not knowing that the creation date is public, because it's not listed anywhere in the publicly viewable part of the profile other than the UUID in the URL.
nish__•4h ago
Pretty much every social media app has a "Member since X" visible on public profiles. I don't think it's an issue.
mort96•4h ago
Who said I was talking about social media?
nish__•4h ago
Well where else do users have public profiles?
0x3f•4h ago
The whole point though is that the ID itself leaks info, even if the profile is not public. There are many cases where you reference an object as a foreign key, even if you can't see the entire record of that foreign key.
koakuma-chan•1h ago
Discord is doing fine.
mort96•1h ago
Hacker news is also doing fine, even though I can just click your profile and see you joined in october 2024. It doesn't matter for every use case.

But there are cases where it matters. Using UUIDv7 for identifiers means you need to carefully consider the security and privacy implications every time you create a new table identified by a UUID, and you'll possibly end up with some tables where you use v4 and some where you use v7. Worst case, you'll end up with painful migrations from v7 to v4 as security review identifies timestamped identifiers as a security concern.

bruce511•5h ago
The issue will be very context specific. In other words to (reasonably) answer the question, we'd have to judge each application individually.

For one example, say you were making voting-booth software. You really don't want a (hidden) timestamp attached to each vote (much less an incrementing id) because that would break voter confidentiality.

More generally, it's more a underlying principle of data management. Not leaking ancillary data is easier to justify than "sure we leak the date and time of the record creation, but we can't think of a reason why that matters."

Personally I think the biggest issue are "clever" programmers who treat the uuid as data and start displaying the date and time. This leads to complications ("that which is displayed, the customer wants to change"). It's only a matter of time before someone declares the date "wrong" and it must be "fixed". Not to mention time zone or daylight savings conversions.

Bombthecat•5h ago
Admins, early users, founders, CEOs etc etc would have althe lowest creation time...
saaspirant•4h ago
There was a HN comment about competitors tracking how many new signups are happening and increasing the discounts/sales push based on that. Something like this.
JetSetIlly•4h ago
In a business I once worked for, one of the users of the online ordering system represented over 50% of the business' income, something you wouldn't necessarily want them to know.

However, because the online ordering system assigned order numbers sequentially, it would have been trivial for that company to determine how important their business was.

For example, over the course of a month, they could order something at the start of the month and something at the end of the month. That would give them the total number of orders in that period. They already know how many orders they have placed during the month, so company_orders / total_orders = percentage_of_business

It doesn't even have to be accurate, just an approximation. I don't know if they figured out that they could do that but it wouldn't surprise me if they had.

0x3f•4h ago
That's happening everywhere. You can order industrial parts from a Fortune 500 and check some of the numbers on it too, if they're not careful about it.
natch•3h ago
If your system (pseudo-) random number generator (RNG) is compromised to derive a portion of its entropy from things that are knowable by knowing the time when the function ran, then the search space for cracking keys created around the same time can be shrunken considerably.

This doesn’t even rely on your system’s built-in RNG being low quality. It could be audited and known to avoid such issues but you could have a compromised compiler or OS that injects a doctored RNG.

dboreham•3h ago
Apart from all the other answers here: an external entity knowing the relative creation time for two different accounts, or just that the two accounts were created close in time to each other can represent a meaningful information leak.
nbadg•5h ago
I'm also not a huge fan of leaking server-side information; I suspect UUIDv7 could still be used in statistical analysis of the keyspace (in a similar fashion to the german tank problem for integer IDs). Also, leaking data about user activity times (from your other comment) is a *really* good point that I hadn't considered.

I've read people suggest using a UUIDv7 as the primary key and a UUIDv4 as a user-visible one as a remedy.

My first thought when reading the suggestion was, "well but you'll still need an index on the v4 IDs, so what does this actually get you?" But the answer is that it makes joins less expensive; you only require the index once, when constructing the query from the user-supplied data, and everything else operates with the better-for-performance v7 IDs.

To be clear, in a practical sense, this is a bit of a micro-optimization; as far as I understand it, this really only helps you by improving the data locality of temporally-related items. So, for example, if you had an "order items" table, containing rows of a bunch of items in an order, it would speed up retrieval times because you wouldn't need to do as many index traversals to access all of the items in a particular order. But on, say, a users table (where you're unlikely to be querying for two different users who happen to have been created at approximately the same time), it's not going to help you much. Of course the exact same critique is applicable to integer IDs in those situations.

Although, come to think of it, another advantage of a user-visible v4 with v7 Pk is that you could use a different index type on the v4 ID. Specifically, I would think that a hash index for the user-visible v4 might be a halfway-decent way to go.

I'm still not sure either way if I like the idea, but it's certainly not the craziest thing I've ever heard.

throw0101a•5h ago
> I tend to avoid UUIDv7 and use UUIDv4 because I don't want to leak the creation times of everything.

See perhaps "UUIDv47 — UUIDv7-in / UUIDv4-out (SipHash‑masked timestamp)":

* https://github.com/stateless-me/uuidv47

* Sept 2025: https://news.ycombinator.com/item?id=45275973

wongarsu•3h ago
If that kind of stuff is on the able you can also use boring 64bit integer keys and encrypt those (e.g. [1]). Which in the end is just a better thought out version of what the article author did.

UUIDv47 might have a space if you need keys generated on multiple backend servers without synchronization. But it feels very niche to me.

1: https://wiki.postgresql.org/wiki/XTEA_(crypt_64_bits)

barrkel•3h ago
You shouldn't generally use PKs as public identifiers, least of all UUIDs, which are pretty user hostile.
mort96•3h ago
I really don't see the issue with having a UUID in a URL.
formerly_proven•5h ago
If all you want is to obfuscate the fact that your social media site only has 200 users and 80 posts, simply use a permutation over the autoincrement primary key. E.g. IDEA or CAST-128, then encode in base64. If someone steps on your toes because somewhere in your codebase you're using a forbidden legacy cipher, just use AES-128. (This is sort of the degenerate/tautological base case of format-preserving encryption)

(What do you think Youtube video IDs are?)

pdimitar•4h ago
> What do you think Youtube video IDs are?

I actually haven no idea. What are they?

(Also what is the format of their `si=...` thing?)

intalentive•13m ago
Can’t recall where I heard this, but I’m pretty sure the si=… is tracking information that associates the link with the user who shared it.
benterix•4h ago
I always thought they are used and stored as they are because the kind of transformation you mention seems terribly expensive given the YT's scale, and I don't see a clear benefit of adding any kind of obfuscation here.
Retr0id•4h ago
Why not use AES-128 by default? Your CPU has instructions to accelerate AES-128.
enz•3h ago
The problem with this approach is that you now have to manage a secret key/secret for a (maybe) a very long time.

I shared this article a few weeks ago, discussing the problems with this kind of approach: https://notnotp.com/notes/do-not-encrypt-ids/

I believe it can make sense in some situations, but do you really want to implement such crypto-related complexity?

conradfr•3h ago
Can't you just change the starting value of your sequence?
xandrius•6h ago
To summarise the article: in PG, prefer using UUIDv7 over UUIDv4 as they have slightly better performance.

If you're using latest version of PG, there is a plugin for it.

That's it.

sbuttgereit•5h ago
You might have missed the big H2 section in the article:

"Recommendation: Stick with sequences, integers, and big integers"

After that then, yes, UUIDv7 over UUIDv4.

This article is a little older. PostgreSQL didn't have native support so, yeah, you needed an extension. Today, PostgreSQL 18 is released with UUIDv7 support... so the extension isn't necessary, though the extension does make the claim:

"[!NOTE] As of Postgres 18, there is a built in uuidv7() function, however it does not include all of the functionality below."

What those features are and if this extension adds more cruft in PostgreSQL 18 than value, I can't tell. But I expect that the vast majority of users just won't need it any more.

tmountain•5h ago
Sticking with sequences and other integer types will cause problems if you need to shard later.
zwnow•5h ago
Especially in larger systems, how does one solve the issue of reaching the max value of an integer in their database? Sure for unsigned bigint thats hard to achieve but regular ints? Apps quickly outgrow that.
sbuttgereit•4h ago
OK... but that concern seems a bit artificial.. if bigints are appropriate: use them. If the table won't get to bigint sizes: don't. I've even used smallint for some tables I knew were going to be very limited in size. But I wouldn't worry about smallint's very limited number of values for those tables that required a larger size for more records: I'd just use int or bigint for those other tables as appropriate. The reality is that, unless I'm doing something very specific where being worried about the number of bytes will matter... I just use bigint. Yes, I'm probably being wasteful, but in the cases where those several extra bytes per record are going to really add up.... I probably need bigint anyway and in cases where bigint isn't going to matter the extra bytes are relatively small in aggregate. The consistency of simply using one type itself has value.

And for those using ints as keys... you'd be surprised how many databases in the wild won't come close to consuming that many IDs or are for workloads where that sort of volume isn't even aspirational.

Now, to be fair, I'm usually in the UUID camp and am using UUIDv7 in my current designs. I think the parent article makes good points, but I'm after a different set of trade-offs where UUIDs are worth their overhead. Your mileage and use-cases may vary.

zwnow•3h ago
Idk I use whatever scales best and that would be an close to infinite scaling key. The performance compromise is probably zeroed out once you have to adapt ur database to a different one supporting the current scale of the product. Thats for software that has to scale. Whole different story for stuff that doesnt have to grow obviously. I am in the UUID camp too but I dont care whether its v4 or v7.
wongarsu•3h ago
It's not like there are dozens of options and you constantly have to switch. You just have to estimate if at maximum growth your table will have 32 thousand, 2 billion or 9 quintillion entries. And even if you go with 9 quintillion for all cases you still use half the space of a UUID

UUIDv4 are great for when you add sharding, and UUIDs in general prevent issues with mixing ids from different tables. But if you reach the kind of scale where you have 2 billion of anything UUIDs are probably not the best choice either

sgarland•2h ago
There are plenty of ways to deal with that. You can shard by some other identifier (though I then question your table design), you can assign ranges to each shard, etc.
hans_castorp•4h ago
With the latest Postgres version (>= 18) you do NOT need a plugin
dotancohen•6h ago
From the fine article:

  > Random values don’t have natural sorting like integers or lexicographic (dictionary) sorting like character strings. UUID v4s do have "byte ordering," but this has no useful meaning for how they’re accessed.
Might the author mean that random values are not sequential, so ordering them is inefficient? Of course random values can be ordered - and ordering by what he calls "byte ordering" is exactly how all integer ordering is done. And naive string ordering too, like we would do in the days before Unicode.
kreetx•6h ago
Using an UUIDv4 as primary key is a trade-off: you use it when you need to generate unique keys in a distributed manner. Yes, these are not datetime ordered and yes, they take 128 bits of space. If you can't live with this, then sure, you need to consider alternatives. I wonder if "Avoid UUIDv4 Primary Keys" is a rule of thumb though.
dotancohen•5h ago
If one needs timestamp ordering, then UUIDv7 is a good alternative.

But the author does not say timestamp ordering, he says ordering. I think he actually means and believes that there is some problem ordering UUIDv4.

kreetx•5h ago
Yup. There are alternatives depending on what the situation is: with non-distributed, you could just use a sufficiently sized int (which can be rather small when the table is for e.g humans). You could add a separate timestamp column if that is important.

But if you need UUID-based lookup, then you might as well have it as a primary key, as that will save you an extra index on the actual primary key. If you also need a date and the remaining bits in UUIDv7 suffice for randomness, then that is a good option too (though this does essentially amount to having a composite column made up of datetime and randomness).

torginus•5h ago
I do not understand why 128 bits is considered too big - you clearly can't have less, as on 64 bits the collision probability on real world workloads is just too high, for all but the smallest databases.

Auto-incrementing keys can work, but what happens when you run out of integers? Also, distributed dbs probably make this hard, and they can't generate a key on client.

There must be something in Postgres that wants to store the records in PK order, which while could be an okay default, I'm pretty sure you can this behavior, as this isn't great for write-heavy workloads.

vbezhenar•29m ago
You won't run out of 64-bit integer. IMO, 64-bit integer (and even less for some tables that's not expected to grow much) it the best approach for internal database ID. If you want to expose ID, it might make sense to introduce second UUID for selected tables, if you want to hide internal ID.
dagss•5h ago
The point is how closely located data you access often is. If data is roughly sorted by creation time then data you access close to one another in time is stored close to one another on disk. And typically access to data is correlated with creation time. Not for all tables but for many.

Accessing data in totally random locations can be a performance issue.

Depends on lots of things ofc but this is the concern when people talk about UUID for primary keys being an issue.

torginus•5h ago
To be polite, I don't think this article rests on sound technical foundations.
crest•5h ago
Any fixed sized bitstring has an obvious natural ordering, but since they're allocated randomly they lack the density and locality of sequential allocation.
K0nserv•5h ago
Isn't part of this that inserting into a btree index is more performant when the keys are increasing rather than being random? A random id will cause more re-balancing operations than always inserting at the end. Increasing ids are also more cache friendly
sgarland•2h ago
Yes, and for Postgres, it also causes WAL bloat due to the high likelihood of full page writes.
dev_l1x_be•4h ago
Why would you need to order by UUID? I am missing something here. Most of the time we use UUID keys for being able to create a new key without coordination and most of the time we do not want to order by primary key.
hans_castorp•4h ago
I have seen a lot of people sort by (generated) integer values to return the rows "in creation order" assuming that sorting by an integer is somehow magically faster than sorting by a proper timestamp value (which give a more robust "creation order" sorting than a generated integer value).
sgarland•2h ago
Assuming the integer value is the PK, it can in fact be much faster for MySQL / MariaDB due to InnoDB’s clustering index. If it can do a range scan over the PK, and that’s also the ORDER BY (with matching direction), congratulations, the rows are already ordered, no sort required. If it has to do a secondary index lookup to find the rows, this is not guaranteed.
sagarm•3h ago
Most common database indexes are ordered, so if you are using UUIDv4 you will not only bloat the index you will also have poor locality. If you try to use composite keys to fix locality, you'll end up with an even more bloated index.
hashmush•8m ago
Agree, I did a double take on this too.

Values of the same type can be sorted if a order is defined on the type.

It's also strange to contrast "random values" with "integers". You can generate random integers, and they have a "sorting" (depending on what that means though)

waynenilsen•6h ago
What kills me is I can’t double click the thing to select it.
mrits•4h ago
This application specific. iTerm2 doesn't break up by - why firefox does.
mcny•6h ago
Postgresql 18 released in September and has uuidv7

https://www.postgresql.org/docs/current/functions-uuid.html

dimitrisnl•5h ago
Noob question, but why no use ints for PK, and UUIDs for a public_id field?
edding4500•5h ago
*edit: sorry, misread that. My answer is not valid to your question.

original answer: because if you dont come up with these ints randomly they are sequential which can cause many unwanted situations where people can guess valid IDs and deduce things from that data. See https://en.wikipedia.org/wiki/German_tank_problem

javawizard•5h ago
Hence the presumed implication behind the public_id field in GP's comment: anywhere identifiers are exposed, you use the public_id field, thereby preventing ID guessing while still retaining the benefits of ordered IDs where internal lookups are concerned.

Edit: just saw your edit, sounds like we're on the same page!

javaunsafe2019•5h ago
So We make things hard in the backend because of leaky abstractions? Doesn't make sense imo.
jcims•5h ago
Decades of security vulnerabilities and compromises because of sequential/guessable PKs is (only!) part of the reason we're here. Miss an authorization check anywhere in the application and you're spoon-feeding entire tables to anyone with the inclination to ask for it.
grim_io•5h ago
The article mentions microservices, which can increase the likelihood of collisions in sequential incremental keys.

One more reason to stay away from microservices, if possible.

bardsore•5h ago
Always try to avoid having two services using the same DB. Only way I'd ever consider sharing a DB is if only one service will ever modify it and all others only read.
grim_io•5h ago
Good luck enforcing that :)
mrkeen•5h ago
The 'collision' is two service classes both trying to use one db.

If you separate them (i.e. microservices) the they no longer try to use one db.

grim_io•5h ago
There is nothing stopping multiple microservices from using the same DB, so of course this will happen in practice.

Sometimes it might even be for a good reason.

alerighi•5h ago
If you put an index on the UUID field (because you have an API where you can retrieve objects with UUID) you have kind of the same problem, at least in Postgres where a primary key index or a secondary index are more or less the same (to the point is perfectly valid in pgsql to not have any primary key defined for the table, because storage on disk is done trough an internal ID and the indexes, being primary or not, just reference to the rowId in memory). Plus the waste of space of having 2 indexes for the same table.

Of course this is not always the case that is bad, for example if you have a lot of relations you can have only one table where you have the UUID field (and thus expensive index), and then the relations could use the more efficient int key for relations (for example you have an user entity with both int and uuid keys, and user attribute references the user with the int key, of course at the expense of a join if you need to retrieve one user attribute when retrieving the user is not needed).

torginus•5h ago
You can create hash indexes in Postgres, so the secondary index uuid seems workable:

https://www.postgresql.org/docs/current/hash-index.html

dsego•4h ago
I also think we can use a combination of a PID - persistent ID (I always thought it was public) and an auto-increment integer ID. Having a unique key helps when migrating data between systems or referencing a piece of data in a different system. Also, using serial IDs in URLs and APIs can reveal sensitive information, e.g. how many items there are in the database.
Lucasoato•5h ago
Hi, a question for you folks. What if I don’t like to embed timestamp in uuid as v7 do? This could expose to timing attacks in specific scenarios.

Also is it necessary to show uuid at all to customers of an API? Or could it be a valid pattern to hide all the querying complexity behind named identifiers, even if it could cost a bit in terms of joining and indexing?

The context is the classic B2B SaaS, but feel free to share your experiences even if it comes from other scenarios!

lwhi•5h ago
Wouldn't you need to expose UUID if you want to make use of optimistic locking?
Lucasoato•5h ago
I feel that this is among the good reasons to keep exposing UUID in the API.
scary-size•5h ago
This reminds me about this old gist for generating Firebase-like "push IDs" [1]. Those have some nicer properties.

[1] https://gist.github.com/mikelehen/3596a30bd69384624c11

socketcluster•5h ago
My advice is: Avoid Blanket Statements About Any Technology.

I'm tired of midwit arguments like "Tech X is N% faster than tech Y at performing operation Z. Since your system (sometimes) performs operation Z, it implies that Tech X is the only logical choice in all situations!"

It's an infuriatingly silly argument because operation Z may only represent about 10% of the total CPU usage of the whole system (averaged out)... So what is promoted as a 50% gain may in fact be a 5% gain when you consider it in the grand scheme of things... Negligible. If everyone was looking at this performance 'advantage' rationally; nobody would think it's worth sacrificing important security or operational properties.

I don't know what happened to our industry; we're supposed to be intelligent people but I see developers falling for these obvious logical fallacies over and over.

I remember back in my day, one of the senior engineers was discussing upgrading a python system and stated openly that the new version of the engine was something like 40% slower than the old version but he didn't even have to explain himself why upgrading was still a good decision; everybody in the company knew he was only talking about the code execution speed and everybody knew that this was a small fraction of the total.

Not saying UUIDv7 was a bad choice for Postgres. I'm sure it's fine for a lot of situations but you don't have to start a cult preaching the gospel of The One True UUID to justify your favorite project's decisions.

I do find it kind of sly though how the community decided to make this UUIDv7 instead of creating a new standard for it.

The whole point of UUID was to leverage the properties of randomness to generate unique IDs without requiring coordination. UUIDv7 seems to take things in a philosophically different path. People chose UUID for scalability and simplicity (both of which you get as a result of doing away with the coordination overhead), not for raw performance...

That's the other thing which drives me nuts; people who don't understand the difference between performance and scalability. People foolishly equate scalability with parallelism or concurrency; whereas that's just one aspect of it; scalability is a much broader topic. It's the difference between a theoretical system which is fast given a certain artificially small input size and one which actually performs better as the input size grows.

Lastly; no mention is made about the complex logic which has to take place behind the scenes to generate UUIDv7 IDs... People take it for granted that all computers have a clock which can produce accurate timestamps where all computers in the world are magically in-sync... UUIDv7 is not simple; it's very complicated. It has a lot of additional complexity and dependencies compared to UUIDv4. Just because that complexity is very well hidden from most developers, doesn't mean it's not there and that it's not a dependency... This may become especially obvious as we move to a world of robotics and embedded systems where cheap microchips may not have enough Flash memory to hold the code for the kinds of programs required to compute such elaborate IDs.

kunley•5h ago
Wasn't choosing uuids as ids falling for the deceptive argument in the first place?
christophilus•5h ago
Not really, no. They’re very convenient for certain problems and work really well in general. I’ve never had a performance issue where the problem boiled down to my use of UUID.
kunley•5h ago
What are these certain problems, if I may ask?
socketcluster•4h ago
A major one for me is preventing duplicate records.

If the client POSTs a new object to insert it into the database; if there is a connection failure and the client does not receive a success response from the server, the client cannot know whether the record was inserted or not without making an expensive and cumbersome additional read call to check... The client cannot simply assume that the insertion did not happen purely on the basis that they did not receive a success response. It could very well be that the insertion succeeded but the connection failed shortly after so response was not received. If the IDs are auto-incremented on the server and the client posts the same object again without any ID on it, the server will create a duplicate record in the database table (same object with a different ID).

On the other hand, if the client generates a UUID for the object it wants to create on the front-end, then it can safely resend that exact object any number of times and there is no risk of double-insertion; the object will be rejected the second time and you can show the user a meaningful error "Record was already created" instead of creating two of the same resource; leading to potential bugs and confusion.

mkleczek•3h ago
Preferably, you would design you APIs and services to be idempotent (ie. use PUT not POST etc.)

Using idempotency identifier is the last resort in my book.

socketcluster•3h ago
Still, UUID is probably the simplest and most reliable way to generate such idempotency identifiers.
kunley•3h ago
Ehm.. so you're saying that INSERT ... RETURNING id is not atomic from the client's pov because something terrible could happen just when client is receiving the answer inside its SQL driver?
socketcluster•3h ago
I'm actually more thinking about the client sitting on the front-end like a single page app. Network instability could cause the response to not reach the front-end after a successful insert. This wouldn't be extremely common but would definitely be a problem for you as the database admin if you have above a certain number of users. I've seen this issue on live production systems and the root cause of duplicate records can be baffling because of how infrequently it may happen. Tends to cause issues that are hard to debug.
danparsonson•4h ago
You never having seen the problem doesn't mean it never happens; I have dealt with a serious performance problem in the past that was due to excessive page fragmentation due to a GUID PK.

To your original point, these are heuristics; there isn't always time to dig into every little architectural decision, so having a set of rules of thumb on hand helps to preempt problems at minimal cognitive cost. "Avoid using a GUID as a primary key if you can" is one of mine.

thraxil•3h ago
Yep. We have tables that use UUIDv4 that have 60M+ rows and don't have any performance problems with them. Would some queries be faster using something else? Probably, but again, for us it's not close to being a bottleneck. If it becomes a problem at 600M or 6B rows, we'll deal with it then. We'll probably switch to UUIDv7 at some point, but it's not a priority and we'll do some tests on our data first. Does my experience mean you should use UUIDv4? No. Understand your own system and evaluate how the tradeoffs apply to you.
p2detar•16m ago
Nice feedback. Out of curiosity, have you made any fine-tuning to psql that greatly improved performance?
dfox•5h ago
> Creating obfuscated values using integers

While that is often neat solution, do not do that by simply XORing the numbers with constant. Use a block cipher in ECB mode (If you want the ID to be short then something like NSA's Speck comes handy here as it can be instantiated with 32 or 48 bit block).

And do not even think about using RC4 for that (I've seen that multiple times), because that is completely equivalent to XORing with constant.

bux93•5h ago
Long article about why not to use UUIDv4 as Primary Keys, but.. Who is doing so? And why are they doing that? How would you solve their requirements? Just throwing out "you can use UUIDv7" doesn't help with, e.g., the size they take up.

Aren't people using (big)ints are primary keys, and using UUIDs as logical keys for import/export, solving portability across different machines?

Sayrus•5h ago
UUIDs are usually the go-to solution to enumeration problems. The space is large enough that an attacker cannot guess how many X you have (invoices, users, accounts, organizations, ...). When people replace the ints by UUIDv4, they keep them as primary keys.
bruce511•5h ago
I'd add that it's also used when data is created in multiple places.

Consider say weather hardware. 5 stations all feeding into a central database. They're all creating rows and uploading them. Using sequential integers for that is unnecessarily complex (if even possible.)

Given the amount of data created on phones and tablets, this affects more situations than first assumed.

It's also very helpful in export / edit / update situations. If I export a subset of the data (let's say to Excel), the user can edit all the other columns and I can safely import the result. With integer they might change the ID field (which would be bad). With uuid they can change it, but I can ignore that row (or the whole file) because what they changed it to will be invalid.

nrhrjrjrjtntbt•5h ago
Yes and the DB might be columnular or a distributed KV, sidestepping the index problem.
cebert•5h ago
Using UUIDs as primary keys in non-relational databases like DynamoDB is valid and doesn’t raise the concerns mentioned in the article.
andatki•3h ago
Good point that the post should be made clear it’s referring only to my experience with Postgres.
reactordev•5h ago
I fun trick I did was generate UUID-like ids. We all can identify a UUIDv4 most of the time by looking at one. "Ah, a uuid" we say to ourselves. A little over a decade ago I was working on a massive cloud platform and rather than generate string keys like the author above suggested (int -> binary -> base62 str) we opted for a more "clever" approach.

The UUID is 128bits. The first 64bits are a java long. The last 64bits are a java long. Let's just combine the Tenant ID long with a Resource ID long to generate a unique id for this on our platform. (worked until it didn't).

nrhrjrjrjtntbt•5h ago
Atlassian settles for longer "ARIs" for this (e.g.https://developer.atlassian.com/cloud/guard-detect/developer...) composed of guids which allow a scheme like the Amazon ARN to pass around.
reactordev•2h ago
yeah, the problem for us was the resource id. What id was it? Was it a post? an upload? a workspace? it wasn't nearly as descriptive as we needed it to be.
K0nserv•5h ago
An additional thing I learned when I worked on a ulid alternative over the weekend[0] is: Postgres's internal Datum type is at most 64 bits which means every uuid requires heap allocation[1] (at least until we get 128 bit machines).

0: https://bsky.app/profile/hugotunius.se/post/3m7wvfokrus2g

1: https://github.com/postgres/postgres/blob/master/src/backend...

raxxorraxor•5h ago
> Do not assume that UUIDs are hard to guess; they should not be used as security capabilities

The issue is that is true for more or less all capability URLs. I wouldn't recommend UUIDs per se here, probably better to just use a random number. I have seen UUIDs for this in practice though and these systems weren't compromised because of that.

I hate the tendency that password recovery flows for example leave the URL valid for 5 minutes. Of course these URLs need to have a limited life time, but mail isn't a real time communication medium. There is very little security benefit from reducing it from 30 minutes to 5 minutes for example. You are not getting "securer" this way.

vintermann•5h ago
A prime example of premature optimization.

Permanent identifiers should not carry data. This is like the cardinal sin of data management. You always run into situations where the thing you thought, "surely this never changes, so it's safe to squeeze into the ID to save a lookup". Then people suddenly find out they have a new gender identity, and they need a last final digit in their ID numbers too.

Even if nothing changes, you can run into trouble. Norwegian PNs have your birth date (in DDMMYY format) as the first six digits. Surely that doesn't change, right? Well, wrong, since although the date doesn't change, your knowledge of it might. Immigrants who didn't know their exact date of birth got assigned 1. Jan by default... And then people with actual birthdays on 1 Jan got told, "sorry, you can't have that as birth date, we've run out of numbers in that series!"

Librarians in the analog age can be forgiven for cramming data into their identifiers, to save a lookup. When the lookup is in a physical card catalog, that's somewhat understandable (although you bet they could run into trouble over it too). But when you have a powerful database at your fingertips, use it! Don't make decisions you will regret just to shave off a couple of milliseconds!

oncallthrow•4h ago
It sounds to me like you’re just arguing for premature optimization of another kind (specifically, prematurely changing your entire architecture for edge cases that probably won’t ever happen to you).
vintermann•4h ago
If you have an architecture already, obviously it's hard to change and you may want to postpone it until those edge cases which probably won't ever happen to you, happen. But for new architectures, value your own grey hairs over small performance improvements.
tacone•4h ago
Fantastic real life example. Italian PNs carry also the gender, which something you can change surgically, and you'll eventually run into the issue when operating at scale.

I don't agree with the absolute statement, though. Permanent identifiers should not generally carry data. There are situations where you want to have a way to reconciliate, you have space or speed constraints, so you may accept the trade off, md5 your data and store it in a primary index as a UUID. Your index will fragment and thus you will vacuum, but life will still be good overall.

mckirk•4h ago
I'm not sure whether that was intended, but 'operating at scale' actually made me laugh out loud :D
cozyman•42m ago
how does one change their gender surgically?
mkleczek•4h ago
This is actually a very deep and interesting topic. Stripping information from an identifier disconnects a piece of data from the real world which means we no longer can match them. But such connection is the sole purpose of keeping the data in the first place. So, what happens next is that the real world tries to adjust and the "data-less" identifier becomes a real world artifact. The situation becomes the same but worse (eg. you don't exist if you don't remember your social security id). In extreme cases people are tattooed with their numbers.

The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

vrighter•3h ago
You can't take into account the fact that things change when you don't know what those changes might be. You might end up needing to either rebuild a new database, have some painful migration, or support two codepaths to work with both types of keys.
mkleczek•12m ago
Network protocol designers know better and by default embed protocol version number in message format spec.

I guess you can assign 3-4 bits for identifier version number as well.

And yes - for long living data dealing with compatibility issues is inevitable so you have to take that into account from the very beginning.

Ukv•1h ago
> Stripping information from an identifier disconnects a piece of data from the real world which means we no longer can match them. But such connection is the sole purpose of keeping the data in the first place.

The identifier is still connected to the user's data, just through the appropriate other fields in the table as opposed to embedded into the identifier itself.

> So, what happens next is that the real world tries to adjust and the "data-less" identifier becomes a real world artifact. The situation becomes the same but worse (eg. you don't exist if you don't remember your social security id). In extreme cases people are tattooed with their numbers.

Using a random UUID as primary key does not mean users have to memorize that UUID. In fact in most cases I don't think there's much reason for it to even be exposed to the user at all.

You can still look up their data from their current email or phone number, for instance. Indexes are not limited to the primary key.

> The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

A fully random primary key takes into account that things change - since it's not embedding any real-world information. That said I also don't think there's much issue with embedding creation time in the UUID for performance reasons, as the article is suggesting.

mkleczek•8m ago
> Using a random UUID as primary key does not mean users have to memorize that UUID. In fact in most cases I don't think there's much reason for it to even be exposed to the user at all.

So what is such an identifier for? Is it only for some technical purposes (like replication etc.)?

Why bother with UUID at all then for internal identifiers? Sequence number should be enough.

everforward•1h ago
> The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

I think artificial and data-less identifiers are the better means of identification that takes into account that things change. They don't have to be the identifier you present to the world, but having them is very useful.

E.g. phone numbers are semi-common identifiers now, but phone numbers change owners for reasons outside of your control. If you use them as an internal identifier, changing them between accounts gets very messy because now you don't have an identifier for the person who used to have that phone number.

It's much cleaner and easier to adapt if each person gets an internal context-less identifier and you use their phone number to convert from their external ID/phone number to an internal ID. The old account still has an identifier, there's just no external identifier that translates to it. Likewise if you have to change your identifier scheme, you can have multiple external IDs that translate to the same internal ID (i.e. you can resolve both their old ID and their new ID to the same internal ID without insanity in the schema).

hyperpape•4h ago
Your comment is sufficiently generic that it’s impossible to tell what specific part of the article you’re agreeing with, disagreeing with, or expanding upon.
vintermann•4h ago
I disagree that performance should be a reason to choose running numbers over guids until you absolutely have to.

I think IDs should not carry information. Yes, that also means I think UUIDv7 was wrong to squeeze a creation date into their ID.

Isn't that clear enough?

mcny•3h ago
That's the creation date of that guid though. It doesn't say anything about the entity in question. For example, you might be born in 1987 and yet only get a social security number in 2007 for whatever reason.

So, the fact that there is a date in the uuidv7 does not extend any meaning or significance to the record outside of the database. To infer such a relationship where none exists is the error.

vintermann•3h ago
You can argue that, but then what is its purpose? Why should anyone care about the creation date of a by-design completely arbitrary thing?

I bet people will extract that date and use it, and it's hard to imagine use which wouldn't be abuse. To take the example of a PN/SSN and the usual gender bit: do you really want anyone to be able to tell that you got a new ID at that time? What could you suspect if a person born in 1987 got a new PN/SSN around 2022?

Leaks like that, bypassing whatever access control you have in your database, is just one reason to use real random IDs. But it's even a pretty good one in itself.

mcny•3h ago
> What could you suspect if a person born in 1987 got a new PN/SSN around 2022?

Thank you for spelling it for me. For the readers, It leaks information that the person is likely not a natural born citizen. The assumption doesn't have to be a hundred percent accurate, There is a way to make that assumption And possibly hold it against you.

And there are probably a million ways that a record created date could be held against you If they don't put it in writing, how will you prove They discriminated against you.

Thinking... I don't have a good answer to this. If data exists, people will extract meaning from it whether rightly or not.

infogulch•2h ago
To quote the great Mr Sparrow:

> The only rules that really matter are these: what a man can do and what a man can't do.

When evaluating security matters, it's better to strip off the moral valence entirely ("rightly") and only consider what is possible given the data available.

Another potential concerning implication besides citizenship status: a person changed their id when put in a witness protection program.

anamexis•3h ago
I would argue that is one of very few situations where leaking the timestamp that the ID was created when you already have the ID is a possible concern at all.

And when working with very large datasets, there are very significant downsides to large, completely random IDs (which is of course what the OP is about).

majorchord•2h ago
> You can argue that, but then what is its purpose? Why should anyone care about the creation date of a by-design completely arbitrary thing?

Pretty sure sorting and filtering them by date/time range in a database is the purpose.

miroljub•1h ago
If you need sorting and filtering by date, just add a timestamp to your table instead of misusing an Id column for that.
naasking•43m ago
Exactly, be explicit, don't shoehorn multiple purposes into a single column that's supposed to be a largely meaningless unique identifier.
mixmastamyk•33m ago
That happens, in general. The benefit comes when it’s time to look up by uuid only; the prefix is an index to its disk block location.
majorchord•31m ago
> just

It is easy to have strong opinions about things you are sheltered from the consequences of.

barrkel•3h ago
UUID v7 doesn't squeeze creation date in. If you treat it as anything other than a random sequence in your applications, you're just wrong.
zamadatix•3h ago
"What it does" and "what I think you should do with it" should not be treated as equivalent statements.
anamexis•3h ago
For what it’s worth, it was also completely unclear to me how you were responding to the article itself. It does not discuss natural keys at all.
hyperpape•3h ago
Those are two unrelated points and the connection between them was unclear in the original post.
hnfong•4h ago
The curious thing about the article is that, it's definitely premature optimization for smaller databases, but when the database gets to the scale where these optimizations start to matter, you actually don't want to do what they suggest.

Specifically, if your database is small, the performance impact is probably not very noticeable. And if your database is large (eg. to the extent primary keys can't fit within 32-bit int), then you're actually going to have to think about sharding and making the system more distributed... and that's where UUID works better than auto-incrementing ints.

oblio•4h ago
> Well, wrong, since although the date doesn't change.

Someone should have told Julius Caesar and Gregory XIII that :-p

barrkel•3h ago
Uuid v7 just has a bias in its generation; it isn't carrying information. You're not going to try and extract a timestamp from a uuid.

Random vs time biased uuids are not a decision to shave off ms that you will regret.

Most likely they will be a decision that shaves off seconds (yes, really - especially when you consider locality effects) and you'll regret nothing.

bri3d•3h ago
> You're not going to try and extract a timestamp from a uuid.

What? The first 48 bits of an UUID7 are a UNIX timestamp.

Whether or not this is a meaningful problem or a benefit to any particular use of UUIDs requires thinking about it; in some cases it’s not to be taken lightly and in others it doesn’t matter at all.

I see what you’re getting at, that ignoring the timestamp aspect makes them “just better UUIDs,” but this ignores security implications and the temptation to partition by high bits (timestamp).

nine_k•35m ago
Nobody forces you to use a real Unix timestamp. BTW the original Unix timestamp is 32 bits (expiring in 2038), and now everyone is switching to 64-bit time_t. What 48 bits?

All you need is a guaranteed non-decreasing 48-bit number. A clock is one way to generate it, but I don't see why a UUIDv7 would become invalid if your clock is biased, runs too fast, too slow, or whatever. I would not count on the first 48 bits being a "real" timestamp.

tobyhinloopen•1h ago
> You're not going to try and extract a timestamp from a uuid.

I totally used uuidv7s as "inserted at" in a small project and I had methods to find records created between two timestamps that literally converted timestamps to uuidv7 values so I could do "WHERE id BETWEEN a AND b"

jandrewrogers•55m ago
> You're not going to try and extract a timestamp from a uuid.

Hyrum's Law suggests that someone will.

sgarland•3h ago
> Permanent identifiers should not carry data.

Did you read the article? He doesn’t recommend natural keys, he recommends integer-based surrogates.

> A prime example of premature optimization.

Disagree. Data is sticky, and PKs especially so. Moreover, if you’re going to spend time optimizing anything early on, it should be your data model.

> Don't make decisions you will regret just to shave off a couple of milliseconds!

A bad PK in some databases (InnoDB engine, SQL Server if clustered) can cause query times to go from sub-msec to tens of msec quite easily, especially with cloud solutions where storage isn’t node-local. I don’t just mean a UUID; a BIGINT PK on a 1:M can destroy your latency for the simple reason of needing to fetch a separate page for every record. If instead the PK is a composite of (<linked_id>, id) - e.g. (user_id, id) - where id is a monotonic integer, you’ll have WAY better data locality.

Postgres suffers a different but similar problem with its visibility map lookups.

lukeschlather•29m ago
> Did you read the article? He doesn’t recommend natural keys, he recommends integer-based surrogates.

I am not a cryptographer, but I would want his recommendation reviewed by a cryptographer. And then I would have to implement it. UUIDs have been extensively reviewed by cryptographers, I have a variety of excellent implementations I can use, I know they solve the problem well. I know they can cause performance issues; they're a security feature that is easy to implement, and I can deal with the performance issues if and when they crop up. (Which, in my experience, it's unusual. Even at a large company, most databases I encounter do not have enough data. I will err on the side of security until it becomes a problem, which is a good problem to have.)

scottlamb•56m ago
> Permanent identifiers should not carry data.

I think you're attacking a straw man. The article doesn't say "instead of UUIDv4 primary keys, use keys such as birthdays with exposed semantic meaning". On the contrary, they have a section about how to use sequence numbers internally but obfuscated keys externally. (Although I agree with dfox's and formerly_proven's comments [1, 2] that XOR method they proposed for this is terrible. Reuse of a one-time pad is probably the most basic textbook example of bad cryptography. They referred to the values as "obfuscated" so they probably know this. They should have just gone with a better method instead.)

[1] https://news.ycombinator.com/item?id=46272985

[2] https://news.ycombinator.com/item?id=46273325

naasking•39m ago
I don't think the objection is that it exposes semantic meaning, but that any meaningful information is contained within the key at all, eg. even a UUID that includes timestamp information about when it was generated is "bad" in a sense, as it leaks information. Unique identifiers should be opaque and inherently meaningless.
benterix•52m ago
Your comment is valid but is not related to the article.
spoiler•22m ago
More broadly, this is the ages old surrogate vs natural key discussion, but yes the comment completely misses the point of the article. I can only assume they didn't read it in full!
vintermann•14m ago
The article explicitly argues against the use of GUIDs as primary keys, and I'm arguing for it.

A running number also carries data. Before you know it, someone's relying on the ordering or counting on there not being gaps - or counting the gaps to figure out something they shouldn't.

ivan_gammel•5h ago
The is article is about a solution in search of a problem, a classic premature optimization issue. UUIDv4 is perfectly fine for many use cases, including small databases. Performance argument must be considered when there’s a problem with performance on the horizon. Other considerations may be and very often superior to that.
sagarm•3h ago
It's not really feasible to rekey your UUIDv4 keyed database to int64s after the fact, imo. Sure your new tables could be integer-keyed, but the bulk of your storage will be UUID (and UUIDv4, if that's what you started with) for a very long time
ivan_gammel•3h ago
Yes, sure. My point is, it may never be necessary.
sagarm•17m ago
I think you're right that it won't matter for most companies. But having been at a company with persistent DB performance issues with UUIDv4 keys as a contributing factor, it sucks.
sgarland•2h ago
IME, when performance issues become obvious, the devs are in growth mode and have no desire / time to revisit PK choice.

Integer PKs were seen as fine for years - decades, even - before the rise of UUIDs.

BartjeD•4h ago
Personally my approach has been to start with big-ints and add a GUID code field if it becomes necessary. And then provide imports where you can match objects based on their code, if you ever need to import/export between tenants, with complex object relationships.

But that also adds complexity.

parpfish•2h ago
Two things I don’t like about big-int indexes:

- If you use uuids as foreign keys to another table, it’s obvious when you screw up a join condition by specifying the wrong indices. With int indices you can easily get plausible looking results because your join will still return a bunch of data

- if you’re debugging and need to search logs, having a simple uuid string is nice for searching

hk1337•4h ago
My biggest thing for UUIDs is don’t UUID everything. Most things should be okay with just regular integers as PKs.
grugdev42•4h ago
A much simpler solution is to keep your tables as they are (with an integer primary key), but add a non sequential public identifier too.

id => 123, public_id => 202cb962ac59075b964b07152d234b70

There are many ways to generate the public_id. A simple MD5 with a salt works quite well for extremely low effort.

Add a unique constraint on that column (which also indexes it), and you'll be safe and performant for hundreds of millions of rows!

Why do we developers like to overcomplicate things? ;)

Denvercoder9•4h ago
This misses the point. The reason not to use UUIDv4 is that having an index on random values is slow(er), because sequential inserts into the underlying B-tree are faster than random inserts. You're hitting the same problem with your `public_id` column, that it's not the primary key doesn't change that.
hnfong•4h ago
Ints as pk would be quicker for joins etc though.
sgarland•2h ago
For InnoDB-based DBs that are not Aurora, and if the secondary index isn’t UNIQUE, it solves the problem, because secondary non-unique index changes are buffered and written in batches to amortize the random cost. If you’re hashing a guaranteed unique entity, I’d argue you can skip the unique constraint on this index.

For Aurora MySQL, it just makes it worse either way, since there’s no change buffer.

Retr0id•4h ago
Per https://news.ycombinator.com/item?id=46273325, if you use a block cipher rather than a hash then you don't even need to store it anywhere.
kaladin_1•4h ago
I really hoped the author would discuss alternatives for distributed databases that writes in parallel. Sequential key would be atrocious in such circumstance this could kill the whole gain of distributed database as hotspots would inevitably appear.

I would like to hear from others using, for example, Google Spanner, do you have issues with UUID. I don't for now, most optimizations happen at the Controller level, data transformation can be slow due to validations. Try to keep service logic as straightforward as possible.

old8man•4h ago
Very useful article, thank you! Many people suggest CUID2, but it is less efficient and is better used for frontend/url encoding. For backend/db, only UUID v7 should be used.
mexicocitinluez•4h ago
You'll have to rip the ability to generate unique numbers from quite literally anywhere in my app and save them without conflict from my cold, dead hands.

The ability to know ahead of time what a primary key will be (in lieu of persisting it first, then returning) opened up a whole new world of architecting work in my app. It made a lot of previous awkward things feel natural.

sgarland•2h ago
Sounds like a lot of referential integrity violations.
mkleczek•3h ago
That's really an important deficiency of Postgres.

Hash index is ideally suited for UUIDs but for some reason Postgres hash indexes cannot be unique.

p2detar•3h ago
Another interesting article from Feb-2024 [0] where the cost of inserting a uuid7() and a bigint is basically the same. To me it wasn't quite clear what the problem with the buffer cache is but the author makes it much more clear than OP's article:

> We need to read blocks from the disk when they are not in the PostgreSQL buffer cache. Conveniently, PostgreSQL makes it very easy to inspect the contents of the buffer cache. This is where the big difference between uuidv4 and uuidv7 becomes clear. Because of the lack of data locality in uuidv4 data, the primary key index is consuming a huge amount of the buffer cache in order to support new data being inserted – and this cache space is no longer available for other indexes and tables, and this significantly slows down the entire workload.

0 - https://ardentperf.com/2024/02/03/uuid-benchmark-war

stevefan1999•3h ago
Try Snowflake ID then: https://en.wikipedia.org/wiki/Snowflake_ID
gwbas1c•2h ago
I work on an application where we encrypt the integer primary key and then use the bytes to generate something that looks like a UUID.

In our case, we don't want database IDs in an API and in URLs. When IDs are sequential, it enables things like dictionary attacks and provides estimates about how many customers we have.

Encrypting a database ID makes it very obvious when someone is trying to scan, because the UUID won't decrypt. We don't even need a database round trip.

mgoetzke•2h ago
That depends a lot on many factors and thus I dont like generic statements like that which tend to be more focused on a specific database pattern. That said everyone should indeed be aware of the potential tradeoffs.

And of course we could come up with many ways to generate our own ids and make them unique, but we have the following requirements.

- It needs to be a string (because we allow composing them to 'derive' keys) - A client must be able to create them (not just a server) without risk for collisions - The time order of keys must not be guessable easily (as the id is often leaked via references which could 'betray' not just the existence of a document, but also its relative creation time wrt others). - It should be easy to document how any client can safely generate document ids.

The lookup performance is not really such a big deal for us. Where it is we can do a projection into a more simple format where applicable.

aynyc•2h ago
I've seen this type of advice a few times now. Now I'm not a database expert by any stretch of imagination, but I have yet to see UUID as primary key in any of the systems I've touched.

Are there valid reasons to use UUID (assuming correctly) for primary key? I know systems have incorrectly expose primary key to the public, but assuming that's not the concern. Why use UUID over big-int?

reffaelwallen•1h ago
At my company we only use UUIDs as PKs.

Main reason I use it is the German Tank problem: https://en.wikipedia.org/wiki/German_tank_problem

(tl;dr; prevent someone from counting how many records you have in that table)

jakeydus•23m ago
I'm new to the security side of things; I can understand that leaking any information about the backend is no bueno, but why specifically is table size an issue?
ellisv•1h ago
About 10 years ago I remember seeing a number of posts saying "don't use int for ids!". Typically the reasons were things like "the id exposes the number of things in the database" and "if you have bad security then users can increment/decrement the id to get more data!". What I then observed was a bunch of developers rushing to use UUIDs for everything.

UUIDv7 looks really promising but I'm not likely to redo all of our tables to use it.

assbuttbuttass•1h ago
At least for the Spanner DB, it's good to have a randomly-distributed primary key since it allows better sharding of the data and avoids "hot shards" when doing a lot of inserts. UUIDv4 is the typical solution, although a bit-reversed incrementing integer would work too

https://cloud.google.com/blog/products/databases/announcing-...

rcxdude•1h ago
Uuids also allow the generation of the ID to seperate from the insertion into the database, which can be useful in distributed systems.
cruffle_duffle•1h ago
I mean this is the primary reason right here! You can pre-create an entire tree of relationships client side and ship it off to the database with everything all nice and linked up. And since by design each PK is globally unique you’ll never need to worry about constraint violations. It’s pretty damn nice.
nesarkvechnep•2h ago
If we embraced REST, as Roy Fielding envisioned it, we wouldn't have this, and all similar, conversations. REST doesn't expose identifier, it only exposes relationships. Identifiers are an implementation details.
stickfigure•1h ago
This is incredibly database-specific. In Postgres random PKs are bad. But in distributed databases like Cockroach, Google Cloud Datastore, and Spanner it is the opposite - monotonic PKs are bad. You want to distribute load across the keyspace so you avoid hot shards.
jakeydus•31m ago
I think they address this in the article when they say that this advice is specific to monolithic applications, but I may be misremembering (I skimmed).
andy_ppp•26m ago
Are you saying a monolith cannot use a distributed database?
jakeydus•22m ago
I'm not making any claims at all, I was just adding context from my recollection of the article that appeared to be missing from the conversation.

Edit: What the article said: > The kinds of web applications I’m thinking of with this post are monolithic web apps, with Postgres as their primary OLTP database.

So you are correct that this does not disqualify distributed databases.

deathanatos•1h ago
> Are UUIDs secure?

> Misconceptions: UUIDs are secure

> One misconception about UUIDs is that they’re secure. However, the RFC describes that they shouldn’t be considered secure “capabilities.”

> From RFC 41221 Section 6 Security Considerations:

> Do not assume that UUIDs are hard to guess; they should not be used as security capabilities

This is just wrong, and the citation doesn't support it. You're not guessing a 122-bit long random identifier. What's crazy is that the article, immediately prior to this, even cites the very math involved in showing exactly how unguessable that is.

… the linked citation (to §4.4, which is different from the in-prose citation) is just about how to generate a v4, and completely unrelated to the claim. The prose citation to §6 is about UUIDs generally: the statement "Do not assume that [all] UUIDs are hard to guess" is not logically inconsistent with properly-generated UUIDv4s being hard to guess. A subset of UUIDs have security properties, if the system generating & using them implements those properties, but we should not assume all UUIDs have that property.

Moreover, replacing an unguessable UUID with an (effectively random) 32-bit integer does make it guessable, and the scheme laid out seems completely insecure if it is to be used in the contexts one finds UUIDv4s being an unguessable identifier.

The additional size argument is pretty weak too; at "millions of rows", a UUID column is consuming an additional ~24 MiB.

hippo22•46m ago
The author should include benchmarks otherwise, saying that UUIDs “increase latency” is meaningless. For instance, how much longer does it take to insert a UUID vs. an integer? How much longer does scanning an index take?
jakeydus•29m ago
The author doesn't reference any tests that they themselves ran, but they did link a cybertec article [0] with some benchmarks.

[0] https://www.cybertec-postgresql.com/en/unexpected-downsides-...

henning•41m ago
UUIDs make enumeration attacks harder and also prevent situations where seeing a high valid ID value lets you estimate how much money a private company is earning if they charge based on the object the ID is associated with. If you can sample enough object ID values and see when the IDs were created, you could reverse engineer their ARR chart and see whether they're growing or not which many companies want to avoid.
sneak•36m ago
This is why ULID exists and why I use them in my ext_id columns. For the actual relational IDs internal to the db I use smaller/faster data types.