The "naturally sortable" property is a good thing for Postgres and for most people who want to use UUIDs, because there are no sorted distribution buckets where the last bucket always grows when inserting.
I want to see something like HBase or S3 paths when UUIDv7 gets used.
It's no worse for privacy than other UUID variants if the "privacy" you're worried about leaking is the creation time of the UUID.
As for range partitioning, you can of course choose to partition on the hash of the UUIDv7 at the cost of giving up cheaper writes / faster indices. On the other hand, that of course gives up locality, which is a common challenge of partitioning schemes. It depends on the end-to-end design of the system, but I wouldn't say that UUIDv7 is inherently good or bad or better/worse than other UUID schemes.
> What can go wrong with using UUIDv7 Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time.
> This leakage is primarily a privacy concern. Attackers can use the timing data as metadata for de-anonymization or account correlation, potentially revealing activity patterns or growth rates within an organization. While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach. Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.
So then what's the point? How I always did things in the past was use an auto increment big int as the internal primary key, and then use a separate random UUID for the external facing key. I think this recommendation from "experts" is pretty dumb because you get very little benefit using UUIDV7 (beyond some portability improvements) if you're still using a separate internal key.
While I wouldn't use UUIDV7 as a secure token like I would UUIDV4, I don't see anything wrong with using UUIDV7 as externally exposed object keys - you're still going to need permissions checks anyway.
Or where, for some reason, the ID needs to be created before being inserted into the database. Like you're inserting into multiple services at once.
What experts? For what scenarios specifically? When do they consider time-of-creation to be sensitive?
So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
In other words, you can only use UUIDv7 for rows that never need to be looked up by any data coming from the user. And maybe that exists sometimes for certain data in JOINs... but it seems like it might be more the exception than the rule, and you never know when an internal ID might need to become an external one in the future.
Whether creation date is PHI…I could see the argument being yes, since it correlates to medical information (when someone sought treatment, which could be when symptoms present.)
Always thought that was elegant (the attack, not using the time as the seed).
UUIDv4 removes all three of those vectors. UUIDv7 still removes two of three. It doesn't leak record count or the rate at which you create them, only creation time. And you still can't guess adjacent keys. It's a pretty narrow information leakage for something you routinely reveal on purpose.
With UUIDv7 the creation time is always leaked without any sampling. A casual attacker could quite easily look up the time and become motivated to probe and link the account further.
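To make "quite easily" concrete, here is a minimal sketch of reading the creation time back out of a UUIDv7, assuming the standard layout (48-bit Unix millisecond timestamp in the most significant bits). The function name `uuidv7_created_at` is made up for illustration, and the sample ID is, as far as I recall, the UUIDv7 example from RFC 9562.

```python
import uuid
from datetime import datetime, timezone


def uuidv7_created_at(value: str) -> datetime:
    """Return the creation time embedded in a UUIDv7 string (hypothetical helper)."""
    u = uuid.UUID(value)
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    # The top 48 of the 128 bits hold Unix time in milliseconds.
    millis = u.int >> 80
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Sample UUIDv7 (assumed to be the RFC 9562 example value).
example = "017f22e2-79b0-7cc3-98c4-dc0c0c07398f"
print(uuidv7_created_at(example).isoformat())
```

No key, lookup table, or sampling is needed: anyone holding the ID can do this.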
When sequential integer IDs are externalized, an attacker does not need creation times to perform predictive attacks. All they need to do is apply deltas to known identifiers.
There was previously an article linked here about recovering access to some bitcoin by feeding all possible timestamps in a date range to the password creation tool they used, and trying all of those passwords.
Knowing approximate age is a relatively small leak compared to that.
Bank security does not depend on your bank account being private information. Pretty much all bank security rounds to the bank having a magic undo button, so they can undo any bad transactions after it comes to light that it was a bad transaction. Sure they do some filtering on the front-end now to eliminate the need to use the magic undo button, but that's just extra icing to keep the undo button's use to a dull roar.
>> So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
> This is only really true if leaking the creation time of the record is itself a security concern.
No, as "leaking the creation time" is not a concern when APIs return resources having properties representing creation/modification timestamps.
Where exposing predictable identifiers, such as UUIDv7 or serial[0] types used as database primary keys, creates a security risk is that it enables attackers to synthesize identifiers which match arbitrary resources much more quickly than when random identifiers are employed.
0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...
If your security relies on attackers not knowing your IDs (i.e. you don't do proper data permission checks), your security is flawed.
There is no need to put the privacy-preserving ID in a database index when you can calculate the mapping on the fly.
You put in 128 bits, you get out 128 bits. The encryption is strong, so the clients won't be able to infer anything from it, and your backend can still get all the advantages of sequential IDs.
You also can future-proof yourself by reserving a few bits from the UUID for the version number (using cycle-walking).
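A rough sketch of the "128 bits in, 128 bits out" mapping, computed on the fly with no second index. To stay within the Python standard library this uses a Feistel network with HMAC-SHA256 as the round function as a stand-in for a real block cipher such as AES; it is an illustration of the idea, not a vetted construction. The names `encode_id` and `decode_id` are hypothetical, and the version-bit reservation via cycle-walking mentioned above is not implemented here.

```python
import hashlib
import hmac
import uuid

MASK64 = (1 << 64) - 1


def _round(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 truncated to 64 bits."""
    msg = bytes([round_no]) + half.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:8], "big")


def encode_id(key: bytes, internal: uuid.UUID, rounds: int = 8) -> uuid.UUID:
    """Map an internal (e.g. UUIDv7) key to an opaque 128-bit external ID."""
    left, right = internal.int >> 64, internal.int & MASK64
    for i in range(rounds):
        left, right = right, left ^ _round(key, i, right)
    # The result has arbitrary version/variant bits; it is just 128 opaque bits.
    return uuid.UUID(int=(left << 64) | right)


def decode_id(key: bytes, external: uuid.UUID, rounds: int = 8) -> uuid.UUID:
    """Invert encode_id by running the Feistel rounds backwards."""
    left, right = external.int >> 64, external.int & MASK64
    for i in reversed(range(rounds)):
        left, right = right ^ _round(key, i, left), left
    return uuid.UUID(int=(left << 64) | right)
```

The backend keeps the sequential key in its index and applies `decode_id` to whatever the client sends, so the only extra cost is a few hash computations per request.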
A UUIDv7 primary key seems to reduce / eliminate those problems.
If there is also an indexed UUIDv4 column for external id, I suspect it would not be used as often as the primary key index so would not cancel out the performance improvements of UUIDv7.
[1] https://www.cybertec-postgresql.com/en/unexpected-downsides-...
That doesn't matter because it's the creation of the index entry that matters, not how often it's used for lookup. The lookup cost is the same anyways.
Very true, as detailed by the link you kindly provided. Which is why a technique I have found useful is to have both an internal `id` PK `serial`[0] column (never externalized to other processes) and another column with a unique constraint having a UUIDv4 value, such as `external_id`, explicitly for providing identifiers to out-of-process collaborators.
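For anyone who wants to see the shape of that two-column layout, here is a small sketch. It uses SQLite through Python's standard library as a stand-in for Postgres (`INTEGER PRIMARY KEY` playing the role of `serial`, a `TEXT` column playing the role of a `uuid` column); the `customers` table and the helper names are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id          INTEGER PRIMARY KEY,   -- internal key, never leaves the service
        external_id TEXT NOT NULL UNIQUE,  -- random UUIDv4 handed to API clients
        name        TEXT NOT NULL
    )
    """
)


def create_customer(name: str) -> str:
    """Insert a row and return only the external identifier."""
    external_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO customers (external_id, name) VALUES (?, ?)",
        (external_id, name),
    )
    return external_id


def find_customer(external_id: str):
    """Resolve an external identifier to the internal row, or None."""
    return conn.execute(
        "SELECT id, name FROM customers WHERE external_id = ?",
        (external_id,),
    ).fetchone()


public_id = create_customer("Ada")
print(find_customer(public_id))
```

Note the `UNIQUE` constraint still creates a second index on random values, which is the cost being argued about in this thread.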
0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...
If your UUIDv4 is cached, you're still suffering from the extra storage and index. Not an issue on a million-row system, but imagine a billion, or 10 billion.
And what if it's not cached? Great, now you're hitting the disk.
Computers do not suffer from a lack of CPU performance, especially when you can use dedicated CPU instruction sets. Hell, you do not even need encryption. How about a simple bit shift that includes a simple lookup identifier? It's a black box, sure, and not great if leaked, but you have other things to worry about if your actual shift pattern is leaked. Use an extra byte or two to identify the pattern.
Obfuscating your IDs is easy. No need for full encryption.
I'm sure there might be a middle ground where most of the performance gains remain but the deanonymizing risk is greatly reduced.
Edit: encrypting the value in transit seems a simpler solution really
They're more performant than uuidv7. Why would I still use UUID? Perhaps I would still want UUIDs because they can be generated on the client and because they make incorrect JOINs return no rows.
IMO, a major problem solved by UUIDs is the ability to create IDs on the client-side, hence, they are inherently user-facing. A major reason why this is an important use case for UUIDs is because it allows clients to avoid accidental duplication of records when an insertion fails due to network issues. It provides insertion idempotence.
For example, when the user clicks on a button on a form to insert a record into a database, the client can generate the UUID on the client-side, then attach it to a JSON object, then send the object to the server for insertion; in the meantime, if there is a network issue and it's unclear whether or not the record was inserted, the code can automatically retry (or user can manually retry) and there is no risk of duplication of data if you use the same UUID.
This is impossible to do with auto-incrementing IDs because those are generated by the database in a centralized way, so the user cannot know the ID ahead of time. Thus, if there is a network failure while submitting a form, the client cannot automatically know whether or not the record was successfully inserted; if they retry, they may create a duplicate record in the database. There is no way to make the operation idempotent without relying on some kind of fixed ID which has a uniqueness constraint on the database side.
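A minimal sketch of that retry scenario, again using SQLite from the Python standard library as a stand-in for the real database (in Postgres the equivalent would be `INSERT ... ON CONFLICT DO NOTHING`). The `orders` table and `submit_order` function are hypothetical names for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT NOT NULL)")


def submit_order(order_id: str, item: str) -> None:
    """Server-side insert; a repeated ID is silently ignored instead of duplicated."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO orders (id, item) VALUES (?, ?)",
            (order_id, item),
        )


order_id = str(uuid.uuid4())       # generated once, on the client
submit_order(order_id, "widget")   # first attempt: suppose the response is lost
submit_order(order_id, "widget")   # automatic or manual retry with the same ID

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)
```

The uniqueness constraint on the client-supplied ID is what makes the second call harmless.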
The id would be exposed to users. An integer would expose the number of records in it.
Am I using it right, guys?
uuidv7 (-) and nanoid (_-) have special characters which urlencode to themselves.
None are short enough that you'd want someone reading them over the phone, but from a character-legibility standpoint, ULID makes more sense.
Now someone should make a UUIDv7 -> ULID adapter lib that 1:1 translates UUIDv7 <-> ULID preserving all the timestamp resolution and randomness bits so we can use the db-level UUIDv7 support to store ULIDs.
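Such an adapter is mostly a change of text encoding, since both formats are 128 bits with a 48-bit millisecond timestamp at the top. A sketch, assuming the usual ULID text form (26 characters of Crockford base32); the function names are made up. One caveat: the six version/variant bits of the UUIDv7 land inside what ULID treats as randomness, so an arbitrary ULID will not necessarily decode to a valid UUIDv7.

```python
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U


def uuid_to_ulid(u: uuid.UUID) -> str:
    """Encode the 128-bit value as 26 Crockford base32 characters."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[n & 31])
        n >>= 5
    return "".join(reversed(chars))


def ulid_to_uuid(s: str) -> uuid.UUID:
    """Decode a 26-character ULID string back to the same 128 bits."""
    if len(s) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    n = 0
    for ch in s.upper():
        n = (n << 5) | CROCKFORD.index(ch)
    if n >> 128:
        raise ValueError("value does not fit in 128 bits")
    return uuid.UUID(int=n)
```

Because the encoding is big-endian and fixed-width, the string sort order of the ULID form matches the sort order of the underlying UUIDs.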
In other words, "don't try this with CRDB".
And the amount of information it leaks is negligible - they might know the oldest and the newest and there’s an infinite gulf in between.
It’s better and more practical than SERIAL or BIGSERIAL in every way - if you need a random/external ID, add a second column. Done.
As others have stated, it completely defeats the performance purpose if you need to look up using another ID.
Postgres on the other hand doesn’t do clustered indexing on the PK… if I recall correctly.
On the other hand, if you're basically logging to your database so inserts are like 99% of the load, then it's something to consider.
For anyone interested:
CREATE FUNCTION uuidv7() RETURNS uuid AS $$
  -- Get base random UUID and overlay timestamp
  select encode(
    set_bit(
      set_bit(
        overlay(uuid_send(gen_random_uuid())
                placing substring(int8send((extract(epoch from clock_timestamp())*1000)::bigint) from 3)
                from 1 for 6),
        52, 1),  -- Set version bits to 0111
      53, 1),
    'hex')::uuid;
$$ LANGUAGE sql volatile;
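If you'd rather generate the ID in the application, the same steps as the SQL function above can be sketched in Python: take random bits, overlay the 48-bit millisecond timestamp, then set the version and variant bits. This is a plain illustration with no monotonic counter within a millisecond, so ordering between IDs created in the same millisecond is random.

```python
import os
import time
import uuid


def uuidv7() -> uuid.UUID:
    """Build a UUIDv7-shaped value: timestamp, version 7, variant 10, random rest."""
    millis = time.time_ns() // 1_000_000
    value = int.from_bytes(os.urandom(16), "big")
    value &= (1 << 80) - 1                        # keep only the low 80 random bits
    value |= (millis & ((1 << 48) - 1)) << 80     # 48-bit Unix ms timestamp on top
    value = (value & ~(0xF << 76)) | (0x7 << 76)  # version nibble = 0111
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # variant bits = 10
    return uuid.UUID(int=value)


print(uuidv7())
```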
There are wild scenarios you can come up with where you may leak something, but that assumes the information isn't coming over anyway.
"Reveals account creation time" - most APIs return this in API responses by default.
When have you seen just a list of UUIDs and no other more revealing metadata?
Meanwhile what pwns 99% of companies? Phishing.
> While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach
The correct way is: 1. generate the ID on the server side, not the client side; 2. always validate the data access permission of all IDs sent from the client.
A predictable ID is only unsafe if you don't validate the data access permission of IDs sent from the client. Also, UUIDv7 is much less predictable than an auto-increment ID.
But I do agree that having the creation time in a public-facing ID can leak analytical information.
Or are there any random ID generators that strike a compromise: remaining sequential-ish without leaking exact timestamps and global ordering?
morshu9001•8h ago
edoceo•8h ago
jrochkind1•8h ago
saagarjha•8h ago
morshu9001•8h ago
wongarsu•7h ago
morshu9001•7h ago
coolspot•8h ago
morshu9001•8h ago
tracker1•7h ago
morshu9001•6h ago
markstos•8h ago
bramhaag•7h ago
Of course you need to be sure the server will accept the ID, but that is practically guaranteed by the uniqueness property of UUIDs.
martinky24•8h ago
morshu9001•8h ago
rcfox•8h ago
martinky24•8h ago
rcfox•7h ago
With multiple servers talking to a single database, I'd still prefer to let the database handle generating IDs.
morshu9001•7h ago
Speaking of Google, Spanner recommends uuid4, and specifically not any uuid that includes a timestamp at the start like uuid7.
Deadron•8h ago
morshu9001•8h ago
Now, the index on the public IDs would be faster with a uuid7 than a uuid4, but you have a similar info leak risk that the article mentions.
rcfox•8h ago
morshu9001•7h ago
xienze•7h ago
nextaccountic•8h ago
bigserial must be generated by the db
coolspot•8h ago
tracker1•7h ago
crazygringo•5h ago
mhuffman•8h ago
So the common response is sequential ID crawling by bad actors. UUIDs are generally un-guessable and you can throw them into slop DBs like Mongo or storage like S3 as primary identifiers without worrying about permissions or having a clever interested party pwn your whole database. A common case of security through obscurity.
simongr3dal•8h ago
morshu9001•8h ago
tracker1•7h ago
e12e•7h ago
molf•7h ago
UUID7 allows anyone to know the time of creation, but not how many records have been created (approximately) in a particular time frame. It leaks data about the record itself, but not about other records.
ibejoeb•8h ago
tracker1•7h ago
ibejoeb•7h ago
molf•7h ago
- Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case by case basis. It can be harmless or disastrous depending on your application.
- Serial keys are required to be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic.
- Because UUIDs can be generated anywhere, sharding is easier.
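To illustrate the first point, this is roughly all the arithmetic an outsider needs to estimate your growth rate from two serial IDs they were issued at different times. The numbers are invented, and the estimate assumes IDs are handed out densely (real sequences can have gaps, e.g. after rolled-back inserts, so treat it as an upper bound).

```python
from datetime import datetime


def estimate_daily_records(
    id_a: int, seen_a: datetime, id_b: int, seen_b: datetime
) -> float:
    """Estimate records created per day from two observed serial IDs."""
    days = (seen_b - seen_a).total_seconds() / 86_400
    return (id_b - id_a) / days


# Hypothetical observations: two sign-ups one month apart.
rate = estimate_daily_records(
    10_000, datetime(2024, 1, 1), 13_100, datetime(2024, 2, 1)
)
print(rate)
```

The second observed ID on its own also bounds the total record count, which is the other half of the leak.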
The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time.
I've found that the data leaked by serial keys is problematic often enough; whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed.
MBCook•3h ago
World’s easiest hack. You’re looking at /customers/3836/bills? What happens if you change that to 4000? They’re a big company. I bet that exists.
Did they put proper security checks EVERYWHERE? Easy to test.
But if you’re at /customers/{big-long-hex-string}/bill the chances of you guessing another valid ID are basically zero.
Yeah it’s security through obscurity. But it’s really good obscurity.
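For a sense of how good the obscurity is: a UUIDv4 has 122 random bits (128 minus 4 version bits and 2 variant bits), so the odds of hitting any valid ID can be bounded with simple arithmetic. The record and guess counts below are made-up inputs, and the function is only a union-bound estimate.

```python
from fractions import Fraction

RANDOM_BITS_V4 = 122  # 128 bits minus 4 version bits and 2 variant bits


def hit_probability(valid_ids: int, guesses: int, random_bits: int = RANDOM_BITS_V4) -> Fraction:
    """Upper bound on the chance that at least one random guess is a valid ID."""
    return min(Fraction(1), Fraction(valid_ids * guesses, 2**random_bits))


# Hypothetical: a billion customers, an attacker making a trillion guesses.
p = hit_probability(valid_ids=10**9, guesses=10**12)
print(float(p))
```

Compare that with /customers/3836 to /customers/4000, where a handful of guesses is enough.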
morshu9001•3h ago
bruce511•2h ago
In some use cases it can be possible to exclude, or anonymize the PK, but in other cases a PK is necessary. Once you start building APIs to allow others to access your system, a UUIDv4 is the best ID.
There are some performance issues with very large tables though. If you have very large tables (think billions of rows) then UUIDv7 offers some performance benefits at a small security cost.
Personally I use v4 for almost all my tables because only a very small number of them will get large enough to matter. But YMMV.