https://www.rfc-editor.org/rfc/rfc9562.html#monotonicity_cou...
The entire universe. Else it's not universally unique.
If your system does need to worry about UUIDv7s generated by the rest of the universe, you likely also need to worry about maliciously created IDs, software bugs, clocks that reset to the unix epoch, etc. I worry about those more than a bona fide collision.
Joke aside, all of this is theoretical. In practical applications it's essentially impossible to hit, so it doesn't matter whether a collision is possible, since you're not at Google scale anyway.
Getting UUID 'A' from app 'X' is easily distinguishable from UUID 'A' from app 'Y'.
Because you're just overreaching at this point. If you can develop a better one, be my guest.
If YouTube wanted to give every incoming pixel its own UUIDv7, they'd see a collision rate just under 0.6%.
> Assuming 4K@60fps [...] they'd see a collision rate just under 0.6%
This doesn't detract from your point that collisions like that become viable at that scale, but assuming an average of 4K@60fps is assuming a lot. The average video upload there is probably south of 1080p@30fps. This plays nicely with the birthday paradox.
> The odds are 1 in 2^122 — that’s approximately 1 in 5,000,000,000,000,000,000,000,000,000,000,000,000.
This is true if you only generate two GUIDs, but if you generate very many GUIDs, the chance of generating two identical ones between any of them increases. E.g. if you generate 2^61 GUIDs, you have about a 1 in 2 chance of a collision, due to the birthday paradox.
2^61 is still a very large number of course, but much more feasible to reach than 2^122 when doing a collision attack. This is the reason that cryptographic hashes are typically 256 bits or more (to make the cost of collision attacks >= 2^128).
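A quick way to sanity-check those numbers with the standard birthday approximation (assuming 122 random bits for UUIDv4; the 2^61 case comes out to roughly a 39% chance, i.e. about a coin flip):

    import math

    # Probability of at least one collision among n values drawn uniformly
    # from a space of 2**bits, via the birthday approximation:
    #   p ~= 1 - exp(-n*(n-1) / (2*N))
    def collision_probability(n: int, bits: int) -> float:
        N = 2 ** bits
        return -math.expm1(-n * (n - 1) / (2 * N))

    print(collision_probability(2, 122))        # two UUIDs: ~1.9e-37
    print(collision_probability(2 ** 61, 122))  # 2^61 UUIDs: ~0.39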
It's trivial to force a collision. Here's the same UUID twice:
6e197264-d14b-44df-af98-39aac5681791
6e197264-d14b-44df-af98-39aac5681791
Typically, you don't care about UUIDs that aren't in your system, and you generate your own to avoid maliciously generated collisions. Your system can't handle 2^61 IDs anyway. It doesn't have the processing power, storage, or bandwidth for that to happen. Not to mention traditional rate limiting.
>2^61 is still a very large number of course, but much more feasible to reach than 2^122 when doing a collision attack. This is the reason that cryptographic hashes are typically 256 bits or more (to make the cost of collision attacks >= 2^128).
For cryptographic applications it really is small; the previous poster is correct that 2^64 is very small for that purpose - a small supercomputing cluster or two could brute-force such a cipher in a reasonable amount of time, which is why symmetric keys are 256 bits and up, to guarantee there's no way to attack them.
The only way it would apply to symmetric keys is if you have a server that stores 2^64 encrypted messages, and can somehow find out which messages used the same symmetric key (normally not possible unless they also have the same IV and plaintext), and can somehow coerce the user who uploaded message #1 to decrypt message #2 for you (or vice versa). Obviously that isn't realistic.
So assuming you use 64-bit counters, you can divide those 12 years by 1024 to get 4 days.
And that's not even considering what you could do on a GPU.
Edit: I might be off by a factor of 2, not sure if the SIMD throughput is per-core or per-thread. Also thermal throttling. Same ballpark though!
Then your 32 HT threads aren't really going to give you full access to the underlying SIMD registers, which are per core - which is where I assume you realized the 2x difference might show up?
And to do += 1 multithreaded you have to partition the range or you won't get any speedup - if you don't amortize the cost of atomic synchronization across threads, you're going to be slower than a non-SIMD increment.
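The partitioning itself is simple enough. A sketch of the idea (Python just to show the disjoint ranges - the worker count is arbitrary, and any real speedup would come from a compiled language with SIMD, not from this):

    # Split [0, total) into disjoint, contiguous sub-ranges so each worker
    # counts through its own chunk with no shared counter and no atomics.
    def partition(total: int, workers: int):
        chunk = total // workers
        for i in range(workers):
            start = i * chunk
            end = total if i == workers - 1 else start + chunk
            yield start, end

    ranges = list(partition(2 ** 64, 32))
    print(ranges[0])   # (0, 576460752303423488)
    print(ranges[-1])  # (17870283321406128128, 18446744073709551616)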
To actually find a collision in a 128-bit cryptographic hash function would take closer to 2^65 hashes. Back-of-the-envelope calculations suggest that with Pollard's rho it would cost a few million dollars of CPU time at Hetzner's super-low prices. Not a mere mortal's budget, but not that far off, I guess.
In any case, in 2023 I back-of-the-envelope estimated that you could compute 2^64 SHA256 for ~$100K, using rented GPU capacity https://www.da.vidbuchanan.co.uk/blog/colliding-secure-hashe...
So if you need 1000 random numbers, generate from 1 to 1 million.
The square root approximation works well for large numbers, but leaves out some factors that are relevant for small numbers.
If you don't check for clashes, the 50% chance of failure is too much. Probably even 0.1% is too much, so you'd need a more elaborate approach.
If you do check for clashes, you can generate from 1 to 2000 with little overhead.
Then you can choose how many collisions to accept on average. (If the answer is zero, then it makes more sense to look at the probability of one or more collisions.)
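Putting rough numbers on the 1-to-1-million vs. 1-to-2000 cases (a sketch; "overhead" here is just the expected number of extra draws when you do check for clashes):

    import math

    # Probability that k uniform draws from 1..n contain at least one repeat.
    def p_any_collision(k: int, n: int) -> float:
        return -math.expm1(-k * (k - 1) / (2 * n))

    # Expected number of draws to collect k distinct values from 1..n when
    # you check for clashes and redraw on a repeat.
    def expected_draws(k: int, n: int) -> float:
        return sum(n / (n - i) for i in range(k))

    print(p_any_collision(1000, 1_000_000))  # ~0.39: too risky if you never check
    print(expected_draws(1000, 2000))        # ~1386 draws to get 1000 distinct values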
One of them was genuine - generated by different systems, and it was caught when loading data from one into the other: the object had the same ID but a different underlying type.
The other one was due to 'error' - two systems (by different companies, supporting the same data exchange standard) used a magic hardcoded GUID that turned out to be the same.
Both of those systems have a full audit trail - each change created a new row in the database, and IDs were formatted as {NAMESPACE}.{GUID}.{TIMESTAMP}. Mutation of an object created a new entry with a different {TIMESTAMP} part. Namespaces are mandated by the standard, so different systems can have the same namespace value.
Anyhow, that was my first thought when you mentioned 2^61 GUIDs: where are you even going to put them? Second thought: I don't think enumerating 2^61 GUIDs is trivial; in fact, I suspect it would take longer than anyone would be willing to spend. And if you are not storing them, why are you generating them?
And what even is a GUID collision attack? It's not like they are a hash, and since they tend to be public identifiers, it turns out that despite their stated purpose of preventing collisions, you can't really use GUIDs generated by others (if they wanted collisions they would straight up just copy yours), so you end up regenerating them anyway.
At a rate of comparing 400,000 guids per second, you have a 99% chance of seeing a collision within the next 553,750 years.
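That lines up with inverting the birthday bound (assuming UUIDv4's 122 random bits and a steady 400,000 IDs per second):

    import math

    N = 2 ** 122                # random space of a UUIDv4
    RATE = 400_000              # IDs per second
    TARGET = 0.99               # desired collision probability

    # Invert p = 1 - exp(-n^2 / (2N)) to get the number of IDs needed:
    n = math.sqrt(2 * N * math.log(1 / (1 - TARGET)))
    years = n / RATE / (60 * 60 * 24 * 365.25)
    print(f"{years:,.0f} years")  # ~554,000 years, same ballpark as the figure above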
Microsoft’s GUID standard is garbage.
https://learn.microsoft.com/en-us/dotnet/api/system.guid?vie...
I don't think there's a "Microsoft standard"; they just use different versions of UUID in different products over time. No idea why they call it GUID instead of UUID, but it's easier to say out loud, so I'm not against it.
v7 has a timestamp indeed, but isn't the time making it more collision resistant? You'd have to generate tons of UUIDv7s in the same millisecond, while v4 is more likely to collide due to not being time-constrained and the birthday paradox.
I think both have their uses though. You might need pure random if you want your UUID not to convey any time information and you're not generating tons of them (e.g. a random user id).
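The bit budget is the crux: v4 has 122 random bits, while v7 spends 48 on a millisecond timestamp and keeps 74 random (per RFC 9562, ignoring the optional counter variants). So within a single millisecond v7 has less random room, but across milliseconds the timestamp keeps values apart. A hand-rolled sketch of the v7 layout, just to show where the bits go (recent Python versions add uuid.uuid7() so you wouldn't do this by hand):

    import os
    import time
    import uuid

    def uuidv7() -> uuid.UUID:
        # 48-bit Unix-ms timestamp, 4-bit version, 12 + 62 random bits, 2-bit variant.
        ms = time.time_ns() // 1_000_000
        rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
        value = (ms & ((1 << 48) - 1)) << 80          # unix_ts_ms in the top 48 bits
        value |= 0x7 << 76                            # version = 7
        value |= (rand >> 68) << 64                   # rand_a: 12 random bits
        value |= 0b10 << 62                           # variant = RFC 9562
        value |= rand & ((1 << 62) - 1)               # rand_b: 62 random bits
        return uuid.UUID(int=value)

    print(uuidv7())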
What do you mean "model"? Are you referring to UUIDv1 which has time and MAC address?
That seems to depend a whole lot on the pattern your application generates UUIDs in. If you're generating a consistent distribution over time, sure. If you generate a whole lot in bursts, collision seems to be way more likely.
(Not disagreeing with you, just adding perspective.)
(Agreeing with both parents)
After thinking about it more, I have the feeling (against my initial intuition) that v4 might dominate either way unless you consistently generate tons of UUIDs for an impractical number of years.
[0] https://kagi.com/assistant/dd7d8c48-44e4-499b-9f2f-33663d125...
If you were doing RPC in OSF DCE your IDs were UUIDs, and if you were doing COM in Wintel your IDs were GUIDs; and that was basically the difference, a different name for the same thing when used in a different domain.
Plus the difference in endianness, because one was a network-byte-order thing and the other was an Intel-architecture byte-order thing, and only some parts of these IDs were technically multi-byte integers that had a byte order at all.
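You can still see that split in Python's uuid module, which exposes both layouts - bytes is the RFC/network order, bytes_le is the little-endian layout Win32 GUID structs use for the first three fields (using the UUID from upthread):

    import uuid

    u = uuid.UUID("6e197264-d14b-44df-af98-39aac5681791")

    print(u.bytes.hex())     # 6e197264d14b44dfaf9839aac5681791  (big-endian wire order)
    print(u.bytes_le.hex())  # 6472196e4bd1df44af9839aac5681791  (first three fields swapped)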
But by the late 1990s this had already become lost to history, with a sea of people who had made all sorts of inferences and promoted them as gospel truth: from the fact that Microsoft had two programs named GUIDGEN.EXE and UUIDGEN.EXE; from the fact that many generators sprang up and the whole idea spread to Java and databases and this new-fangled WorldWeb thing and all sorts of stuff; from the fact that multiple different versions of these IDs appeared, and what version an ID was depended on tooling and libraries; and from the fact that at the time Microsoft was less likely to go through formal standards processes and more likely to just write and ship things and sponsor a book and a CD-ROM of doco. So if your world was RFCs and the IETF you had one worldview, and if your world was Microsoft Press and the MSDN you had another.
You probably actually want at least a few prefix bytes to be a timestamp (UUIDv7) for B-tree efficiency, but YMMV.
http://mattmitchell.com.au/birthday-problems-friendly-identi...
However, the overall takeaway should be, as always: don't use MongoDB. Period. Every time I learn something new about it I'm baffled about why people continue to use it.
epoch time + MAC Address + transaction counter (catch NTP skew) + Thread PID + new Pointer address = GUID
Then increment the global transaction counter, complete some ops, and check that the current epoch time is in the future before the transaction frees the memory locations.
This is often robust in highly concurrent distributed systems even under network degradation, or corrupted sync states. Has other interesting use-cases too. =3
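If I'm reading that recipe right, a rough sketch might look like this (the field choices, ordering, and hashing the whole thing into a UUID-shaped string are my own guesses for illustration, not a spec):

    import hashlib
    import itertools
    import os
    import threading
    import time
    import uuid

    _counter = itertools.count()  # process-wide transaction counter

    def make_id() -> str:
        parts = (
            time.time_ns(),         # epoch time
            uuid.getnode(),         # MAC address (or a random stand-in)
            next(_counter),         # transaction counter (guards against NTP skew)
            os.getpid(),            # process id
            threading.get_ident(),  # thread id
            id(object()),           # address of a freshly allocated object ("new pointer")
        )
        digest = hashlib.sha256(repr(parts).encode()).digest()
        return str(uuid.UUID(bytes=digest[:16]))

    print(make_id())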
If you want to see how close to a non-ordinal 123456 a random generator can get, you also need to look for stuff like 923456 or 123956, etc.
Also, would 223456 be considered a closer match compared to 323456? (It shouldn't in my opinion because, again, these are non-ordinal strings).