I built a whole database around the idea of using the smallest plausible random identifiers, because that seems to be the only "golden disk" we have for universal communication, except for maybe some convergence property of latent spaces in large enough embodied foundation models.
It's weird that they're so underappreciated in the scientific data management and library science communities; many problems that currently require large organisations could have been solved by better identifiers.
To me, the Ship of Theseus question is really about extrinsic (random / named) identifiers vs. intrinsic (hash / embedding) identifiers.
https://triblespace.github.io/triblespace-rs/deep-dive/ident...
https://triblespace.github.io/triblespace-rs/deep-dive/tribl...
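To make the distinction concrete, here's a minimal sketch (my illustration, not triblespace's API): an extrinsic ID is minted independently of the content, while an intrinsic ID is derived from it and changes with every replaced plank.

    import hashlib
    import secrets

    content = b"the ship, plank by plank"

    # Extrinsic: a random name, assigned once; survives any edit to the content.
    extrinsic_id = secrets.token_bytes(16)

    # Intrinsic: derived from the content itself; a new hull means a new identity.
    intrinsic_id = hashlib.sha256(content).digest()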
One upside of the deterministic schemes is that they include provenance/lineage: you can literally trace the path back up through the history to the original ID issuer.
Kinda has me curious: how much information is required to represent an arbitrary provenance tree/graph over a network of N nodes/objects, entirely via the self-describing ID?
(Thinking out loud: in the worst case of a linear chain, if the full provenance must be recoverable from the ID alone, that scales as O(N x local_id_size), so it's quite bad. But assuming the "best case", where any node is expected to be log(N) steps from the root (depth log(N)), global_id_size = log(N) x local_id_size feels like roughly the optimal limit. Since a local ID itself needs about log(N) bits, the global ID effectively grows as log(N)^2. Would that mean that, starting from the article's 399-bit number, a lower limit for a lineage-carrying global_id_size would be something like (400 bits)^2 = 160,000 bits ~= 20 kB? Because it carries the ordered-local-ID provenance information, rather than relying on local shared knowledge.)
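A few lines of Python to check that back-of-envelope arithmetic (the depth model is my assumption above, not anything from the article):

    import math

    LOCAL_ID_BITS = 400  # roughly the article's 399-bit number

    def chain_id_bits(n: int) -> int:
        # Worst case: a linear chain of custody, every ancestor's local ID embedded.
        return n * LOCAL_ID_BITS

    def balanced_id_bits(n: int) -> float:
        # "Best case": depth ~ log2(n), one local ID per level.
        return math.log2(n) * LOCAL_ID_BITS

    # With 2^400 nodes, depth ~ 400, so: 400 * 400 bits = 160,000 bits = 20 kB.
    print(balanced_id_bits(2**400) / 8 / 1000)  # -> 20.0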
Each “post” has a CID, which is a cryptographic hash of the data. To “prove” ownership of the post, there’s a witness hash that is sent that can be proved all the way up the tree to the repo root hash, which is signed with the root key.
Neat way of having data say “here’s the data, and if you care to verify it, here’s an MST”.
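For a feel of how a witness path verifies up to the root, here's a minimal sketch of folding sibling hashes toward a signed root hash; the names and layout are my illustration, not atproto's actual MST encoding.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def verify_witness(leaf: bytes, path: list[tuple[str, bytes]], root: bytes) -> bool:
        # Start from the post's hash, then combine with each sibling hash,
        # tracking which side ("L"/"R") the sibling sits on, up to the repo root.
        node = h(leaf)
        for side, sibling in path:
            node = h(sibling + node) if side == "L" else h(node + sibling)
        return node == root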
(Gotta say here that I love HN. It's one of the very few places where a comment that geeky and pedantic can nonetheless be on point. :-)
From now until protons decay and matter no longer exists is only 10^56 nanoseconds.
10-20 bits: version/epoch
10-20 bits: cosmic region
40 bits: galaxy ID
40 bits: stellar/planetary address
64 bits: local timestamp
This avoids the potentially pathological long provenance chain, and also encodes coordinates into the ID. (A packing sketch follows below.)
Every billion years or so it probably makes sense to re-partition.
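Packing that layout is plain bit arithmetic; here's a sketch assuming 16 bits each for the two variable-width fields (a guess within the 10-20 bit ranges above):

    EPOCH_BITS, REGION_BITS, GALAXY_BITS, BODY_BITS, TIME_BITS = 16, 16, 40, 40, 64

    def pack(epoch: int, region: int, galaxy: int, body: int, t: int) -> int:
        fields = ((epoch, EPOCH_BITS), (region, REGION_BITS),
                  (galaxy, GALAXY_BITS), (body, BODY_BITS), (t, TIME_BITS))
        out = 0
        for value, bits in fields:
            assert 0 <= value < (1 << bits)
            out = (out << bits) | value
        return out  # 176 bits total

A 64-bit local timestamp in nanoseconds rolls over every ~585 years, so "local" really does mean local here.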
Minor correction: Satellites don't go in every direction; they orbit. Probes or spaceships are more appropriate terms.
Timestamp + random seems like it could be a good tradeoff to reduce the ID sizes and still get reasonable characteristics; I'm surprised the article didn't explore that (but then again, "timestamps" are a lot more nebulous at universal scale, I suppose). Just spitballing here, but I wonder if it would be worthwhile to reclaim ten bits of the Snowflake timestamp and use the low 32 bits for a random number. Four billion IDs for each second.
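A sketch of that spitball: dropping the Snowflake timestamp from millisecond to roughly second resolution frees ~10 bits, leaving 31 timestamp bits + 32 random bits in the usual 63-bit budget (the epoch constant here is arbitrary):

    import secrets
    import time

    EPOCH = 1288834974  # seconds; an arbitrary custom epoch (Twitter's, here)

    def spitball_id() -> int:
        seconds = int(time.time()) - EPOCH
        assert seconds < (1 << 31)  # ~68 years of headroom
        # High 31 bits: second-resolution timestamp. Low 32 bits: random.
        return (seconds << 32) | secrets.randbits(32)

One caveat: by the birthday bound, minting ~77k IDs (sqrt(2 ln 2) x 2^16) within the same second already puts you at ~50% collision odds, so this only works if per-second volume stays well below that.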
There's a Tom Scott video [2] that describes YouTube video IDs as 11-character base-64 random numbers, but I don't see any official documentation of that. At the end he says how many IDs are available, but I don't think he accounts for collisions via the birthday paradox.
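The birthday math is quick to run; a sketch, taking 64^11 as the ID space per the video's description:

    import math

    ID_SPACE = 64 ** 11  # about 7.4e19, i.e. 2^66

    def collision_probability(n: int) -> float:
        # Standard birthday approximation: P ~= 1 - exp(-n^2 / 2N).
        return 1 - math.exp(-n * n / (2 * ID_SPACE))

    # ~50% odds of at least one collision after ~10 billion random IDs:
    print(math.sqrt(2 * ID_SPACE * math.log(2)))  # ~1.01e10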
Also, network routing requires objects that have multiple addresses.
The physics side of the whole thing is funny too: afaik quantum particles require fungibility (indistinguishability), i.e. by "doxxing" atoms you unavoidably change the behavior of the system.
adityaathalye•1h ago
Looks like this multispecies universe has a centrally-agreed-upon path addressing system.