I mean, there aren't 16B people in the world, so one row per person can be ruled out pretty easily.
In a hypothetical "master dump", a mix of all the dumps ever leaked, you'd expect dozens of entries, if not more, for every "real person" out there. Think about how many people had a Yahoo account, then how many had several Yahoo accounts, and then multiply that by the hundreds of leaks out there. I can easily see the number reaching the billions, just because of how many accounts people have had on the many platforms that got hacked over the past ~20 years.
Sure, 99% of those won't be active accounts anymore, but the passwords still serve as a signal, at least for "what kinds of passwords do people use". There's a lot to learn from patterns like wordwordnumber, wordnumbernumber, and so on.
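A minimal sketch of how you might start counting those patterns, assuming each password is collapsed into runs of ASCII letters (L), digits (D), and everything else (S), so "password123" becomes "L8D3". This is in the spirit of PCFG-style "base structures"; the function names are made up for illustration, and without a dictionary it can't tell wordword apart from a single long word.

```python
import re
import string
from collections import Counter

# One regex alternative per character class: ASCII letters, digits, or anything else.
_RUN = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+")

def structure(password: str) -> str:
    """Collapse a password into class runs, e.g. 'monkey1!' -> 'L6D1S1'."""
    parts = []
    for run in _RUN.findall(password):
        if run[0] in string.ascii_letters:
            kind = "L"
        elif run[0] in string.digits:
            kind = "D"
        else:
            kind = "S"  # symbols, spaces, non-ASCII letters
        parts.append(f"{kind}{len(run)}")
    return "".join(parts)

def structure_counts(passwords):
    """Tally how often each structure appears across a list of passwords."""
    return Counter(structure(p) for p in passwords)

sample = ["password123", "dragon99", "iloveyou", "monkey1!", "qwerty12"]
for pattern, n in structure_counts(sample).most_common():
    print(pattern, n)
```

On a real dump, the top of that `Counter` is basically the "what kinds of passwords do people use" signal.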
I once planned to do statistical studies of some password dumps to build a "compressed password list" that could generate password guesses on the fly. I forget why I didn't do it, but I'm sure it's because the "model" (the statistical dataset the program would generate output from) wouldn't really be that much smaller, at least not with my poor maths skills.
I'm assuming that someone who really knew what they were doing could get it down to about 15-20% of the size of the full password list. I doubt I could do better than just compressing the dataset and decompressing it on the fly.
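One common way to build that kind of on-the-fly generator is a character-level Markov model. This is my pick for illustration, not what the comment above actually built; the order-2 context, the sentinel characters, and the function names are all assumptions. The "model" is just a table of next-character counts per 2-character context.

```python
import random
from collections import Counter, defaultdict

START, END = "\x02", "\x03"  # sentinel chars, assumed never to appear in real passwords

def train(passwords, order=2):
    """Count which character follows each `order`-length context."""
    model = defaultdict(Counter)
    for pw in passwords:
        padded = START * order + pw + END
        for i in range(order, len(padded)):
            model[padded[i - order:i]][padded[i]] += 1
    return model

def generate(model, order=2, max_len=32, rng=None):
    """Sample one guess by walking the chain until END or max_len."""
    if rng is None:
        rng = random.Random()
    context, out = START * order, []
    while len(out) < max_len:
        counts = model.get(context)  # .get() avoids inserting empty contexts
        if not counts:
            break
        chars, weights = zip(*counts.items())
        ch = rng.choices(chars, weights=weights)[0]
        if ch == END:
            break
        out.append(ch)
        context = (context + ch)[-order:]
    return "".join(out)

corpus = ["password1", "password12", "passw0rd", "dragon1", "monkey12"]
model = train(corpus)
rng = random.Random(42)
print([generate(model, rng=rng) for _ in range(5)])
```

The table grows with the number of distinct contexts rather than with the number of passwords, so it can be much smaller than the dump. The catch is that the guesses get less targeted, which is roughly the size-vs-quality trade-off described above.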
The meta in that field is to extract "rules" (e.g. for hashcat) from datasets. Then you run the rules over the hashed dumps. Rules can be word^number^number, word^number^word^number, or letter^upper^number^lower, etc. Then you build a dictionary, and dict + rules = passwords.
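A toy version of "dict + rules = passwords", assuming a tiny subset of hashcat's rule functions (`:` no-op, `l`, `u`, `c`, `r`, `d`, `$X` append, `^X` prepend, `sXY` substitute). In real hashcat syntax `^X` prepends a character, so the `^` in the comment above is just a separator. This interpreter is not hashcat and rejects anything outside that subset. Unsalted SHA-1 is also just an assumption for the example; real dumps use many hash schemes.

```python
import hashlib

def apply_rule(word: str, rule: str) -> str:
    """Apply a rule written in a small subset of hashcat-style rule syntax."""
    out, i = word, 0
    while i < len(rule):
        op = rule[i]
        if op in " :":            # spaces and ':' do nothing
            i += 1
        elif op == "l":
            out, i = out.lower(), i + 1
        elif op == "u":
            out, i = out.upper(), i + 1
        elif op == "c":           # capitalize first letter, lowercase the rest
            out, i = out[:1].upper() + out[1:].lower(), i + 1
        elif op == "r":
            out, i = out[::-1], i + 1
        elif op == "d":
            out, i = out + out, i + 1
        elif op == "$":           # append the next character
            out, i = out + rule[i + 1], i + 2
        elif op == "^":           # prepend the next character
            out, i = rule[i + 1] + out, i + 2
        elif op == "s":           # replace every X with Y
            out, i = out.replace(rule[i + 1], rule[i + 2]), i + 3
        else:
            raise ValueError(f"unsupported rule function {op!r}")
    return out

def candidates(words, rules):
    """Every dictionary word run through every rule."""
    for w in words:
        for r in rules:
            yield apply_rule(w, r)

def crack(sha1_hashes, words, rules):
    """Return {hash: plaintext} for candidates whose SHA-1 is in the dump."""
    found = {}
    for cand in candidates(words, rules):
        h = hashlib.sha1(cand.encode()).hexdigest()
        if h in sha1_hashes:
            found[h] = cand
    return found

words = ["dragon", "monkey", "password"]
rules = [":", "c", "$1$2", "c $1$2$3", "sa@ so0"]
leaked = {hashlib.sha1(b"Monkey123").hexdigest()}
print(crack(leaked, words, rules))
```

The point of the split is that a short rule list multiplies a dictionary into many candidates without storing them, which is why people bother extracting rules instead of shipping the raw list.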
Pretty sure you can extract some nice signals nowadays with embeddings and whatnot.
There are ~335 million Americans. Assume for simplicity that each of them owns one phone, and hence one SIM card. Generously assume that each SIM card holds 1 KB of authentication material. A data breach of all US consumer SIM keys would then be ~335 million records and ~335 GB.
Such a breach would be far, far more catastrophic than anything we have ever seen (and probably anything we will ever see) in computer security, despite being half the size of this one and containing less than 10% as many records.
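The back-of-envelope numbers above, spelled out. This assumes "1 KB" means 1,000 bytes and uses the ~16B record count mentioned earlier in the thread; the constant names are just for illustration.

```python
US_POPULATION = 335_000_000       # one SIM per person, per the assumption above
BYTES_PER_SIM = 1_000             # "1 KB", taken as decimal kilobytes
DUMP_RECORDS = 16_000_000_000     # the ~16B figure from the thread

breach_bytes = US_POPULATION * BYTES_PER_SIM
ratio = US_POPULATION / DUMP_RECORDS

print(breach_bytes / 1e9, "GB")   # 335.0 GB
print(f"{ratio:.1%} of the records")
```

At roughly 2% of the record count, the "less than 10% as many records" claim holds with plenty of room.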
> Everything (and I mean it) from that news report went through yours truly.
> Bob is a quality researcher
> The headlines implying this was a massive breach are misleading
But the headlines implying it are literally in the Cybernews article, which is the source of it all, no? Why does the article talk about "the mass media" throughout, if the original source is what was misleading?
charcircuit•5mo ago
mananaysiempre•5mo ago
charcircuit•5mo ago
Data breaches can also contain things other than passwords, such as phone numbers and addresses, which would also be useful to check against.
anon7000•5mo ago
charcircuit•5mo ago