But hey, it makes Silicon Valley money.
I've been touting this as a business model for years. Better still, I'd like to see it done with behavioural models (in the open). That would really blow the lid off the industry. Imagine people charging companies, instead of simply being the product...
Here's some research aided by Perplexity, which estimates that the global data market is valued at about $1.7 Trillion, with data monetization growing at about 17.6% CAGR:
https://www.perplexity.ai/search/today-i-would-like-to-try-a... (138 sources)
Also, Meta can identify you based on your movement and a few pieces of social data (all of which is in the open).
Tel Aviv airport has been running behavioural monitoring for about a decade, predicting crimes before they happen.
You mention a case from 2021, which is about $5 trillion ago, and think that the government selling data is surprising. This is a mature market that already knows everything about everyone, especially in the US, and is more concerned with what to do with it. The faucet is open, the ground floor is flooded, and we're discussing the different types of fish that have moved into our apartment.
I'd happily run it as a non-profit with the purpose of highlighting the value of people's data. Tough gig though, when there are all these "off switch" guys around.
But here, the controller of the data is the airline, the transfer to the data broker might be illegal, and an airline is the worst company to commit GDPR violations with: They have a lot of global revenue but a relatively thin margin, very little of that margin comes from data abuse (so they can't just shrug off the GDPR fine as a small cost of doing shady business), and they are reachable in the EU (worst case a member state can ground and confiscate their planes, and essentially ban them from flying to the EU by threatening to confiscate any other plane that lands). And yes, Germany will impound a plane to get debts paid: https://www.reuters.com/article/world/thai-prince-to-pay-bon...
The barcode on the boarding pass contains all the information that airlines know about you [1]. It is, after all, only encoded, not encrypted, and many companies manufacture readers for it.
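To make "encoded, not encrypted" concrete, here is a minimal sketch of reading the mandatory part of such a barcode once a scanner has turned it into text. The field names and widths are an assumption based on the commonly published IATA BCBP layout for a single-leg pass, and the sample string is synthetic (made-up passenger, PNR, and the placeholder carrier "XX"), so treat it as an illustration rather than a reference parser.

```python
# Mandatory fixed-width fields of a single-leg boarding pass barcode, in order.
# ASSUMPTION: names and widths follow the commonly published IATA BCBP layout;
# verify against the spec before relying on them.
FIELDS = [
    ("format_code", 1),
    ("legs", 1),
    ("passenger_name", 20),
    ("eticket_indicator", 1),
    ("pnr", 7),
    ("origin", 3),
    ("destination", 3),
    ("carrier", 3),
    ("flight_number", 5),
    ("julian_date", 3),
    ("compartment", 1),
    ("seat", 4),
    ("checkin_sequence", 5),
    ("passenger_status", 1),
    ("variable_size_hex", 2),
]
MANDATORY_LEN = sum(width for _, width in FIELDS)


def parse_boarding_pass(raw: str) -> dict[str, str]:
    """Slice the mandatory block into named fields; no key or secret is needed."""
    if len(raw) < MANDATORY_LEN:
        raise ValueError("too short to hold the mandatory fields")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = raw[pos:pos + width].strip()
        pos += width
    return fields


# Synthetic sample (hypothetical passenger, PNR, and carrier), padded per field.
sample = (
    "M" + "1" + "DOE/JANE".ljust(20) + "E" + "ABC123".ljust(7)
    + "SFO" + "JFK" + "XX".ljust(3) + "0123".ljust(5) + "045"
    + "Y" + "012A" + "0001".ljust(5) + "1" + "00"
)
print(parse_boarding_pass(sample))
```

The whole thing is string slicing, which is why anyone holding a cheap scanner and the printed pass can read the name, booking reference, and route.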
The data could come from the airport's check-in systems, the baggage handling system, the duty-free shop, the airport lounge, and so on.
There are so many different players who have access to most or all of the data that it would be hard to prove it came from any one source at all.
And that is just the barcode on the boarding pass. Passport scanners cost a couple of hundred dollars, and airport shops and car rentals use them all the time.
Many airports use facial scanning these days and don’t even ask for a boarding pass, passport, or visa during boarding at all.
There are auxiliary sources which could be used in conjunction with other sources like Uber booking and so on.
[1] https://krebsonsecurity.com/2015/10/whats-in-a-boarding-pass...
> There are so many different players who have access to most or all of the data that it would be hard to prove it came from any one source at all.
Because a prosecutor can obtain copies of all emails talking about this, they can examine your bank accounts for payments from data brokers, they can require legal to give them copies of any contracts, they can look at audit logs from the production database and airlines aren't Evil Inc -- stuff will inevitably leak and get out. You can't cover yourself that well as a CEO looking to make a quick buck...
>>"Movement unrestricted by governments is a hallmark of a free society."
The other half of the lede is that this govt is using Insert_Method of restricting the movements of its residents.
At this point, any persecuted activity, e.g., obtaining reproductive healthcare with a link to a person in a Red State, requires opsec procedures comparable to a CIA dark op just to not get persecuted.
It's kinda like how the police need warrants to request cellphone data, but cellphone companies could sell realtime data to third parties who in turn sold it to the police.
It is not even certain that the data actually comes from the TSA. It could come from airlines, payment companies, etc.
There is no guarantee of quality when purchasing data from a broker.
They're spending public money, so the cost doesn't matter to them either. With this administration they can get unlimited funding.
Flight purchases would be critical and distinct information for law enforcement.
But you're proposing something even more outlandish: asking another agency for data. The politics of this are mind-bending. If one agency gives its data to another and that agency is successful using it, it will make the giving agency look bad, which is unacceptable. It was wild how many times another, supposedly friendly, agency would not share data. In fact, I was cautioned not to even bring up the idea in shared meetings because it would create unnecessary friction.
If you buy it from a 3rd party government contractor, none of this has to happen.
You'll get a response from their legal counsel requesting some information for them to verify your request.
In 2012 I created a killer prototype that demonstrated that you could accurately reconstruct most people's flight history at scale from social media and/or ad data. Probably the first of its kind. This has been possible for a long time.
A quick sketch of how it worked:
We filtered out all spatiotemporal edges in the entity graph with an implied speed of <300 kilometers per hour or <200 kilometers distance, IIRC. This was the proxy for "was on a plane". It also implicitly provided the origin and destination.
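A minimal sketch of that filter, assuming each entity is just a list of timestamped latitude/longitude sightings. The `Obs` type, the function names, and the sample coordinates are hypothetical; only the two thresholds come from the description above, and this is not the original prototype's code.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Obs:
    """One sighting of an entity (hypothetical record shape)."""
    when: datetime
    lat: float
    lon: float


def haversine_km(a: Obs, b: Obs) -> float:
    """Great-circle distance between two sightings, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def flight_like_edges(observations, min_speed_kmh=300.0, min_dist_km=200.0):
    """Keep consecutive sighting pairs that are both far apart and fast."""
    ordered = sorted(observations, key=lambda o: o.when)
    edges = []
    for a, b in zip(ordered, ordered[1:]):
        hours = (b.when - a.when).total_seconds() / 3600
        if hours <= 0:
            continue  # duplicate or out-of-order timestamps carry no speed
        dist = haversine_km(a, b)
        if dist >= min_dist_km and dist / hours >= min_speed_kmh:
            edges.append((a, b))  # a is the implied origin, b the destination
    return edges


# Illustrative sightings: two near San Francisco, then one near New York.
sightings = [
    Obs(datetime(2012, 5, 1, 8, 0), 37.62, -122.38),
    Obs(datetime(2012, 5, 1, 8, 30), 37.77, -122.42),
    Obs(datetime(2012, 5, 1, 15, 0), 40.64, -73.78),
]
edges = flight_like_edges(sightings)
```

Only the last pair survives: the first hop is too short, while the second covers thousands of kilometers in a few hours. Note that the implied speed is a lower bound, since the gap between sightings is usually longer than the flight itself.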
These edges can be correlated with both public flight data and maintenance IoT data from jet engines to put entities on a specific flight. People overlook the extent to which innocuous industrial IoT data can be used as a proxy for relationships in unrelated domains.
In rare cases, there was more than one plausible commercial flight. Because we had their flight history, we assumed in these cases that it was the primary airline they had used in the past, either generally or for that specific origin and destination. This almost always resolved perfectly.
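As a rough illustration of the last two steps, the sketch below matches one flight-like edge against a schedule and then breaks ties using the entity's past airlines, preferring history on the same route and falling back to overall history. The `Flight` record, the function names, and the airline codes "AA" and "BB" are hypothetical stand-ins, and the engine IoT correlation is left out entirely.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Flight:
    """One scheduled flight (hypothetical record shape)."""
    airline: str
    number: str
    origin: str
    dest: str
    departs: datetime
    arrives: datetime


def candidate_flights(flights, origin, dest, seen_at_origin, seen_at_dest):
    """Flights on the route that fit between the two sightings."""
    return [
        f for f in flights
        if f.origin == origin and f.dest == dest
        and f.departs >= seen_at_origin and f.arrives <= seen_at_dest
    ]


def pick_flight(candidates, airlines_on_route, airlines_overall):
    """Prefer the airline used most on this route, then most overall."""
    if not candidates:
        return None
    route = Counter(airlines_on_route)
    overall = Counter(airlines_overall)
    return max(candidates, key=lambda f: (route[f.airline], overall[f.airline]))


schedule = [
    Flight("AA", "AA100", "SFO", "JFK",
           datetime(2012, 5, 1, 9, 0), datetime(2012, 5, 1, 14, 30)),
    Flight("BB", "BB200", "SFO", "JFK",
           datetime(2012, 5, 1, 9, 15), datetime(2012, 5, 1, 14, 45)),
    Flight("BB", "BB300", "SFO", "JFK",
           datetime(2012, 5, 1, 16, 0), datetime(2012, 5, 1, 21, 30)),
]
window = (datetime(2012, 5, 1, 8, 30), datetime(2012, 5, 1, 15, 0))
candidates = candidate_flights(schedule, "SFO", "JFK", *window)
best = pick_flight(candidates, airlines_on_route=[],
                   airlines_overall=["BB", "BB", "AA"])
```

Here two flights fit the window, and the entity's general preference for "BB" settles it. With no usable history, `max` simply returns the first candidate, so a real system would want to flag that case as unresolved.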
This was impressively effective and it didn't require first-party data from airlines or particularly sophisticated analytics. Space and time are the primary keys of reality.
- https://www.blakefire-security.co.uk/blog/social-media-and-j...
>- https://www.blakefire-security.co.uk/blog/social-media-and-j...
FYI the source you posted never claimed that John Terry's insurance tried to deny the claim, only mentioning that "some" insurance companies warn of it. However even that claim is questionable, because it isn't even from an insurance company, it's from a content marketing piece by an insurance comparison website.
However, it turns out that thousands of people like to talk about their flights on social media, so we scraped that as a spot check and it mostly lined up perfectly. Good enough for a demo and it would have been difficult to come up with an alternative explanation for the patterns in the data.
The purpose of the PoC was to sell the data analysis infrastructure that made that type of analysis possible at scale; it wasn't about the data per se. It was a compelling demo we invented given the data that happened to be available. Startup life.
For fun edge cases, there's always Antarctica, where you can travel from a US base (which looks like you're in the US) to a NZ base (which looks like you're in NZ) in a couple of minutes: https://brr.fyi/posts/credit-card-shenanigans
> you have a record of a lot of location/timestamp data for people
What is the source of that data?
So if you have ad impression data you have IP geolocation, or maybe better, along with the timestamp. Similarly for socials, sometimes you get location metadata, and with image uploads you can get location metadata (though today these are often stripped, historically they weren't).
Especially since the claim was in 2012, is airplane wifi and roaming data reliable enough for people to view enough ads to do this?
Also, where are you getting all of a user's ad impressions across different providers to have this kind of timing information?
There's a lot of creepy data available linked to ads and lots of companies doing enrichment with different fields if you already have a real identifier for a user (e.g. phone number, email, cc, isp account number), but this sounds way more like "we created a load of hypothetical simulated data and pretended we had access to all of it".
Sounds like the bigger issue is that you're able to get "spatiotemporal" data in the first place? Otherwise it's like saying "we can figure out all stores you've been to, if we have your credit card transaction history". Sure, it's kinda creepy that you can figure out which stores I went to, but the bigger problem is that you can get the transaction data in the first place. Moreover whatever "spatiotemporal" data needed to reconstruct such flight history is probably more valuable than the flight history itself. Who cares if you know Joe flew on United 8340 when you have hour-by-hour updates on his rough location?
Yeah, this just sounds like it's written from the perspective of a data broker.
Tying particular ad analytics (presumably ip geolocation?) to thousands of particular individuals and having it well populated enough to track them is "privileged first-party data access" by another name.
Okay, fine, I'll just install another operating system then, like KDE Plasma Mobile or GrapheneOS. Your location is still leaked 24/7. This is because your cellular modem has its own operating system, running underneath your phone's operating system, which is triangulating your location at all times. Once again, you are trusting that telecommunications companies aren't misusing this, but please remember they're compelled, by law, to make a lot of this information available to numerous third parties.
Okay fine, let me just remove the SIM then and use my phone on Wi-Fi only, always through a VPN. Your location is still potentially being leaked, for example, by your car. Your car also has a cellular modem which leaks your location, and you probably signed a contract allowing that data to be given to hundreds of third parties.
Of course, all of this is assuming you don't use any social media. Social media can also leak your location, even without location services. If you review a restaurant - that's your location. Where are your friends? You're probably around them. And on and on.
It's also disappointing that the root comment is distracting from the 4th amendment violations by making the conversation about their vague claims of selling mini-palantir demos through abusing web ads.
Any data exhaust will work, people have created interesting PoCs leveraging things like HVAC data, RF attenuation, etc. High-precision weather models essentially work the same way, making inferences by stitching together diverse event data that has nothing to do with weather.
High-quality high-resolution data sources largely don't exist in the way people imagine they do, so you need to do this anyway. If you have a high-resolution spatiotemporal graph for entities, tying it to identity is always trivial.
It would be more common if it weren't for the fact that open source platforms scale poorly for this type of analytical processing.
source?
>You are trusting that this data is not leaked to any third-parties. You cannot verify this, as the data is exfiltrated to servers which you can't verify.
At least on Android you can theoretically disable "google location accuracy", which stops it from sending nearby hotspot MAC addresses to Google. That's the only public route where google gets your location without you knowingly sending it. You also imply that mobile operating systems are surreptitiously sending locations back to google/apple even if users have all location related features disabled, but I'm not aware of any evidence this is the case, and this falls into the same category as "facebook is secretly listening to you" until proven otherwise.
When I'm connected to a VPN that's far away, applications with my precise location, such as Google Maps, use my real-life location.
> You also imply that mobile operating systems are surreptitiously sending locations back to google/apple even if users have all location related features disabled
I did not, I said the operating system has access to your location, which it does. You're relying purely on trust in both Apple and Google to not send that location out. The toggle is just that, a software toggle in an app. The operating system is billions of lines of closed-source code.
> and this falls into the same category as "facebook is secretly listening to you" until proven otherwise
Meaning, almost certainly true? If I gave Facebook microphone access, I would 100% expect it to be using my microphone for analytics. If you read their privacy policy, this is definitely allowed.
Personally, in my opinion, you'd have to be very naive to think tech isn't trying to maximize its gains here. If you can get the data, and there's no consequences, and it's allowed... why wouldn't you?
That's because VPNs don't change your phone's geolocation, which is determined by GPS/wifi signals. It's not a "VPN bypass" in any meaningful sense, any more than amazon knowing where you live because you filled out your address is a "VPN bypass".
>Meaning, almost certainly true? If I gave Facebook microphone access, I would 100% expect it to be using my microphone for analytics. If you read their privacy policy, this is definitely allowed.
Both iOS and Android have microphone indicators, so the idea that it's surreptitiously listening to you behind your back is doubtful, or requires some sort of conspiracy between it and Google/Apple. That's not impossible, but should be considered a crank theory until proven otherwise.
>[...] and there's no consequences, and it's allowed... why wouldn't you?
Strongly disagree. Google has been sued and lost for much lesser privacy infringements, like tracking users while in incognito, which if you read the suit is pretty absurd. Evidence that any big tech company was intentionally eavesdropping on users (ie. excluding something like voice assistants being accidentally triggered) would be a bombshell.
But those suits don't matter:
- the penalty they have to pay is laughably small
- consumers by and large don't know or don't care enough to switch away from Chrome.
Ok, fine. I'll just drive classic cars for the rest of my life. Your location is still being leaked by a global network of automated license plate reading cameras https://deflock.me/
This is one of those common scaremongering things. It's an unavoidable part of the network, not extra malice added on top - it's basically equivalent to pointing out that in a wired network, it's possible to follow the wires to see where they go. And then adding that your ISP is spying on you by having a database of where all the wires go. Technically correct, but calling it something nefarious makes it more scary than useful information. It's just how networks work and you should be aware of it when using networks, especially wireless ones.
I also have a lot of experience with privileged first-party data but that is governed by a different set of rules and is often regulated. You have to be much more circumspect about how you use it.
Even though it might be convenient to e.g. slurp telemetry off a mobile carrier's backbone, what you eventually realize is the inability to do this isn't a real limitation and in some ways is a blessing in disguise.
The preposterous thing is that payment processors aren't just allowed to collect this information and tie it to your name, they're required to do that.
People talk a big game about fighting fascism, but how can you allow these laws to exist if you can contemplate what happens when actual fascists get hold of that data going back decades? They need to be dismantled now.
If you want to do it, get a warrant.
To use the US as an example (I doubt other countries are much better) it's estimated that every adult in the US commits multiple Federal felonies per day[1], Federal law is replete with ridiculous laws[2] and the number of federal laws is uncountable by Congressional Research Service staff. Does it matter at that point?
[1] Three Felonies A Day - ISBN 978-1594035227
That's not a serious estimate: https://news.ycombinator.com/item?id=43744267
Someone who smokes weed daily in a place where it's illegal could easily commit multiple crimes a day just for drug possession and consumption, for example.
There is no state where cannabis derivatives are federally legal.
[1]: https://decider.com/2022/01/04/is-it-federal-crime-to-share-...
> In 2016, the US 9th Circuit Court of Appeals ruled that sharing online passwords is a crime prosecutable under the Computer Fraud and Abuse Act.
https://en.wikipedia.org/wiki/United_States_v._Nosal
>A few months after leaving Korn/Ferry, Nosal solicited three Korn/Ferry employees to help him start a competing executive search business. Before leaving the company, the employees downloaded a large volume of "highly confidential and proprietary" data from Korn/Ferry's computers, including source lists, names, and contact information for executives.
Extending that ruling to netflix password sharing is a stretch.
Moreover, you can't say "I can think of one activity many Americans do that is a felony", and then apply induction to claim that the other activities Americans do surely contain felonies.
>That's probably another 1/6 at least. Now it's 1/3 of the country.
That's only true if you assume the populations of weed smokers and Netflix watchers don't intersect, which is... doubtful.
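To spell out the arithmetic: the combined share is P(A) + P(B) - P(A and B), so adding the two sixths only gives a third when nobody is in both groups. A quick sketch using the 1/6 figures from the quoted comment; the independent and fully overlapping cases are illustrative assumptions, not estimates of the real overlap.

```python
from fractions import Fraction


def share_in_either(p_a, p_b, p_both):
    """Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both


sixth = Fraction(1, 6)
disjoint = share_in_either(sixth, sixth, Fraction(0))       # no overlap at all
independent = share_in_either(sixth, sixth, sixth * sixth)  # overlap by chance
nested = share_in_either(sixth, sixth, sixth)               # same people
print(disjoint, independent, nested)
```

Any overlap pulls the total below 1/3, and if the two groups are largely the same people it stays close to 1/6.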
https://en.wikipedia.org/wiki/Aaron_Swartz
At any rate, possession of marijuana or other controlled substances does not mean that one uses them, so lots of people are theoretically in possession because they gave a ride to someone who had drugs with them.
To begin with, let's not ignore how broad a category "small business" is. Laws requiring health inspections or licenses etc. often operate on the basis of frequency or number of patrons. If you have around a dozen people over for movie night every Saturday with the event published on social media and you all chip in for pizza, are you a food service business? For that matter, is that a public performance in violation of copyright?
If some criminals break into one of your devices or your personal website while you're traveling and you find out about it while you're out of state but don't have time to deal with it until you get back home, have you committed a crime? What if they put some illegal materials there and you clean off the device but still have a backup containing the illegal materials? What if you do delete all of them right away; is that destruction of evidence? What if there's a federal law against keeping the materials and a state law against destruction of evidence and a very specific way to comply with both of them at the same time that may not have been clearly decided by the appellate court when it was happening but has been decided by the time they bring the case against you? What if it was clear ahead of time but wasn't intuitive and you can't afford a lawyer and can't have one appointed until after you've been charged?
It's unreasonable to expect ordinary people to be able to navigate this.
That's what courts are for. I don't think there's any case where people tried to prosecute a shared movie night as a business, because it'd be laughed out of court. Same goes for whether it's copyright infringement or not. Moreover, if you look at how authoritarian regimes work in practice, dissidents are often prosecuted under national security laws, campaign finance violations, or libel laws, not because they violated the health code by having a movie night.
But that exotic case is not really needed. Laws will be abused by the powers whenever they want; you don’t need to look farther than the current USA administration and how the president is using war powers to treat poor laborers as enemy combatants and send them to concentration camps. And yet, USA’s system of government was designed in a way that should have prevented the executive from abusing power; why it has failed is another (difficult) discussion, but the founding fathers seemed well acquainted with the despotism of other nations.
[^1]: https://www.rtve.es/noticias/20090828/cuba-detiene-a-disiden...
That isn't really how courts work. If you're violating the letter of the law then you are breaking the law and an actual impartial judge would enforce it against you. In practice whether they let you get away with it is based in significant part on whether or not they like you. If the judge doesn't like the administration then maybe they do like you. But if the judge doesn't like you for the same reason the administration doesn't like you then you're going to jail. And it shouldn't have to depend on that; we shouldn't have laws that people are constantly in technical violation of so that the only thing keeping anyone out of jail is prosecutorial discretion and judicial affinity.
Meanwhile you can characterize anything in a negative light. A random home kitchen typically isn't going to meet the standards for commercial operation and the prosecutor's press release isn't going to say "we're prosecuting our enemies for movie night", it's going to say "defendants were operating a for-profit restaurant in violation of zoning rules and storing uncooked meat above fish in the freezer used for storing food sold for resale in violation of the health code" and then stick them with a fine that would make them lose their house.
> Moreover, if you look at how authoritarian regimes work in practice, dissidents are often prosecuted under national security laws, campaign finance violations, or libel laws, not because they violated the health code by having a movie night.
When the dictator of petrolistan wants to retaliate against their enemies and those laws are available for that, sure.
When the mayor of some US town wants to do the same thing, they might very well resort to health code violations that wouldn't have otherwise been enforced.
Deterrents well short of political executions are still very much official misconduct.
Exceeding the driving speed limit is more of an "infraction" and not a crime until it becomes reckless.
Yes, wherever it is criminal to improve the wellbeing or support progress of society, I support the ability of people to be criminals.
Rosa Parks wasn't allowed to sit at the front of the bus. Criminal.
I doubt MLK had a permit for every march. Criminal.
I doubt the founding fathers were legally allowed to oppose the British taxes. Criminals.
A society with no crime is a dystopia.
The law does not, by default, prosecute all crimes. There is no country in the world that has even close to the law enforcement capacity to investigate and prosecute all crimes. What tends to happen instead is that crimes that, to put it colloquially, "piss off the wrong people" get prosecuted, i.e., crimes that draw the attention of either the general public or specific people in power.
A reasonable approximation is that a single-digit percentage or less of crimes get investigated and prosecuted, with the rate obviously being higher for violent and visible crimes like murder and lower for less violent and visible crimes like stealing the office paperclips.
Another way of looking at this is, in the current system, if your house gets burgled, you need to report it to the police if you expect anything to happen, whereas one could imagine another system where the police already know your house has been burgled and you don't need to report it.
So they roll up on the guy and put him in the squad car and see if he’ll admit it (or has the stuff on him), and when he doesn’t, release him hours later or the next day.
I find this view to be lacking in nuance.
Laws are intended to exist with the consent of the governed. Substantially the whole of society agrees that murder should be illegal, so if someone commits murder we're willing to commit significant resources to investigating and prosecuting the perpetrator. It doesn't have to be efficient or have perfect enforcement because its purpose is to act as a deterrent. Everyone is willing to spend the resources to enforce those laws because everyone agrees that their enforcement is important. Enforcement efficiency is not required when there is popular consent.
Opposing laws that "help criminals" exposes society to shifts in the definition of a crime. When there is a law against being of a particular ethnicity or religion or political ideology, you want to enable people to be criminals. Preventing laws like that from ever being effective is worth sustaining a significant amount of inefficiency in the enforcement of other laws.
And this is not a binary distinction with "laws against murder" on one side and "laws against being Jewish" on the other. The latter is only the viscerally powerful extreme that once made us say never again.
The spectrum spans the full scale, where the middle is filled with police corruption and political retaliation against the opposition and petty busybodies inducing poverty and homelessness through the incompetent micromanagement of society.
Should governments have the ability to freeze the bank accounts of protesters? It doesn't matter what they're protesting or what crimes some minority of the protesters are alleged to have committed when the account freezes are instituted as collective punishment, the answer is no. The government should not have the ability to do that, because in that case they are the criminals, and structural defenses against government abuses are important.
This is not necessarily a good thing and laws can change without requiring them to be broken.
That's kind of the problem, right? Suppose you have a system that actually allows perfect enforcement and then the government passes a law against some religious practice. Espousing atheism is banned, or Islam, or Christianity, depending on who controls the government this time; take your pick. If anybody who does it is instantly brought up on charges with severe penalties then nobody does it. But that's bad. That's the problem. You need to sustain enough friction to prevent things like that from being possible because enforcing laws like that is worse than anything that could come out of making ordinary law enforcement require more resources.
I don't think it's bad. Similar to closed and open source software there is room for closed and open societies. They are different approaches that have different pros and cons.
Sure, but of course it isn't black and white.
>In which case we shouldn't have any such thing in the open countries.
I still think being able to effectively apply the will of We the People would be a good thing. Being afraid that the people might want something to happen that you don't like is disrespectful to the will of the people.
Right, I mean, wrong.
Before you go trying to do the HN thing of explaining why your open/closed source analogy was super brilliant, and my response was reductive and clueless, please consider that societies are composed of living people and software is composed of 1s and 0s.
If they had access to better data they wouldn't have to do it arbitrarily and could more accurately target problematic people. I believe there isn't room for doing it arbitrarily as from my perspective it is a suboptimal strategy towards accomplishing one's goals.
>please consider that societies are composed of living people and software is composed of 1s and 0s.
I am not interested in making an emotional argument. I would be willing to hold my belief regardless of what a system is made up of.
Or from a different direction: if your method of analysis leads you to believe an HN contributor wants to “enable people to be criminals” based on their policy preference about KYC, you’re not taking the discussion seriously.
Which is why people who want to be able to always do activity X regardless of the law will make it infeasible to enforce a ban on X and will try to oppose anything that would make it easier.
The only reason we don't have this already is that the law makes it so hard to start a competing payments network -- in no small part as a result of KYC requirements -- that the incumbents are insulated from real competition and then don't have to fix the flaws in their systems.
Meanwhile you don't actually need everyone to do it, all you need is someone to do it and then that both becomes a competitive advantage in the market and allows any victim of official misconduct to use that one.
There is no in-band method to prevent[0] subversion: no one can design an instruction set, security scheme, or set of laws that is immune from subversion.
0. Except the evil-bit header in IPv4. Routing equipment always drops evil packets before they reach the victim's network, stopping attacks before they happen.
I'm sure that other platforms attach the same kind of info to posts. It's just a matter of scraping it.
Almost all data is spatiotemporal data, people just aren't used to thinking about it like that. Everything that "happens" is an event with associated times and places.
Tagging of events with spatiotemporal attributes, or with metadata that can be used to infer spatiotemporal attributes, is pervasive. Every system data passes through, even if not the creator of it, observes the event of the data passing through it. Event observation is not trying to track things but it implicitly and necessarily creates the data that makes tracking and spatiotemporal inference possible.
These kinds of analyses rely almost entirely on knowing the events occurred; you could encrypt the contents of the data and it wouldn't matter. Software leaks spatiotemporal event context everywhere across myriad systems, internal and external, that incidentally collect it. There isn't anything nefarious about most of it and much of it is required for reasons of criminal and civil liability.
What people underestimate is that you can analytically stitch together many unrelated sparse data sources with spatiotemporal attributes, many of which are quite crap or seemingly unfit for purpose, to reconstruct a dense high-quality graph. Counter-intuitively, diverse and seemingly irrelevant data sources often produce better data models. It surfaces bias, errors, manipulation, and processing artifacts in individual sources you might otherwise miss.
It is much more difficult to access the obvious first-party data sources than it used to be, mostly because people with that data are far more selective about who they give access to. It doesn't really matter; that is a speed bump for the unsophisticated. The exponential growth in the scale and diversity of network-connected telemetry of all types pretty much guarantees these data models will always be constructible.
The historical limiter has always been the absence of data infrastructure platforms that can handle these kinds of analytics at scale.
>These kinds of analyses rely almost entirely on knowing the events occurred; you could encrypt the contents of the data and it wouldn't matter. Software leaks spatiotemporal event context everywhere across myriad systems, internal and external, that incidentally collect it. There isn't anything nefarious about most of it and much of it is required for reasons of criminal and civil liability.
>What people underestimate is that you can analytically stitch together many unrelated sparse data sources with spatiotemporal attributes, many of which are quite crap or seemingly unfit for purpose, to reconstruct a dense high-quality graph. Counter-intuitively, diverse and seemingly irrelevant data sources often produce better data models. It surfaces bias, errors, manipulation, and processing artifacts in individual sources you might otherwise miss.
That's a lot of technobabble for what essentially sounds like "there's some ad SDK that's phoning home with your gps/ip geolocation every few minutes, if you cross reference that with when flights are, you can guess what flight someone took". How far off am I? Or is there some galaxy brained AI that can infer that from disparate facts like that you stopped posting on twitter for 12 hours, your car's license plate was caught by an ALPR to be heading towards the airport, and 3 weeks ago you visited some portuguese tourism site that had an ad beacon installed?
Education starts in the home: if it's not locally runnable and usable offline, it does not exist. We need to teach people how to be sneaky kids trying to sneak all things past authoritarian parents. That mindset is what will drive would-be lemmings to do things like making Qubes OS primary, getting a Google tablet and installing GrapheneOS on it, or building a 48-hour battery life "comms bag": an LTE (or 5G) modem + a good OpenWRT-capable router + battery packs and charging equipment.
Idea is: baseband is divorced from application processor and packed away into a separate radio station which can be brought online completely under the owners' control. That will be my next "cell phone."
Everyone leaves telemetry breadcrumbs everywhere. Almost everyone radiates a ton of different kinds of data, RF, optical, etc fingerprints that are incidentally picked up everywhere they go. In fact, there have been interesting papers on tracking entities that don’t leave any breadcrumbs because they visibly perturb the entities that do! Existing in the world has side effects that are picked up by all kinds of systems and it is analytically possible, with sufficiently large quantities of data, to start attributing those side effects to specific entities. That’s the game.
The caveat is that it requires large quantities of data. Many, many petabytes is table stakes.
What you are not grokking is that this can be done with data that is not designed to track anyone but which can reconstruct identity with sophisticated analysis across many innocuous data sources. This has been one of the unexpected headaches of GDPR; you can often identify people with boring industrial sensors that were designed for industrial purposes with sufficiently sophisticated analytics. The fact that this is possible has made this data (legally) PII.
A person that literally doesn’t own a mobile phone is still trackable. The whole “burner phone” thing was always a movie trope; it hasn’t worked for decades, if ever.
The ability to do graph reconstruction across dirty, noisy spatiotemporal events that may have nothing to do with the target in question is robust. It isn’t trivial, but at this point it is a mature field of endeavour.
What does this mean in concrete terms? "optical, etc fingerprints" just sounds like a fancy way of saying "facial recognition". I'm also not too sure what "RF" translates to. Is it just wifi/bluetooth/cellular sniffers?
People on this site probably understand this better than 99% of the world.
The problem is "What can I, as an individual, do about it?"
Some airlines don't even allow you to check in without using their app, unless you are willing to pay a fee.
> We filtered out all spatiotemporal edges
How were you getting spatiotemporal data?
Was this only on US data, or EU or worldwide? And when you say "ad data", do you mean like ADINT?
And when you infer "was on a plane" from spatiotemporal edges in the entity graph with an implied speed of >300 km/h and >200 km distance, does that only work if the individual social-media app itself is collecting real-time geolocation? So if you only access social media from the browser not the app, does that partially prevent this privacy exploit?
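For anyone wondering what that ">300 km/h and >200 km" rule might look like in practice, here is a minimal sketch. The `Sighting` schema, the sample coordinates, and the function names are all made up for illustration; this is not the parent commenter's actual pipeline. The point is that the rule only needs two timestamped locations for the same entity, whatever system they happened to come from.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Sighting:
    """One observation of an entity (hypothetical schema)."""
    entity_id: str
    t_hours: float  # timestamp, in hours since an arbitrary epoch
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def likely_flight_edges(sightings, min_speed_kmh=300.0, min_distance_km=200.0):
    """Return consecutive same-entity sighting pairs that exceed both thresholds."""
    ordered = sorted(sightings, key=lambda s: (s.entity_id, s.t_hours))
    edges = []
    for a, b in zip(ordered, ordered[1:]):
        if a.entity_id != b.entity_id:
            continue  # pair straddles two different entities
        dt = b.t_hours - a.t_hours
        if dt <= 0:
            continue  # simultaneous sightings give no usable speed
        dist = haversine_km(a.lat, a.lon, b.lat, b.lon)
        speed = dist / dt
        if dist > min_distance_km and speed > min_speed_kmh:
            edges.append((a, b, dist, speed))
    return edges


# Made-up sample data: one cross-country hop and one short local trip.
sightings = [
    Sighting("dev-1", 0.0, 40.6413, -73.7781),   # near JFK
    Sighting("dev-1", 6.5, 33.9416, -118.4085),  # near LAX, 6.5 hours later
    Sighting("dev-2", 0.0, 40.7128, -74.0060),   # Manhattan
    Sighting("dev-2", 1.0, 40.7306, -73.9352),   # Queens, 1 hour later
]
flights = likely_flight_edges(sightings)
for a, b, dist, speed in flights:
    print(f"{a.entity_id}: {dist:.0f} km at {speed:.0f} km/h")
```

Only the `dev-1` pair is flagged. Nothing here needs continuous geolocation, just two sightings far enough apart in space and close enough in time.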
Frankly, Clear and TSA-Pre makes my life so much easier and since I don’t commit crimes I’m not very worried… just a little worried.
I hate the excuse "since I don't commit crimes". It's not about that. If they want your info that you're not directly giving them, they can get a warrant.
What if it affects your ability to get work? Have you ever made or viewed any posts that could be considered political or made comments on a political post? What agenda do you support with those actions?
Terms of service are meaningless if they keep the extent as secret as possible. Facebook has demonstrably shown this and as shocking as it is they are restrained compared to lots of companies.
Especially when you can outsource the full evil to a wholly owned subsidiary for plausible deniability.
And if private corps know something, many foreign governments know all of it.
Since then everyone carries a device that tracks GPS and usage, and has a microphone and camera that can be remotely turned on.
Every site and app has a ToS that enables data collection. And there are thousands of companies collecting and aggregating and reselling.
State actors can bypass air gaps and hide in HDD firmware, and those are 10-year-old capabilities.
And now they have usable AI to read the fire hose.
To do it properly, not only would you have to change all your logins and email accounts, but simultaneously start using a new computer and phone. Also, move home.
In other words: very hard to achieve. But I wonder if there is a set of achievable actions one can take that gets you to 'very good privacy'?
ARC and IATA absolutely do play such a role, as the financial clearinghouses for ensuring that travel agents (online and offline) and airlines can pay each other, and as gatekeepers/certification bodies for agencies to ensure these financial systems aren't abused.
Now, they absolutely do sell access to data to third parties, governmental and nongovernmental. But the reason they have this data isn't because they buy it to resell it; they are fully part of the funds flow for the underlying transaction. Whether they should be allowed to sell or share non-anonymized data on passenger records and prices paid is a very good question, but at the very least this is about as first-party as data gets.
https://www.altexsoft.com/blog/airline-reporting-corporation... describes some of these flows. (Here be dragons.)
Two things can be true simultaneously: (a) it is worrisome that a company is selling PII at scale to government entities who would otherwise need to request that data through accountable warrant processes, and (b) we shouldn't call every such company a "data broker" lest we dilute the specificity of that term, particularly when the companies in question participate in the funds flow of the customer transaction.
okay...
>zero cost surveillance for the big brother
How is it "free" if they are the ones funding the data brokers?
leblancfg•1d ago
jeffbee•1d ago
sofixa•1d ago
To add to this, any mention of "telemetry" is taken to mean your PII being taken by bad actors to abuse, instead of what it is in 99% of cases, which is usage statistics. (X% of our users use feature A, it merits investment). It can be both, but there's usually no place for differentiation, just pitchforks.
ctoth•1d ago
Fool me once, shame on you. Fool me 153,927,861 times, shame on me.
The place for differentiation, the place for "oh this is probably fine", the benefit of the doubt is, of course, lost.
Because someone (you? people shaped like you?) who misused telemetry destroyed trust.
> It can be both
should instead be "it usually is both and you the user have no way to know anyway."
mvieira38•1d ago
jeffbee•1d ago
sofixa•1d ago
Answering to court orders isn't "ratting". You either answer court orders or go to prison.
aspenmayer•1d ago
https://discuss.privacyguides.net/t/privacy-pass-the-new-pro...
Apparently not Privacy Pass related, will keep looking as I seem to remember that Mullvad was doing that implementation, but I may remember incorrectly.
https://discuss.privacyguides.net/t/mullvad-has-partnered-wi...
some_random•20h ago
everdrive•1d ago
hinterlands•1d ago
If the headline is "Mark Zuckerberg is amassing your data and you know it's for evil", it's an easy sell. If it's "there's an ecosystem of little-known companies that sell transaction, location and lifestyle data to marketers, journalists, PIs, and police departments alike", it's not exactly the kind of a message that spurs people to action. And yeah, the newspaper that would be breaking the news is a customer too.
ujkhsjkdhf234•1d ago
supriyo-biswas•1d ago
jeffbee•1d ago
andrew_lettuce•1d ago
taeric•1d ago
kevin_thibedeau•1d ago
taeric•1d ago
flossposse•21h ago
taeric•9h ago
Some of this is that people are rather wrong about just how much smarter some of the big consumer tech companies are.
ck_one•1d ago
Let's say we want this dataset: Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town
How much do I have to pay you to get it?
dylan604•1d ago
Never ask a sales person how much you have to pay when the prices are not already clearly stated. Tell them how much you are willing to spend to see if they will do it for that amount. Sales people will always shoot high hoping to not leave money on the table. The price might change depending on how much you squeal and how high they shot. Your initial "willing to spend" should also be lower than you're actually willing to spend, for the same but converse reason.
metamet•1d ago
dylan604•1d ago
Seems like the first thing to do would be to get an account with one of these data brokers. I'd imagine most of these places are "contact us for pricing" so they can play used car salesman games
Or, you could ask John Oliver to do it for you and then tell all of us on one of his episodes exactly how in depth it could get. They have the money to do this, and it seems like something right in his team's wheelhouse.
some_random•21h ago
dylan604•21h ago
Just because someone doesn't answer your belligerent questions does not mean it's not possible. It probably means that the people that are doing this with first hand knowledge have too much to do than trying to convert doubting Thomas over here.
some_random•21h ago
dylan604•20h ago
some_random•20h ago
lazyasciiart•1d ago
leoqa•1d ago
dylan604•21h ago
https://datarade.ai/data-categories/food-grocery-transaction...
Have we really lost the ability to use search functionality??
andrew_lettuce•1d ago
JohnMakin•1d ago
Yea, you know everything, don't you.
some_random•21h ago
dylan604•21h ago
Let me google this for you...
https://duckduckgo.com/?q=how+to+buy+data+from+a+data+broker...
some_random•21h ago
JohnMakin•19h ago
JohnMakin•1d ago
I'll waste my own time and give a trivial example just off the top of my head. Go peruse some of the products offered on this page, put on your thinking cap or even look into them further and imagine what kind of data those services provide, where it likely comes from, and where it is sold to, and you'll be well on your way - and those are just the ones that are advertised openly.
https://www.transunion.com/business
Pretty much every one of the big players people typically associate with other areas such as personal credit has some feet in this space somewhere. Then there's the hundreds of lesser-known fly-by-night guys that have their own DBs they build off of mostly the same data, but correlated in different ways and sold to different audiences.
There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences. The fact you personally have never come across it, or are saying you aren't, is only a data point that is interesting to you, and no one else that actually knows what they are talking about in this space. Hope this post helps you somehow.
southernplaces7•22h ago
Okay but then why not name at least a couple such services. Also, if the tech industry isn't selling data to them, where do they obtain it? Again, I see lots of ambiguity here, and the example link from transunion is hardly revealing of anything.
dylan604•21h ago
Mobile service providers are known to have sold data. https://www.fcc.gov/document/fcc-fines-largest-wireless-carr...
Auto makers are known to sell data. https://www.caranddriver.com/news/a61711288/automakers-sold-...
You act like it doesn't happen, yet time and time again we learn about companies selling whatever data they can collect.
I can't believe we are still questioning this fact
What else do you need to know?
southernplaces7•21h ago
Anyhow, thanks for taking the time to include some links.
dylan604•20h ago
southernplaces7•20h ago
>So if you want to think that people are collecting the data and not selling it to interested parties, the, boy, I don’t know.
As I very clearly said above, I don't doubt it at all, I was just asking for any clarification on who to whom.
JohnMakin•19h ago
southernplaces7•16h ago
some_random•21h ago
lazyasciiart•17h ago
chasd00•1d ago
Melatonic•1d ago
southernplaces7•1d ago
pdoege•22h ago
It fills its data lakes with the vectorization and down tilt data that it collects every day. It uses federated batched Hadoop tasks to join the above data lakes into one large data lake. Mid-PB in size.
Then it looks for mobile phones that travel to the 400 block at night and stay there, that are buying dentist stuff from Walmart, travel to a dentist office every workday, have an income over $120k, and are a member of the local dentist society. Maybe look for someone with dentist student loans, graduated with a dental degree.
None of those data points can identify an individual. Taken together they can ID just about anybody.
But maybe there is a chance that you ID their wife/husband. So maybe include/exclude people that regularly visit OBGYN offices.
Back in the day we could link cell numbers to credit card purchases in locations to the point of being able to identify the name of the person, what they purchased, and where it was purchased. For all people in a metro area that were using credit cards and physically visiting stores.
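The "none of those data points can identify an individual, taken together they can" part is easy to see with a toy example. This is only a sketch: the device IDs and attribute names below are invented, and a real pipeline would be joining huge tables rather than filtering a dict. Each filter on its own leaves several candidates; stacking them leaves one.

```python
# Hypothetical per-device attributes, each derived from a different data source.
records = {
    "dev-1": {"night_on_400_block", "weekday_at_dental_office",
              "buys_dental_supplies", "dental_society_member"},
    "dev-2": {"night_on_400_block", "weekday_at_dental_office"},
    "dev-3": {"night_on_400_block"},
    "dev-4": {"weekday_at_dental_office", "dental_society_member"},
    "dev-5": {"night_on_400_block", "buys_dental_supplies"},
    "dev-6": {"buys_dental_supplies"},
}


def candidates_after_each_filter(records, filters):
    """Apply attribute filters one at a time and record how many candidates remain."""
    remaining = set(records)
    history = []
    for attr in filters:
        remaining = {dev for dev in remaining if attr in records[dev]}
        history.append((attr, len(remaining)))
    return remaining, history


remaining, history = candidates_after_each_filter(
    records,
    ["night_on_400_block", "weekday_at_dental_office", "dental_society_member"],
)
for attr, count in history:
    print(f"after {attr}: {count} candidate(s)")
print("remaining:", sorted(remaining))
```

In this made-up data the candidate set shrinks from 4 to 2 to 1. The spouse problem mentioned above is the case where two devices still survive every filter, which is why you would add another attribute to split them.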
criddell•1d ago
I think most people here understand that Google sells ads against that data, but they aren't selling the data.
worik•1d ago
I do not believe that. I would like evidence before I am convinced
If my bank is releasing that data I am horrified. I live in New Zealand and our privacy laws are clear: it would be illegal
mixmastamyk•20h ago
worik•17h ago
We have strict privacy laws
That would put us ahead?
mixmastamyk•6h ago
worik•2h ago
If I collect your private information for a purpose, I may only use it for that purpose. I may not sell it
So if I have your transaction details I may use them to complete the transaction, no other purpose
flossposse•21h ago
onlyrealcuzzo•1d ago
It should not be surprising that they are selling your data for a profit...
JohnMakin•1d ago
OsrsNeedsf2P•1d ago
rapind•1d ago
seplox•1d ago
https://therecord.media/ftc-complaint-against-kochava-unseal...
Among the additional information Kochava collects and sells are non-anonymized individual home addresses, phone numbers, email addresses, gender, age, ethnicity, yearly income, “economic stability,” marital status, education level, political affiliation and “interests and behaviors,” compiling and selling dossiers on individuals marketed as offering a “360-degree perspective,” the FTC said.
...
According to the FTC, Kochava’s data can identify women who visit reproductive clinics by name and address along with, for example, when they visit particular buildings, their names, email and home addresses, number of children, race and app usage.
...
Kochava marketing materials tell customers it offers “rich geo data spanning billions of devices globally” and that its location data feed “delivers raw latitude/longitude data with volumes around 94B+ geo-transactions per month, 125 million monthly active users, and 35 million daily active users, on average observing more than 90 daily transactions per device.”
...
The complaint also alleges that the company has lax procedures for determining who it is selling data to, saying purchasers are allowed to use a generic personal email address, label an alleged company as “self” and explain they plan to use the data for “business.”
And then there's this: https://therecord.media/data-brokers-are-selling-military-se...
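As a sanity check, the Kochava marketing figures quoted above are roughly self-consistent. This is just arithmetic on the quoted numbers; the 30-day month is my assumption.

```python
# Figures as quoted from the Kochava marketing materials.
daily_active_users = 35_000_000
transactions_per_device_per_day = 90
days_per_month = 30  # assumption, not stated in the quote

implied_monthly_transactions = (
    daily_active_users * transactions_per_device_per_day * days_per_month
)
print(f"{implied_monthly_transactions / 1e9:.1f}B geo-transactions per month")
```

That works out to 94.5 billion, which lines up with the advertised "94B+". At 90 points per device per day, that is a location fix roughly every 16 minutes around the clock.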
chasd00•1d ago
JohnMakin•1d ago
genghisjahn•1d ago
chgs•1d ago
hnlmorg•1d ago
It’s amazing how much worse things have gotten, yet how people seem to care less now than they used to.
I wonder if it’s just consumers being so overwhelmed by their lack of control that they’ve become apathetic to the problem as a whole.
tpxl•11h ago
The cc/bank provider already gets an itemized bill, and they get it for everywhere you shop as opposed to a single store (so a superset of this data is already collected). This is in some (most?) cases already shared with stores, and even if it isn't, what can a store do with it that the bank/cc provider can't do worse?
hnlmorg•5h ago
That's where the paranoia was, around the fact that your individual shopping habits were being stored.
Also we are talking about the 90s here. So cash payments were more common.
dingnuts•1d ago
codyb•1d ago
Do banks sell this information? This bill was pulled from this ATM in Georgia by one Claudius McMoneyhands, and then deposited by one CashMoneyBusiness LLC in South Carolina three weeks later
Seems like there could still be intermediaries and a lack of what you actually bought with it at least?
A4ET8a8uTh0_v2•1d ago
rustcleaner•19h ago
genghisjahn•1d ago
zahlman•21h ago
denominations, perhaps?
genghisjahn•13h ago
asdff•1d ago
const_cast•1d ago
sixothree•1d ago
If the FTC could do anything here to make this situation better, it would be to give every person access to any data about them that gets sold.
jancsika•1d ago
In Manufacturing Consent they measured column inches in the NYT-- IIRC it was something like measuring the total that support the relevant U.S. administration's official position on given policy vs. inches that went against the gov't position. In any case, they were measuring column inches.
What were you measuring to come to your conclusion?
JohnMakin•1d ago
svieira•1d ago
https://www.nytimes.com/2023/09/22/magazine/hank-asher-data....
imiric•1d ago
I'm aware that using adblockers and avoiding social media doesn't entirely prevent tracking, shadow profiles, and such, but surely it makes things more difficult for these companies, no? Or would you say that there's practically no difference between making an effort to preserve one's privacy and just giving up entirely?
testing22321•5h ago
Does any of what you’re talking about impact my life? How?
astura•1d ago
I know someone who bought the address of everyone with a specific first name.
timeon•1d ago
Where I live it is.
astura•1d ago
roadside_picnic•1d ago
If you were caught demoing something both horrific and internal you would risk serious damage to your career, and ultimately will have zero impact on the industry as there's just too much data out there and too much money wrapped up in it.
Plus, most people working with the data don't bother to look at it. The places I've internally demo'd massive privacy risks were shocked because they didn't realize what their own data was capable of. Most people are just writing jobs that run and shuffle data around from one place to another, never really asking "what is this data?" Even among data scientists I'm routinely surprised (so maybe I shouldn't be surprised) how frequently data scientists never do any real error analysis by looking at what the model got wrong and trying to understand why.
Melatonic•1d ago
southernplaces7•1d ago
southernplaces7•1d ago
victorbjorklund•1d ago
roadside_picnic•1d ago
In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I used to work with real estate data for the government and if you search for any common things you might want to know you often land on a data brokers page even though property assessor data is freely available in most counties. The problem is each county has their own system of storing data and their own process for searching it. It's a lot of work to learn how just this one dataset works, combining this for all counties in the US is a massive project.
Whenever I buy a new home I always look up all my neighbors, figure out when they bought the house, how much they paid etc. Some people get freaked out by this, but this information is public in most counties.
By joining this data with another public data set, you can actually figure out which lender your neighbors used and what their reported income was at time of sale, as well as their age and ethnic background.
Of course there are plenty of other ways data brokers come across data, but even cleaning up and joining public data can require a fair bit of time and expertise.
tonyarkles•1d ago
I am a perfect example of this. Due to a bit of a quirk in how my house got its address assigned to it in 1959, we have a unique postal code. If a data broker gets access to a list of product purchases by postal code from a retailer, that's in theory somewhat anonymized. However... if they also get a list of people-postal code mappings, they have now established exactly what products my wife and I have purchased (by virtue of us being the only two people with this postal code).
Do that across multiple retailers and they've painted an incredibly vivid picture of what exactly we do with our time.
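A rough sketch of that join, with entirely made-up postal codes, households, and purchases. The retailer export never contains a name, but any postal code that maps to exactly one household gives the game away.

```python
from collections import defaultdict

# "Anonymized" retailer export keyed only by postal code (made-up data).
purchases = [
    ("CODE-1", "garden hose"),
    ("CODE-1", "baby monitor"),
    ("CODE-2", "dog food"),
    ("CODE-2", "bike lock"),
]

# Separate list mapping households to postal codes (made-up data).
residents = [
    ("household A", "CODE-1"),
    ("household B", "CODE-2"),
    ("household C", "CODE-2"),
]


def reidentify(purchases, residents):
    """Attribute purchases to a household wherever a postal code has exactly one."""
    by_code = defaultdict(list)
    for name, code in residents:
        by_code[code].append(name)

    attributed = defaultdict(list)
    for code, item in purchases:
        names = by_code.get(code, [])
        if len(names) == 1:  # group of one: the row points at a single household
            attributed[names[0]].append(item)
    return dict(attributed)


attributed = reidentify(purchases, residents)
print(attributed)
```

Here everything bought under `CODE-1` lands on household A, while `CODE-2` stays ambiguous between B and C. Repeat with a second retailer's export and the ambiguous groups start shrinking too.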
southernplaces7•1d ago
trollied•1d ago
The general public have no idea how much ad providers and data brokers know about them.
rvnx•1d ago
lyton•1d ago
Relevant article: http://archive.today/fzUL4
rustcleaner•20h ago
bombcar•11h ago
It’s harder to do this now, but think a bit - set the net wider and catch a few more, who cares! All you care is your target gets hit.
blindriver•1d ago
I’m guessing with the help of Palantir, the government has even more data and can probably link Reddit posts etc based on stylometry and can even perform psychological analysis on your personality and tendencies, etc.
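For a feel of what stylometry means at its crudest, here is a toy sketch that compares character trigram profiles with cosine similarity. This is a classroom-level technique with made-up sample texts, and I have no idea what any government or vendor actually runs; real attribution work uses far more text and far better features.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up sample posts.
known = "honestly, i reckon the whole thing is overblown, tbh. honestly."
same_author = "honestly, i reckon this is overblown too, tbh."
other_author = "The quarterly report indicates substantial growth across all segments."

score_same = cosine(char_ngrams(known), char_ngrams(same_author))
score_other = cosine(char_ngrams(known), char_ngrams(other_author))
print(f"same author:  {score_same:.2f}")
print(f"other author: {score_other:.2f}")
```

The post that shares verbal tics with the known sample scores noticeably higher. Restating your messages through a local LLM is an attempt to flatten exactly this kind of signal.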
kevin_thibedeau•1d ago
worik•1d ago
After being burnt by things taken from my social media out of context, used to publicly shame me, I locked down my social media
Am I "sweetly naive" to think that had an effect? I do think it did
Before I stopped using Facebook I noticed, over the last decade, that almost every account I encountered was locked down similarly
My point is I suspect it is getting harder, not easier, for data thieves. The golden age of data theft has passed. Maybe.
slumberlust•10h ago
FB creates a shadow profile for you even if you no longer have an account. Data brokers respect legal requirements to delete your data, then just populate a new profile for you.
I'd argue it's easier now than ever to get hands on someone's data in the USA.
rustcleaner•19h ago
I really need to start using PocketPal (local LLM on Android) to restate my messages.
---
Oh, the places I'd like to send my texts so fine, With PocketPal, a tool that's truly divine, Local LLM on Android, a wondrous device to see, To help me restate my messages with glee! Wheee!
greenie_beans•21h ago