Glad I found this quote. It is quite helpful to have an AI search the web on my behalf... even if it's just finding where I can buy locally the particular (or similar) peanuts I got from abroad.
In fact, even ads ingested into the training data set at this very moment could be useful. Go to Gemini and tell it you want to buy a jacket or whatever, and it will recommend products it ingested from its training data.
I'm a creator of such content, and like everyone else, I have to make do with 60-70% less traffic now.
We already see this with synthetic training data, which basically uses logic in the form of math and code as a constraint.
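A minimal sketch of that idea: synthetic examples whose answers are computed, never guessed, so plain arithmetic acts as the quality constraint and every example can be re-verified mechanically before training. All names here are illustrative, not any particular pipeline's API.

```python
import random

def make_example(rng: random.Random) -> dict:
    """Generate a synthetic QA pair whose answer is correct by construction."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    # The "constraint" is that the target is computed from the prompt itself.
    return {"prompt": f"Compute {a} * {b} + {a}", "completion": str(a * b + a)}

rng = random.Random(0)
batch = [make_example(rng) for _ in range(3)]

# Re-verify each example mechanically, as a curator could before training.
for ex in batch:
    a, rest = ex["prompt"].removeprefix("Compute ").split(" * ")
    b, a2 = rest.split(" + ")
    assert int(a) * int(b) + int(a2) == int(ex["completion"])
```

The same pattern extends to code: generate a program plus test cases, execute it, and keep the example only if the tests pass.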
I've heard this argument before, but you don't need to think too hard to see the limitations of a machine with no senses.
Suddenly the confirmed quality of the scraped data will be at a premium... "Scrape Engine Optimizers"?
It's just harder when you cut all traffic to them, devalue their work and fill the air with AI noise.
Only if you assume that people who train models are stupid.
Someone in the chain will be. Even the smartest people buy a lot of their training datasets. What happens when those get contaminated?
And it's simply not reasonable for AI companies to have human hands read through individual comments everywhere from beginning to end to build their training data. There isn't enough time in the universe to advance AI while doing that and also being accurate. Something will always slip through.
And even that needs to be curated because before AI tools there was bot content filling up the internet.
...and even without bots, a lot of human-authored content is low value, poorly written, etc.
There are (probably) companies out there whose business is to create, curate and improve training sets.
Probably the only real way to validate that content is real is to build a validation system into devices: confirm when a photo is taken and send an ID to a server, then, when the photo is shared, compare its ID against the record on the camera/phone manufacturer's server. For text, validate every key press. There are still ways to game these systems, but I would not be surprised if they're introduced to mitigate AI diffusing everywhere.
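The photo half of that scheme can be sketched with off-the-shelf crypto: the device tags a hash of each photo with a per-device secret at capture time, and the manufacturer's server recomputes the tag when the photo is shared. Everything below is a hypothetical illustration (the key, function names, and flow are assumptions, not a real attestation API).

```python
import hashlib
import hmac

# Assumption: each device ships with a secret key known only to it and
# the manufacturer's verification server.
DEVICE_SECRET = b"per-device-key-burned-in-at-factory"

def attest_at_capture(photo_bytes: bytes) -> str:
    """Device side: compute the photo's ID (an HMAC tag over its hash)."""
    digest = hashlib.sha256(photo_bytes).digest()
    return hmac.new(DEVICE_SECRET, digest, hashlib.sha256).hexdigest()

def verify_on_share(photo_bytes: bytes, claimed_id: str) -> bool:
    """Server side: recompute the tag and compare in constant time."""
    expected = attest_at_capture(photo_bytes)
    return hmac.compare_digest(expected, claimed_id)

photo = b"...raw sensor data..."
tag = attest_at_capture(photo)
assert verify_on_share(photo, tag)             # untouched photo checks out
assert not verify_on_share(photo + b"x", tag)  # any edit breaks the ID
```

Real-world proposals along these lines (e.g. signed provenance metadata attached to media) use public-key signatures rather than a shared secret, so verification doesn't require trusting the server with the device key.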
Which means filtering and ranking systems become the main bottleneck.
That pushes platforms toward stronger algorithmic selection and sometimes stronger convergence of attention.
Who is this official making this pronouncement?
If AI slop is replacing the content you were consuming, it was already slop.
The current one is awful, and there's so much AI/Bot content, but I can find far more detailed information using AI enabled search that isn't covered in ads. I can get an initial overview of methodology without trawling through SEO articles.
I think AI has been almost a natural response to the enshittification of the internet - ChatGPT wouldn't seem so transformative if google search was working like google search rather than ad generator 5000 before it released.
Best thing to do is to avoid idly browsing social media and curate your internet experience.
jruohonen•2h ago
theshrike79•1h ago
Your smart thermometer isn't making Reddit posts trying to sound like a human who's just concerned that the bedroom is a bit too warm.
teleforce•1h ago
If you perform a simple extrapolation, M2M data only surpasses the others around 2029.
Coincidentally, in the original timeline of the Terminator movies, 2029 is the year the Resistance, led by John Connor, destroys Skynet and ends the war against the machines.