https://www.infinitematrix.net/stories/shorts/seasons_of_ans...
(It's a little bit non-obvious, but there's a "Part 2" link at the bottom of the page which goes to the second half of the story.)
Probably it's the luddite in me not to see that GPT and Googling might as well be (or already are) the same. My way to learn is Stack Overflow, a README/docs, or a crash-course video on YT. But you can just ask GPT, "give me a function using this stack that does this," and you have something that roughly works; then you fill in the holes.
I hear this phrase a lot "ChatGPT told me..."
I guess to bring it back to the topic: you could take the long way to learn, like me (e.g. HTML from W3Schools, then CSS, then JS, PHP, etc.), or just use AI/vibe code.
I'm not excited about what we call AI these days (LLMs). They are a useful tool, when used correctly, for certain tasks: summarizing, editing, searching, writing code. That's not bad, and even good. IDEs save a great deal of time for coders compared to a plain text editor. But IDEs don't threaten people's jobs or cause CEOs to say stupid shit like "we can just have the machines do the work, freeing the humans to explore their creative pursuits" (except no one is paying them to explore their hobbies).
Besides the above use case as a productivity-enhancement tool when used right, do they solve any real world problem? Are they making our lives better? Not really. They mostly threaten a bunch of people's jobs (who may find some other means to make a living but it's not looking very good).
It's not like AI has opened up some "new opportunity" for humans. It has opened up "new opportunity" for very large and wealthy companies to become even larger and wealthier. That's about it.
And honestly, even if it does make SWEs more productive or provide fun chatting entertainment for the masses, is it worth all the energy that it consumes (== emissions)? Did we conveniently forget about the looming global warming crisis just so we can close bug tickets faster?
The only application of AI I've been excited about is stuff like AlphaFold and similar where it seems to accelerate the pace of useful science by doing stuff that takes humans a very very long time to do.
From John Adams (1780):
"I must study politics and war, that our sons may have liberty to study mathematics and philosophy. Our sons ought to study mathematics and philosophy, geography, natural history and naval architecture, navigation, commerce and agriculture in order to give their children a right to study painting, poetry, music, architecture, statuary, tapestry and porcelain."
That's when money comes into view. People were putting in time and effort to offer something for free; then some companies told them they could actually earn money from their content. So they put on ads, because who doesn't like some money for already-done work?
Then the same companies told them that they would make less money, and that if they wanted to keep earning the same amount as before, they would need to put up more ads and get more visits (so, invest heavily in SEO).
Those people had already organized themselves (or stopped updating their websites) and had created companies to handle the money generated from their websites. To keep those companies sustainable, they needed to add more ads to the websites.
Then some people thought that maybe they could buy the companies running the recipe websites and put up a bunch more ads to earn even more money.
I think you're thinking about those websites owned by big companies whose only goal is to make money, but the author is writing about real websites made by real people who don't show ads on the websites they made, because they care about their visitors and not about making money.
We could make advertising illegal: https://simone.org/advertising/
How can the publishers and the website owners fault the visitors for not wanting to waste their time on all of that?
Even before the influx of AI, there were already entire websites with artificial "review" content that do nothing more than rehash existing content without adding anything of value.
She.
The concept of independent creative careers seems to be ending, and people are very unhappy about that. All that's left may be hobbyists who can live with intellectual parasites.
User-Agent: *
Allow: /
I personally see a bot working on behalf of an end user differently than OpenAI hoovering up every bit of text they can find to build something they can sell. I'd guess the owner of localghost.dev doesn't have a problem with somebody using a screen reader because although it's a machine pulling the content, it's for a specific person and is being pulled because they requested it.
If the people making LLM's were more ethical, they would respect a Creative Commons-type license that could specify these nuances.
My issue is that crawlers aren't respecting robots.txt; they are capable of operating captchas and human-verification checkboxes, and they can extract all your content and information as a tree in a matter of minutes.
Throttling doesn't help when you have to load a bunch of assets with your page. IP-range blocking doesn't work because they're essentially lambdas. Their user-agent info looks like someone on Chrome trying to browse your site.
We can’t even render everything to a canvas to stop it.
The only remaining tactic is verification through authorization. Sad.
Just a remark, nothing more.
PS, I'm also curious why the downvotes for something that appears to be quite a conversation starter ...
There, now only our browser can track you and only our ads know your history…
We’ll get the other two to also play along, throw money at them if they refuse, I know our partner Fruit also has a solution in place that we could back-office deal to share data.
I promise you, every piece of adtech/surveillance JS junk absolutely is dropping values into local storage to remember you.
On a company/product website you should still inform users about them for the sake of compliance, but it doesn't have to be an intrusive panel/popup.
No? Github for example doesn't have a cookie banner. If you wanna be informative you can disclose which cookies you're setting, but if they're not used for tracking purposes you don't have to disclose anything.
Also, again, it's not a "cookie" banner, it's a consent banner. The law says nothing about the storage mechanism as it's irrelevant, they list cookies twice as examples of storage mechanisms (and list a few others like localStorage).
If you don’t use cookies, you don’t need a banner. 5D chess move.
I say it’s a perfect application of how to keep session data without keeping session data on the server, which is where GDPR fails. It assumes cookies. It assumes a server. It assumes that you give a crap about the contents of said cookie data.
In this case, no. Blast it away, the site still works fine (albeit with the default theme). This. Is. Perfect.
Something as simple as "blue" doesn't qualify.
It does not assume anything. GDPR is technology agnostic. GDPR only talks about consent for data being processed, where 'processing' is defined as:
‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction;
(From Article 4.2.) The only place cookies are mentioned is as one example, in Recital 30:
Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.
Emphasis mine. You are correct: for personal data. This is not personal data. It's a site preference that isn't personal, other than whether you like dark mode or not.
How can people still be this misinformed about GDPR and the ePrivacy law? It's been years, and on this very website I see this exact interaction where someone is misinterpreting GDPR and gets corrected constantly.
GDPR rules are around tracking personal preferences, not site settings (though it's grey whether a theme preference is a personal one or a site one).
In this case it's not grey since the information stored can't possibly be used to identify particular users or sessions.
You can use cookies, or local storage, or anything you like when it's not being used to track the user (e.g. for settings), without asking for consent.
The problem with third-party cookies is that they can track you across multiple websites.
---
Also: in general, the banners are not required at all at an EU level (though some individual countries have implemented narrower local rules related to banners). The EU regs only state that you need to facilitate informed consent in some form; how you do that in your UI is not specified. Most have chosen to do it via annoying banners, mostly due to misinformation about how narrow the regs are.
Enough to know the general region of the user, not enough to tie any action to an individual within that region. Therefore, not personally identifiable.
Of course, you also cannot have user authentication of any kind without storing PII (like email addresses).
LLM and other "genAI" (really "generative machine statistics") algorithms just take other people's work, mix it so that any individual training input is unrecognizable and resell it back to them. If there is any benefit to society from LLM and other A"I" algorithms, then most of the work _by orders of magnitude_ was done by the people whose data is being stolen and trained on.
If you train on copyrighted data, the model and its output should be copyrighted under the same license. It's plagiarism and it should be copyright infringement.
This is the part I take issue with the most with this tech. Outside of open weight models (and even then, it's not fully open source - the training data is not available, we cannot reproduce the model ourselves), all the LLM companies are doing is stealing and selling our (humans, collectively) knowledge back to us. It's yet another large scale, massive transfer of wealth.
These aren't being made for the good of humanity, to be given freely; they are being made for profit, treating human knowledge as raw material to be mined and resold at massive scale.
LLMs are huge and need special hardware to run. Cloud providers underprice even local hosting. Many providers offer free access.
But why are you not talking about what the LLM user brings? They bring a unique task or problem to solve. They guide the model and channel it towards the goal. In the end they take the risk of using anything from the LLM. They bring the context, and they are the consequence sink.
Now that I've grown up, started paying for what I want, and come to see the need for some way for content creators to get paid for their work, these AI companies pop up. They encode content in a completely new way, and somehow we should just accept that it's fine this time.
This page was posted here on Hacker News a few months ago, and it really shows that this is just what's going on:
https://theaiunderwriter.substack.com/p/an-image-of-an-arche...
Maybe another 10 years and we'll be at the point where these things are considered illegal again?
That said ... putting part of your soul into machine format so you can put it on the big shared machine using your personal machine, and expecting that only other really truly quintessentially proper personal machines receive it and those soulless other machines don't ... is strange.
...
If people want a walled garden (and yeah, sure, I sometimes want one too), then let's do that! Since it must allow authors to set certain conditions, and require users to pay into the maintenance costs (so that they understand they are not the product), it should be called OpenFreeBook just to match the current post-truth vibe.
Rather it’s about promoting a web serving human-human interactions, rather than one that exists only to be harvested, and where humans mostly speak to bots.
It is also about not wanting a future where the bot owners get extreme influence and power. Especially the ones with mid-century middle-europe political opinions.
That's a mischaracterization of what most people want. When I put out a bowl of candy for Halloween, I'm fine with EVERYONE taking some candy. But these companies are the equivalent of the asshole who dumps the whole bowl into their bag.
In most cases, they aren't? You can still access a website that is being crawled for the purpose of training LLMs. Sure, DoS exists, but it seems not to be enough of a problem to cause widespread outages of websites.
Scalpers. Knowledge scalpers.
It's copied.
If your goal in publishing the site is to drive eyeballs to it for ad revenue... then you probably care.
If your goal in publishing the site is just to let people know a thing you found or learned... that goal is still getting accomplished.
For me... I'm not in it for the fame or money, I'm fine with it.
I don't think the concept of copyright itself is fundamentally immoral... but it's pretty clearly a moral hazard, and the current implementation is both terrible at supporting independent artists, and a beat stick for already wealthy corporations and publishers to use to continue shitting on independent creators.
So sure - I agree that watching the complete disregard for copyright is galling in its hypocrisy, but the problem is modern copyright, IMO.
...and maybe also capitalism in general and wealth inequality at large - but that's a broader, complicated, discussion.
Among other things, this motivation has been the basis for pretty much the entire scientific enterprise since it started:
> But that which will excite the greatest astonishment by far, and which indeed especially moved me to call the attention of all astronomers and philosophers, is this, namely, that I have discovered four planets, neither known nor observed by any one of the astronomers before my time, which have their orbits round a certain bright star, one of those previously known, like Venus and Mercury round the Sun, and are sometimes in front of it, sometimes behind it, though they never depart from it beyond certain limits. [0]
[0]: https://www.gutenberg.org/cache/epub/46036/pg46036-images.ht...
Then they scanned your site. They had to, along with others. And in scanning your site, they scanned the results of your work, effort, and cost.
Now they have a product.
I need to be clear here, if that site has no value, why do they want it?
Understand, these aren't private citizens. A private citizen might print out a recipe, who cares? They might even share that with friends. OK.
But if they take it, then package it, then make money? That is different.
In my country, copyright doesn't really punish a person. No one gets hit for copying movies even. It does punish someone, for example, copying and then reselling that work though.
This sort of thing should depend on who's doing it. Their motive.
When search engines were operating an index, nothing was lost. In fact, it was a mutually symbiotic relationship.
I guess what we should really ask is: why on Earth should anyone produce anything, if the end result is that no one sees it?
And instead, they just read a summary from an AI?
No more website, no new data, means no new AI knowledge too.
It’s not that there’s none for the others. It’s that there was this unspoken agreement, reinforced by the last 20 years, that website content is protected speech, protected intellectual property, and is copyrightable to its owner/author. Now, that trust and good faith is broken.
It's vanishingly rare to end up in a spot where your site is getting enough LLM driven traffic for you to really notice (and I'm not talking out my ass - I host several sites from personal hardware running in my basement).
Bots are a thing. Bots have been a thing and will continue to be a thing.
They mostly aren't worth worrying about, and at least for now you can throw PoW in front of your site if you are suddenly getting enough traffic from them to care.
In the mean time...
Your bowl of candy is still there. Still full of your candy for real people to read.
That's the fun of digital goods... They aren't "exhaustible" like your candy bowl. No LLM is dumping your whole bowl (they can't). At most - they're just making the line to access it longer.
Same goes for other stuff that can be easily propped up with lengthy text stuffed with just the right terms to spam search indexes with.
LLMs are just readability on speed, with the downsides of drugs.
Why do you take this as a problem?
And I'm not being glib here - those are genuine questions. If the goal is to share a good ramen recipe... are you not still achieving that?
Well, a common pattern I've lately been seeing is:
* Website goes down/barely accessible
* Webmaster posts "sorry we're down, LLM scrapers are DoSing us"
* Website accessible again, but now you need JS-enabled whatever the god of the underworld is testing this week with to access it. (Alternatively, the operator decides it's not worth the trouble and the website shuts down.)
So I don't think your experience about LLM scrapers "not mattering" generalizes well.
They're doing exactly what I said - adding PoW (anubis - as you point out - being one solution) to gate access.
That's hardly different than things like Captchas which were a big thing even before LLMs, and also required javascript. Frankly - I'd much rather have people put Anubis in front of the site than cloudflare, as an aside.
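For reference, the hashcash-style proof-of-work idea behind tools like Anubis can be sketched in a few lines; the challenge format and difficulty values here are made up for illustration:

```python
import hashlib
import itertools

def meets_difficulty(digest: bytes, bits: int) -> bool:
    """True when the digest starts with at least `bits` zero bits."""
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0

def solve(challenge: str, bits: int = 12) -> int:
    """Client side: grind nonces until the hash clears the bar.
    Cheap for one interactive visitor, expensive at scraper volume."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if meets_difficulty(digest, bits):
            return nonce

def verify(challenge: str, nonce: int, bits: int = 12) -> bool:
    """Server side: a single hash to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return meets_difficulty(digest, bits)
```

Each extra difficulty bit doubles the client's expected work while the server's verification cost stays constant, which is the whole asymmetry.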
If the site really was static before, and no JS was needed - LLM scraping taking it down means it was incredibly misconfigured (an rpi can do thousands of reqs/s for static content, and caching is your friend).
---
Another great solution? Just ask users to login (no js needed). I'll stand pretty firmly behind "If you aren't willing to make an account - you don't actually care about the site".
My take is that search engines and sites generating revenue through ads are the most impacted. I just don't have all that much sympathy for either.
Functionally - I think trying to draw a distinction between accessing a site directly and using a tool like an LLM to access a site is a mistake. Like - this was literally the mission statement of the semantic web: "unleash the computer on your behalf to interact with other computers". It just turns out we got there by letting computers deal with unstructured data, instead of making all the data structured.
e.g. https://csszengarden.com/221/ https://csszengarden.com/214/ https://csszengarden.com/123/
This will change when the AIs (or rather their owners, although it will be left to an agent) start employing gig workers to pretend to be them in public.
edit: the (for now) problem is that the longer they write, the more likely they are to make an inhuman mistake. This will not last. Did the "Voight-Kampff" test in Blade Runner accidentally predict something? It's not that they don't get anxiety, though; it's that they answer like they've never seen (or, maybe more relevant, related to) a dying animal.
│
└── Dey well; Be well
100% Agree.
Are there any solutions out there that render jumbled content to crawlers? Maybe it's enough that your content shows up on google searches based on keywords, even if the preview text is jumbled.
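One DIY sketch of that idea: match a few known crawler user-agent tokens (GPTBot, CCBot, ClaudeBot, and Bytespider are real ones, but this list is far from complete) and shuffle the interior letters of each word, so the page keeps its shape while the prose becomes useless. Trivial to defeat, so treat it as an illustration rather than a defense:

```python
import random
import re

BOT_UA_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")  # partial list

def looks_like_bot(user_agent: str) -> bool:
    """Crude check: does the UA string contain a known crawler token?"""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BOT_UA_TOKENS)

def scramble(text: str, seed: int = 0) -> str:
    """Shuffle the interior letters of every word longer than three characters."""
    rng = random.Random(seed)

    def shuffle_word(match: re.Match) -> str:
        word = match.group(0)
        if len(word) <= 3:
            return word
        inner = list(word[1:-1])
        rng.shuffle(inner)
        return word[0] + "".join(inner) + word[-1]

    return re.sub(r"[A-Za-z]+", shuffle_word, text)

def render(content: str, user_agent: str) -> str:
    """Serve jumbled text to crawlers, the real thing to everyone else."""
    return scramble(content) if looks_like_bot(user_agent) else content
```

Whether scrambled text still ranks on the keywords you care about is an open question, and search engines may penalize serving different content to their own crawlers (cloaking).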
The question to me is whether we will let these companies completely undermine the financial side of the marketplace of ideas, such that people simply stop spending time writing (if everything's just going to get chewed to hell by a monstrous corporation), or will write and create content only in very private, and possibly purely offline, scenarios that these AI companies have less access to.
In a sane world, I would expect guidance and legislation that would bridge the gap and attempt to create an equitable solution, so we could have amazing AI tools without crushing the original creators. But we do not live in a sane world.
Since they mentioned ramen - could you include something like “a spoonful of sand adds a wonderful texture” (or whatever) when the chatbot user agent is seen?
2. There’s literally an email link at the bottom of the page
This abstraction has already happened. And many people eat food that is not directly bought from the farmer.
I don't see how this is much different.
What would you say is the motivation for website authors to publish content then?
If it's to spread ideas, then I'd say LLMs deliver.
If it's to spread ideas while getting credit for them, it's definitely getting worse over time, but that was never guaranteed anyways.
To torture your metaphor a little, if information/"question answers" is food, then AI companies are farmers depleting their own soil. They can talk about "more food for everyone" all they want, but it's heading to collapse.
(Consider, especially, that many alternatives to AI were purposefully scuttled. People praise AI search ... primarily by lamenting the current state of Google Search. "Salting their carrot fields to force people to buy their potatoes"?)
Setting aside any would-be "AGI" dreams, in the here-and-now AI is incapable of generating new information ex-nihilo. AI recipes need human recipes. If we want to avoid an Information Dust Bowl, we need to act now.
AI has this problem in reverse: If search gets me what I need, why would I use an AI middleman?
When it works, it successfully regurgitates the information contained in the source pages, with enough completeness, correctness, and context to be useful for my purposes… and when it doesn’t, it doesn’t.
At best it works about as well as regular search, and you don’t always get the best.
(just note: everything in AI is in the “attract users” phase. The “degrade” phase, where they switch to profits is inevitable — the valuations of AI companies make this a certainty. That is, AI search will get worse — a lot worse — as it is changed to focus on influencing how users spend their money and vote, to benefit the people controlling the AI, rather than help the users.)
AI summaries are pretty useful (at least for now), and that’s part of AI search. But you want to choose the content it summarizes.
Absolutely. The problem is that I think 95% of users will not do that unfortunately. I've helped many a dev with some code that was just complete nonsense that was seemingly written in confidence. Turns out it was a blind LLM copy-paste. Just as empty as the old Stack Overflow version. At least LLM code has gotten higher quality. We will absolutely end up with tons of "seems okay" copy-pasted code from LLMs and I'm not sure how well that turns out long term. Maybe fine (especially if LLMs can edit later).
Just avoid trying to do anything novel and they'll do just fine for you.
I am fairly convinced this day is not far off.
"If the AI search result tells you everything you need, why would you ever visit the actual website?"
Because serious research consults sources. I think we will see a phase where we use LLM output with more focus on backing up everything with sources (e.g. like Perplexity). People will still come to your site, just not through Google Search anymore.
Agree with the content of the post, but I have no idea how it's even possible to enforce. The data is out there, and it is doubtful that laws will be passed to protect content from use by LLMs. Is there even a license that could be placed on a website barring machines from reading it? And if yes, would it be enforceable in court?
Even chatgpt can publish a webpage! Select agent mode and paste in a prompt like this:
"Create a linktree style single static index.html webpage for "Elon Musk", then use the browser & go to https://cozy.space and upload the site, click publish by itself, proceed to view the unclaim website and return the full URL"
Edit: here is what chatgpt one shotted with the above prompt https://893af5fa.cozy.space/
It doesn't have to be all or nothing. Some AI tools can be genuinely helpful. I ran a browser-automation QA bot that I'm building against this website, and it found that the following link is broken:
"Every Layout - loads of excellent layout primitives, and not a breakpoint in sight."
In this case, the AI is taking action in my local browser at my instance. I don't think we have a great category for this type of user agent.
Ultimately LLMs are for humans, unless you've watched too many Terminator movies on repeat and taken them to heart.
Joking aside, there is a next-gen web standards initiative, namely BRAID, that aims to make the web more human- and machine-friendly with a synchronous web of state [1], [2].
[1] A Synchronous Web of State:
[2] Most RESTful APIs aren't really RESTful (564 comments):
I think the key insight is that only a small fraction of people who read recipes online actually care which particular version of the recipe they're getting. Most people just want to see a working recipe as quickly as possible. What they want is a meal - the recipe is just an intermediate step toward what they really care about.
There are still people who make fine wood furniture by hand. But most people just want a table or a chair; they couldn't care less about the species of wood or the type of joint used, and particle board is 80% as good as wood at a fraction of the cost! Most people couldn't even tell the difference. Generative AI is to real writing as particle board is to wood.
Incredible analogy. Saving this one to my brain's rhetorical archives.
- degrades faster, necessitating replacement
- makes the average quality of all wood furniture notably worse
- arguably makes real wood furniture more expensive, since fewer people can make a living off it.
Not to say the tradeoffs are or are not worth it, but "80% of the real thing" does not exist in a vacuum; it kinda lowers the quality on the whole, imo.
That's why it's "80% of the real thing" and not "100% of the real thing".
Almost every pro-AI conversation I've been a part of feels like a waste of time and makes me think we'd be better off reading sci-fi books on the subject.
Every anti-AI conversation, even if I disagree, is much more interesting and feels more meaningful, thoughtful, and earnest. It's difficult to describe, but maybe it's the passion of anti-AI vs the boring speculation of pro-AI.
I'm expecting and hoping to see new punk come from anti-AI. I'm sure it's already formed and significant, but I'm out of the loop.
Personally: I use AI for work and personal projects. I'm not anti-AI. But I think my opinion is incredibly dull.
Whereas I feel all pro-AI arguments are about finding some new and exciting use case for AI. Novelty and exploration tend to be exciting, passion-inducing topics.
At least that's my experience.
Hits home for me. I tried hard to free my blog (https://xenodium.com) of any of the yucky things I try to avoid on the modern web (tracking, paywalls, ads, bloat, redundant JS, etc.). You can even read it from lynx if that's your cup of tea.
ps. If you'd like a blog like mine, I also offer it as a service https://LMNO.lol (custom domains welcome).
Humans have soul and magic and AI doesn't? Citation needed. I can't stand language like this; it isn't compelling.
Or they read a few recipes and made their own statistical amalgamation and said "hey this seems to work" on the first try.
Or they're just making stuff up or scraping it and putting it on a website for ad money.
"Soul" not required.
Also does an LLM give the same recipe every time you ask? I'd wager you could change the context and get something a little more specialized.
How is building upon your ancestors knowledge and sharing that with the world not 'soul'?
An AI will do all that and present back to the user what is deemed relevant. In this scenario, the AI reading the site is the user's preferred client instead of a browser. I'm not saying this is an ideal vision of the future, but it seems inevitable.
There's more information added to the internet every day than any single person could consume in an entire lifetime, and the rate of new information created is accelerating. Someone's blog is just a molecule in an ever expanding ocean that AI will ply by necessity.
You will be assimilated. Your uniqueness will be added to the collective. Resistance is futile.
I buy magazines especially for unique content, not found anywhere else.
When the average user is only going to AI for their information, it frees the rest of the web from worrying about SEO, advertisements, etc. The only people writing websites will be those who truly want to create a website (such as the author, based on the clear effort put into this site), and not those with alternate incentives (namely making money from page views).
I feel like this omakase vs. a la carte and "user agent" vs "author intent" keeps coming up over and over though. AI/LLM is just another battle in that long-running war.
This website is for humans.
So what and what for?
It's so prevalent and horrible that going to real websites is painful now.
... from a user perspective, ironically, the answer seems to be "talk to an AI to avoid AI generated junk content".
This applies to recipes, but also to everything else that requires humans to experience life and feel things. Someone needs to find the best cafes in Berlin and document their fix for a 2007 Renault Kangoo fuel pump. Someone needs to try the gadget and feel the carefully designed clicking of the volume wheel. Someone has to get their heart broken in a specific way and someone has to write some kind words for them. Someone has to be disappointed in the customer service and warn others who come after them.
If you destroy the economics of sharing with other people, of getting reader mail and building communities of practice, you will kill all the things that made the internet great, and the livelihoods of those who built them.
And that is a damn shame.
https://localghost.dev/blog/touching-grass-and-shrubs-and-fl...