1. If I as a human request a website, then I should be shown the content. Everyone agrees.
2. If I as the human request the software on my computer to modify the content before displaying it, for example by installing an ad-blocker into my user agent, then that's my choice and the website should not be notified about it. Most users agree, some websites try to nag you into modifying the software you run locally.
3. If I now go one step further and use an LLM to summarize content because the authentic presentation is so riddled with ads, JavaScript, and pop-ups that the content becomes borderline unusable, then why would the LLM accessing the website on my behalf be in a different legal category than my Firefox web browser accessing the website on my behalf?
It’s talking about Perplexity crawling sites on demand in response to user queries, and then complaining that no, it’s not fine; hence this thread.
If I put time and effort into a website and its content, I should expect no compensation despite bearing all costs.
Is that something everyone would agree with?
The internet should be entirely behind paywalls, besides content that is already provided ad free.
Is that something everyone would agree with?
I think the problem you need to be thinking about is "How can the internet work if no one wants to pay anything for anything?"
the answer is apparently "no", and I don't really see how recipe books have suffered as a result of less gatekeeping.
"How will the internet work?" Probably better in some ways. There is plenty of valuable content on the internet given away for free; it's just being buried in low-value AI slop.
But what is your point? Is the value in HN primarily in its hosting, or the non-ad-supported community?
Perhaps because the whole process of paying for stuff online is still a gigantic pain in the ass? Why can't I spend a dollar on one thing I want without a subscription or getting my email address and other info sold to a data broker?
The experience of buying something from a convenience store with my mass transit card is far superior to anything we have now. And that technology is 25+ years old.
I've been working on AI agent detection recently (see https://stytch.com/blog/introducing-is-agent/ ) and I think there's genuine value in website owners being able to identify AI agents to e.g. nudge them towards scoped access flows instead of fully impersonating a user with no controls.
On the flip side, the crawlers also face reputational risk here: anyone can slap on the user-agent string of a well-known crawler and do bad things like ignoring robots.txt. The standard solution today is a reverse DNS lookup on the IPs, but that's a pain for website owners too, compared with more aggressive block-everything-unusual setups.
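For what it's worth, the reverse DNS check is mechanical enough to sketch. This is a minimal version of the reverse-then-forward verification that Google documents for Googlebot; the function name and the injectable resolvers are my own invention, the latter so the logic can be exercised without live DNS:

```python
import socket

def verify_crawler_ip(ip, allowed_suffixes,
                      reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                      forward=socket.gethostbyname):
    """Reverse-then-forward DNS verification of a claimed crawler IP.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end in an expected domain
       (e.g. ".googlebot.com").
    3. Forward-resolve that hostname and confirm it maps back to the
       same IP, since anyone controlling their own reverse DNS could
       otherwise claim any hostname they like.
    """
    try:
        host = reverse(ip)
    except OSError:
        return False
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False
```

With stub resolvers you can see that a spoofed user-agent string fails step 2 even when steps 1 and 3 succeed, which is the property that makes the check annoying but worthwhile.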
B/ my brother used to use "fetcher" as a non-swear for "fucker"
That would trigger an internet-wide "fetch" operation. It would probably upset a lot of people and get your AI blocked by a lot of servers. But it's still in direct response to a user request.
Let’s imagine you have a content creator that runs a paid newsletter. They put in lots of effort to make well-researched and compelling content. They give some of it away to entice interested parties to their site, where some small percentage of them will convert and sign up.
They put the information up under the assumption that viewing the content and seeing the upsell are inextricably linked. Otherwise there is literally no reason for them to make any of it available on the open web.
Now you have AI scrapers, which will happily consume and regurgitate the work, sans the pesky little call to action.
If AI crawlers win here, we all lose.
The internet is filled with spam. But if you talk to one specific human, your chance of getting a useful answer rises massively. So in a way, a flood of written AI slop is making direct human connections more valuable.
Instead of having 1000+ anonymous subscribers for your newsletter, you'll have a few weekly calls with 5 friends each.
E.g., Sheldon Brown's bicycle blog is something of a work of art and one of the best bicycle resources literally anywhere. I don't know the man, but I'd be surprised if he'd put in the same effort without the "brand" behind it -- thankful readers writing in, somebody occasionally using the donate button to buy him a coffee, people like me talking about it here, etc.
Now, most of the value I find in the web comes from niche home-improvement forums (which Reddit has mostly digested). But even Reddit has a problem if users stop showing up from SEO.
I think the business model for “content creating” is going to have to change, for better or worse (a lot of YouTube stars are annoying as hell, but sure, stuff like well-written news and educational articles falls under this umbrella as well, so it is unfortunate that they will probably be impacted too).
Cloudflare banning bad actors has at least made scraping more expensive, and changes the economics of it - more sophisticated deception is necessarily more expensive. If the cost of forcing entry is high enough, scrapers might be willing to pay for access instead.
But I can imagine more extreme measures. e.g. old web of trust style request signing[0]. I don’t see any easy way for scrapers to beat a functioning WOT system. We just don’t happen to have one of those yet.
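To make the WOT idea concrete, the shape would be a signed request that the server checks against a registry of trusted keys. Everything below is hypothetical (header names, function names, and all), and a real web of trust would use public-key signatures with endorsements between keys; a shared-secret HMAC stands in here only to keep the sketch inside the standard library:

```python
import hmac
import hashlib
import time

def sign_request(method, path, client_id, secret):
    """Produce headers a server could verify against a trust registry."""
    ts = str(int(time.time()))
    msg = f"{method}\n{path}\n{ts}".encode()
    sig = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return {"X-Client-Id": client_id, "X-Timestamp": ts, "X-Signature": sig}

def verify_request(headers, method, path, lookup_secret, max_skew=300):
    """Server side: look up the client in the trust graph and check the MAC."""
    secret = lookup_secret(headers["X-Client-Id"])
    if secret is None:
        return False  # unknown or untrusted client
    if abs(time.time() - int(headers["X-Timestamp"])) > max_skew:
        return False  # reject replays of old signatures
    msg = f"{method}\n{path}\n{headers['X-Timestamp']}".encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, headers["X-Signature"])
```

The timestamp check matters: without it, one captured request could be replayed indefinitely by any scraper that saw it.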
This is the hypothesis I always personally find fascinating in light of the army of semi-anonymous Wikipedia volunteers continuously gathering and curating information without pay.
If it became functionally impossible to upsell a little information for more paid information, I'm sure some people would stop creating information online. I don't know if it would be enough to fundamentally alter the character of the web.
Do people (generally) put things online to get money or because they want it online? And is "free" data worse quality than data you have to pay somebody for (or is the challenge more one of curation: when anyone can put anything up for free, sorting high- and low-quality based on whatever criteria becomes a new kind of challenge?).
Jury's out on these questions, I think.
Perplexity's "web crawler" is mostly operating like this on behalf of users, so they don't need a massively expensive computer to run an LLM.
Do you think you do?
Or is there a balance between the owner's rights, who bears the content production and hosting/serving costs, and the rights of the end user who wishes to benefit from that content?
If you say that you have the right, and that right should be legally protected, to do whatever you want on your computer, should the content owner not also have a legally protected right to control how, by whom, and in what manner their content gets accessed?
That's how it currently works in the physical world. It doesn't work like that in the digital world due to technical limitations (which is a different topic, and for the record I am fine with those technical limitations as they protect other more important rights).
And since the content owner is, by definition, the owner of the content in question, it feels like their rights take precedence. If you don't agree with their offering (i.e. their terms of service), then as an end user you don't engage, and you don't access the content.
It really can be that simple. It's only "difficult to solve" if you don't believe a content owner's rights are as valid as your own.
If you believe in this principle, fair enough, but are you going to apply this consistently? If it's fair game for a blog to restrict access to AI agents, what does that mean for other user agents that companies disagree with, like browsers with adblock? Does it just boil down to "it's okay if a person does it but not okay if a big evil corporation does it?"
The reason people are up in arms is because rights they previously enjoyed are being stripped away by the current platforms. The content owner's rights aren't as valid as my own in the current world; they trump mine 10 to 1. If I "buy" a song and the content owner decides that my country is politically unfriendly, they just delete it and don't refund me. If I request to view their content and they start by wasting my bandwidth sending me an ad I haven't consented to, how can I even "not engage"? The damage is done, and there's no recourse.
The next step in your progression here might be:
If / when people have personal research bots that go and look for answers across a number of sites, requesting many pages much faster than humans do - what's the tipping point? Is personal web crawling ok? What if it gets a bit smarter and tries to anticipate what you'll ask and does a bunch of crawling to gather information regularly to try to stay up to date on things (from your machine)? Or is it when you tip the scale further and do general / mass crawling for many users to consume that it becomes a problem?
Seems like a reasonable stance would be something like "Following the no crawl directive is especially necessary when navigating websites faster than humans can."
> What if it gets a bit smarter and tries to anticipate what you'll ask and does a bunch of crawling to gather information regularly to try to stay up to date on things (from your machine)?
To be fair, Google Chrome already (somewhat) does this by preloading links it thinks you might click, before you click it.
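For the curious, this kind of speculative loading is something pages opt into explicitly; the URL below is just a placeholder:

```html
<!-- Classic resource hint: fetch a likely next page into the cache -->
<link rel="prefetch" href="/articles/next-page.html">

<!-- Chrome's newer Speculation Rules API: prefetch (or even prerender)
     pages the site owner predicts the user will visit next -->
<script type="speculationrules">
  { "prefetch": [{ "urls": ["/articles/next-page.html"] }] }
</script>
```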
But your point is still valid. We tolerate it because as website owners, we want our sites to load fast for users. But if we're just serving pages to robots and the data is repackaged to users without citing the original source, then yea... let's rethink that.
But of course, most website publishers would hate that. Because they don't want people to access their content, they want people to look at the ads that pay them. That's why to them, the IA crawling their website is akin to stealing. Because it's taking away some of their ad impressions.
>Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
The whole concept of a "website" will simply become niche. How many zoomers still visit any but the most popular websites?
Also because there is a difference between a user hitting f5 a couple times and a crawler doing a couple hundred requests.
Also because ultimately, by intermediating the request, llm companies rob website owners of a business model. A newspaper may be fine letting adblockers see their article, in hopes that they may eventually subscribe. When a LLM crawls the info and displays it with much less visibility for the source, that hope may not hold.
It could be a personal knowledge management system, but it seems like knowledge management systems should be operating off of things you already have. The research library down the street isn't considered a "personal knowledge management system" in any sense of the term, if you know what I mean. If you dispatch an army of minions to take notes on the library's contents, that doesn't seem personal. Similarly if you dispatch the army of minions to a bookstore rather than a library. At the very least, bring the item into your house/office first. (Libraries are a little different because they are designed for studying and taking notes; the issue is the army-of-minions aspect.)
The problem is with the LLM then training on that data _once_ and then storing it forever and regurgitating it N times in the future without ever crediting the original author.
So far, humans themselves did this, but only for relatively simple information (ratio of rice and water in specific $recipe). You're not gonna send a link to your friend just to see the ratio, you probably remember it off the top of your head.
Unfortunately, the top of an LLM's head is pretty big, and they are fitting almost the entire website's content in there for most websites.
The threshold beyond which content becomes irreproducible for human consumers, and therefore copyrightable (a lot of copyright law has a "reasonable" standard which refers to this same concept), has now shifted up many, many times higher.
Now, IMO:
So far, for stuff that won't fit in someone's head, people were using citations (academia, for example). LLMs should also use citations. That solves the problem pretty much. That the ad ecosystem chose views as the monetisation point and is thus hurt by this is not anyone else's problem. The ad ecosystem can innovate and adjust to the new reality in their own time and with their own effort. I promise most people won't be waiting. Maybe google can charge per LLM citation. Cost Per Citation, you even maintain the acronym :)
The "social contract" that has been established over the last 25+ years is that site owners don't mind their site being crawled reasonably provided that the indexing that results from it links back to their content. So when AltaVista/Yahoo/Google do it and then score and list your website, interspersing that with a few ads, then it's a sensible quid pro quo for everyone.
LLM AI outfits are abusing this social contract by stuffing the crawled data into their models, summarising/remixing/learning from this content, claiming "fair use" and then not providing the quid pro quo back to the originating data. This is quite likely terminal for many content-oriented businesses, which ironically means it will also be terminal for those who will ultimately depend on additions, changes and corrections to that content - LLM AI outfits.
IMO: copyright law needs an update to mandate no training on content without explicit permission from the holder of the copyright of that content. And perhaps, as others have pointed out, an llms.txt to augment robots.txt that covers this for llm digestion purposes.
Their reasons vary. Some don’t want their businesses perception of quality to be taken out of their control (delivering cold food, marking up items, poor substitutions). Some would prefer their staff service and build relationships with customers directly, instead of disinterested and frequently quite demanding runners. Some just straight up disagree with the practice of third party delivery.
I think that it’s pretty unambiguously reasonable to choose to not allow an unrelated business to operate inside of your physical storefront. I also think that maps onto digital services.
Mind you I'm not saying electric scooters are a bad idea, I have one and I quite enjoy it. I'm saying we didn't need five fucking startups all competing to provide them at the lowest cost possible just for 2/3s of them to end up in fucking landfills when the VC funding ran out.
The point is the web is changing, and people use a different type of browser now. And that browser happens to be LLMs.
Anybody complaining about the new browser has just not got it yet, or has and is trying to keep things the old way because they don’t know how or won’t change with the times. We have seen it before, Kodak, blockbuster, whatever.
Grow up, Cloudflare: some of your business models don't make sense any more.
You say this as though all LLM/otherwise automated traffic is for the purposes of fulfilling a request made by a user 100% of the time which is just flatly on-its-face untrue.
Companies make vast amounts of requests for indexing purposes. That could be to facilitate user requests someday, perhaps, but it is not today and not why it's happening. And worse still, LLMs introduce a new third option: that it's not for indexing or for later linking but is instead either for training the language model itself, or for the model to ingest and regurgitate later on with no attribution, with the added fun that it might just make some shit up about whatever you said and be wrong.
"The web is changing" does not mean every website must follow suit. Since I built my blog about 2 internet eternities ago, I have seen fad tech come and fad tech go. My blog remains more or less exactly what it was 2 decades ago, with more content and a better stylesheet. I have requested in my robots.txt that my content not be used for LLM training, and I fully expect that to be ignored because tech bros don't respect anyone, even fellow tech bros, when it means they have to change their behavior.
I think the main concern here is the huge amount of traffic from crawling just for content for pre-training.
At the limit, this problem is the problem of "keeping secrets while not keeping secrets" and is unsolvable. If you've shared your site content to one entity you cannot control, you cannot control where your site content goes from there (technologically; the law is a different question).
AI crawlers for non-open models void the implicit contract. First they crawl the data to build a model that can do QA. Proprietary LLM companies earn billions with knowledge that was crawled from websites and websites don't get anything in return. Fetching for user requests (to feed to an LLM) is kind of similar - the LLM provider makes a large profit and the author that actually put in time to create the content does not even get a visit anymore.
Besides that, if Perplexity is fine with evading robots.txt and blocks for user requests, how can one expect them not to use the fetched pages to train/finetune LLMs (as a side channel when people block crawling for training)?
Perplexity is not visiting a website every time a user asks about it. It's frequently crawling and indexing the web, thus redirecting traffic away from websites.
This crawling reduces costs and improves latency for Perplexity and its users. But it's a major threat to the crawled websites.
In fact, the "old web" people sometimes pine for was mostly a place where people were putting things online so they were online, not because it would translate directly to money.
Perhaps AI crawlers are a harbinger for the death of the web 2.0 pay-for-info model... And perhaps that's okay.
I think one thing to ask outside of this question is how long before your LLM summaries don't also include ads and other manipulative patterns.
And you’re right: there’s no difference. The web is just machines sending each other data. That’s why it’s so funny that people panic about “privacy violations” and server operators “spying on you”.
We’re just sending data around. Don’t send the data you don’t want to send. If you literally send the data to another machine it might save it. If you don’t, it can’t. The data the website operator sends you might change as a result but it’s just data. And a free interaction between machines.
2. the end
I am firmly convinced that this should be the future in the next decade, since the internet as we know it has been weaponized and ruined by social media, bots, state actors and now AI.
It’s a different UI, sure, but there should be no discrimination towards it as there should be no discrimination towards, say, Links terminal browser, or some exotic Firefox derivative.
So your comparison is at best naive, assuming good intentions, or malicious if not.
> Hello, would you be able to assist me in understanding this website? https:// […] .com/
In this case, Perplexity had a human being using it. Perplexity wasn’t crawling the site, Perplexity was being operated by a human working for Cloudflare.
AI broke the brains of many people. The internet isn't a monolith, but prior to the AI boom you'd be hard pressed to find people who were pro-copyright (except maybe a few who wanted to use it to force companies to comply with copyleft obligations), pro user-agent restrictions, or anti-scraping. Now such positions receive consistent representation in discussions, and are even the predominant position in some places (eg. reddit). In the past, people would invoke principled justifications for why they opposed those positions, like how copyright constituted an immoral monopoly and stifled innovation, or how scraping was so important to interoperability and the open web. Turns out for many, none of those principles really mattered and they only held those positions because they thought those positions would harm big evil publishing/media companies (ie. symbolic politics theory). When being anti-copyright or pro-scraping helped big evil AI companies, they took the opposite stance.
You don't seem to reject my claim that for many, principles took a backseat to "does this help or hurt evil corporations". If that's what passes as "nuance" to you, then sure.
>Talking about broken brains is often just mediocre projecting
To be clear, that part is metaphorical/hyperbolic and not meant to be taken literally. Obviously I'm not diagnosing people who switched sides with a psychiatric condition.
People can believe that corporations are using the power asymmetry between them and individuals through copyright law to stifle the individual to protect profits. People can also believe that corporations are using the power asymmetry between them and individuals through AI to steal intellectual labor done by individuals to protect their profits. People’s position just might be that the law should be used to protect the rights of parties when there is a large power asymmetry.
People like getting money for their work. You do too. Don't lose sight of that.
Sorry CF, give up. The courts are on our side here.
The world is bigger than the USA.
Just because American tech giants have captured and corrupted legislators in the US doesn't mean the rest of the world will follow.
> We created multiple brand-new domains, similar to testexample.com and secretexample.com. These domains were newly purchased and had not yet been indexed by any search engine nor made publicly accessible in any discoverable way. We implemented a robots.txt file with directives to stop any respectful bots from accessing any part of a website:
> We conducted an experiment by querying Perplexity AI with questions about these domains, and discovered Perplexity was still providing detailed information regarding the exact content hosted on each of these restricted domains. This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers.
> Hello, would you be able to assist me in understanding this website? https:// […] .com/
Under this situation Perplexity should still be permitted to access information on the page they link to.
robots.txt only restricts crawlers. That is, automated user-agents that recursively fetch pages:
> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
> Normal Web browsers are not robots, because they are operated by a human, and don't automatically retrieve referenced documents (other than inline images).
— https://www.robotstxt.org/faq/what.html
If the user asks about a particular page and Perplexity fetches only that page, then robots.txt has nothing to say about this and Perplexity shouldn’t even consider it. Perplexity is not acting as a robot in this situation – if a human asks about a specific URL then Perplexity is being operated by a human.
These are long-standing rules going back decades. You can replicate it yourself by observing wget’s behaviour. If you ask wget to fetch a page, it doesn’t look at robots.txt. If you ask it to recursively mirror a site, it will fetch the first page, and then if there are any links to follow, it will fetch robots.txt to determine if it is permitted to fetch those.
There is a long-standing misunderstanding that robots.txt is designed to block access from arbitrary user-agents. This is not the case. It is designed to stop recursive fetches. That is what separates a generic user-agent from a robot.
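That scoping is visible in standard tooling. Python's stdlib robots.txt parser, for instance, is something a client consults only if it chooses to; nothing applies it automatically to a one-off fetch. A small illustration (the bot name is made up):

```python
from urllib.robotparser import RobotFileParser

# A robots.txt like the one in Cloudflare's experiment: ask every
# well-behaved robot to stay away from the whole site.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# A crawler deciding whether to recursively fetch checks first:
print(rp.can_fetch("ExampleBot", "https://secretexample.com/page"))  # False

# But the check only happens if the user agent opts in to making it.
# A plain single-page fetch never consults this object at all.
```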
If Perplexity fetched the page they link to in their query, then Perplexity isn’t doing anything wrong. But if Perplexity followed the links on that page, then that is wrong. But Cloudflare don’t clearly say that Perplexity used information beyond the first page. This is an important detail because it determines whether Perplexity is following the robots.txt rules or not.
Right, I'm confused why CloudFlare is confused. You asked the web-enabled AI to look at the domains. Of course it's going to access it. It's like asking your web browser to go to "testexample.com" and then being surprised that it actually goes to "testexample.com".
Also yes, crawlers = recursive fetching, which they don't seem to have made a case for here. More cynically, CF is muddying the waters since they want to sell their anti-bot tools.
If Perplexity are visiting that page on your behalf to give you some information and aren't doing anything else with it, and just throw away that data afterwards, then you may have a point. As a site owner, I feel it's still my decision what I do and don't let you do, because you're visiting a page that I own and serve.
But if, as I suspect, Perplexity are visiting that page and then using information from that webpage in order to train their model then sorry mate, you're a crawler, you're just using a user as a proxy for your crawling activity.
If it looks like a duck, quacks like a duck and surfs a website like a duck, then perhaps we should just consider it a duck...
Edit: I should also add that it does matter what you do with it afterwards, because it's not content that belongs to you, it belongs to someone else. The law in most jurisdictions quite rightly restricts what you can do with content you've come across. For personal, relatively ephemeral use, or fair quoting for news etc. - all good. For feeding to your AI - not all good.
No.
robots.txt is designed to stop recursive fetching. It is not designed to stop AI companies from getting your content. Devising scenarios in which AI companies get your content without recursively fetching it is irrelevant to robots.txt because robots.txt is about recursively fetching.
If you try to use robots.txt to stop AI companies from accessing your content, then you will be disappointed because robots.txt is not designed to do that. It’s using the wrong tool for the job.
If an LLM will not (cannot?) tell the truth about basic things, why do people assume it is a good summarizer of more complex facts?
There is a difference between doing a poor summarization of data, and failing to even be able to get the data to summarize in the first place.
I'm not really addressing the issue raised in the article. I am noting that the LLM, when asked, is either lying to the user or making a statement that it does not know to be true (that there is no robots.txt). This is way beyond poor summarization.
That's not what Perplexity's own documentation[1] says, though:
"Webmasters can use the following robots.txt tags to manage how their sites and content interact with Perplexity
Perplexity-User supports user actions within Perplexity. When users ask Perplexity a question, it might visit a web page to help provide an accurate answer and include a link to the page in its response. Perplexity-User controls which sites these user requests can access. It is not used for web crawling or to collect content for training AI foundation models."
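Going by that documentation, a site owner who wants to refuse the crawler while still allowing user-triggered fetches would publish something like this (PerplexityBot being the crawler user agent Perplexity documents; whether these directives are actually honored is the whole dispute):

```text
# Refuse the crawler used for indexing / collecting training content
User-agent: PerplexityBot
Disallow: /

# Allow fetches made in direct response to a user's question
User-agent: Perplexity-User
Allow: /
```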
Put your valuable content behind a paywall.
The engine can go and download pages for research. BUT, if it hits a captcha, or is otherwise blocked, then it bails out and moves on. It pisses me off that these companies are backed by billions in VC and they think they can do whatever they want.
Should curl be considered a bot too? What's the difference?
the service is actually very convenient whether FAANG likes it or not.
Part of me thinks that the open web has a paradox of tolerance issue, leading to a race to the bottom/tragedy of the commons. Perhaps it needs basic terms of use. Like if you run this kind of business, you can build it on top of proprietary tech like apps and leave the rest of us alone.
It is also only a matter of time before scrapers once again get through the walls put up by Twitter, Reddit, and the like. This is, after all, information everyone produced without being aware that it would one day be considered not theirs anymore.
god help us if they ever manage to build anything more than shitty chatbots
CF being internet police is a problem too but someone credible publicly shaming a company for shady scraping is good. Even if it just creates conversation
Somehow this needs to go back to the search era, where all players at least attempted to behave. This scraping-DDoS stuff, and the attitude of "I don't care if it kills your site (while 'borrowing' its content)", is unethical bullshit.
If you don't want to get scraped, don't put up your stuff online.
No, he (Matthew) opted everyone in by default. If you’re a Cloudflare customer and you don’t care if AI can scrape your site, you should contact them and/or turn this off.
In a world where AI is fast becoming more important than search, companies who want AI to recommend their products need to turn this off before it starts hurting them financially.
Content marketing, gamified SEO, and obtrusive ads significantly hurt the quality of Google search. For all its flaws, LLMs don’t feel this gamified yet. It’s disappointing that this is probably where we’re headed. But I hope OpenAI and Anthropic realize that this drop in search result quality might be partly why Google’s losing traffic.
If you want to gatekeep your content, use authentication.
Robots.txt is not a technical solution, it's a social nicety.
Cloudflare and their ilk represent an abuse of internet protocols and mechanism of centralized control.
On the technical side, we could use CRC mechanisms and differential content loading with offline caching and storage, but this puts control of content in the hands of the user, mitigates the value of surveillance and tracking, and has other side effects unpalatable to those currently exploiting user data.
Adtech companies want their public reach cake and their mass surveillance meals, too, with all sorts of malignant parties and incentives behind perpetuating the worst of all possible worlds.
(IANAL) tortious interference
I was skeptical about their gatekeeping efforts at first, but came away with a better appreciation for the problem and their first pass at a solution.
I don't really know anything about DRM except it is used to take down sites that violate it. Perhaps it is possible for cloudflare (or anyone else) to file a take down notice with Perplexity. That might at least confuse them.
Corporations use this to protect their content. I should be able to protect mine as well. What's good for the goose.
There are ways to build scrapers using browser automation tools [0,1] that make detection virtually impossible. You can still captcha, but the person building the automation tools can add human-in-the-loop workflows to process these during normal business hours (i.e., when a call center is staffed).
I've seen some raster-level scraping techniques used in game dev testing 15 years ago that would really bother some of these internet police officers.
no, because we'll end up with remote attestation needed to access any site of value
$ curl -sI https://www.perplexity.ai | head -1
HTTP/2 403
gruez•1h ago
That's... less conclusive than I'd like to see, especially for a content marketing article that's calling out a company in particular. Specifically, it's unclear whether Perplexity was crawling (i.e. systematically viewing every page on the site without the direction of a human) or simply retrieving content on behalf of the user. I think most people would draw a distinction between the two, and would at least agree the latter is more acceptable than the former.
thoroughburro•1h ago
No. I should be able to control which automated retrieval tools can scrape my site, regardless of who commands it.
We can play cat and mouse all day, but I control the content and I will always win: I can just take it down when annoyed badly enough. Then nobody gets the content, and we can all thank upstanding companies like Perplexity for that collapse of trust.
gkbrk•1h ago
But they didn't take down the content, you did. When people running websites take down content because people use Firefox with ad-blockers, I don't blame Firefox either, I blame the website.
hombre_fatal•1h ago
It's also a gift to your competitors.
You're certainly free to do it. It's just a really faint example of you being "in control" much less winning over LLM agents: Ok, so the people who cared about your content can't access it anymore because you "got back" at Perplexity, a company who will never notice.
ipaddr•16m ago
Who cares if perplexity will never notice. Or competitors get an advantage. It is a negative for users using perplexity or visiting directly because the content doesn't exist.
That's the world perplexity and others are creating. They will be able to pull anything from the web but nothing will be left.
Den_VR•1h ago
But really, controlling which automated retrieval tools are allowed has always been more of a code of honor than a technical control. And that trust you mention has always been broken. For as long as I can remember anyway. Remember LexiBot and AltaVista?
sbarre•1h ago
Otherwise it's just adding an unwilling website to a crawl index, and showing the result of the first crawl as a byproduct of that action.
JimDabell•1h ago
That is true. But robots.txt is not designed to give them the ability to prevent this.
gruez•1h ago
That's basically how many crowdsourced crawling/archive projects work. For instance, sci-hub and RECAP[1]. Do you think they should be shut down as well? In both cases there's even a stronger justification to shutting them down, because the original content is paywalled and you could plausibly argue there's lost revenue on the line.
[1] https://en.wikipedia.org/wiki/Free_Law_Project#RECAP