I'm probably just being naive though...
You can of course argue a lot of edge cases if you really want. For the most part I want to say "it isn't worth the argument". In some cases I will take your side if I really have to think about it, but in general the system google has been using mostly works and is mostly an acceptable compromise.
Along with all the other AI companies out there, they've committed the biggest theft in human history.
Also most people would agree they are fine with being indexed in general. That is different from email spam where people don't want it.
People are generally fine with indexing operations so long as you don't use too much bandwidth.
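For what it's worth, the bandwidth point is something a crawler can check for itself. Here is a minimal sketch using Python's standard `urllib.robotparser`, assuming a site that publishes the non-standard `Crawl-delay` directive (some crawlers honor it, others ignore it). The robots.txt content and the `ExampleCrawler` name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; Crawl-delay is a non-standard extension.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Crawl-delay: 2
Disallow: /private/
"""


def polite_delay(robots_txt, agent, default=1.0):
    """Return the seconds to wait between requests for this agent.

    Falls back to `default` when the site states no delay for the agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


delay_named = polite_delay(ROBOTS_TXT, "ExampleCrawler")
delay_other = polite_delay(ROBOTS_TXT, "SomeOtherBot")
print(delay_named, delay_other)
```

Nothing enforces the returned value. A well-behaved crawler would sleep that long between requests, and a badly behaved one would simply not ask.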
Using AI to summarize content is still an open question - I wouldn't be surprised if this develops into some form of "you can index but not summarize", but only time will tell.
Do you have an example of a court saying that violating robots.txt violates an existing law?
In Ziff Davis v. OpenAI [1], the District Court for the Southern District of New York found that violating robots.txt does not violate DMCA section 1201(a) (formally 17 U.S. Code § 1201(a), which prohibits circumvention of technological protection measures of copyrighted content [2]).
It's my understanding that robots.txt started as a socially-enforced rule and that it remains legally voluntary.
[1] https://blog.ericgoldman.org/archives/2025/12/are-robots-txt...
Adversarial Interoperability is a Digital Human Right. Either companies can provide it reasonably or the people will assert their rights through other means.
> Stealthy scrapers like SerpApi override those directives and give sites no choice at all. SerpApi uses shady back doors — like cloaking themselves, bombarding websites with massive networks of bots and giving their crawlers fake and constantly changing names — circumventing our security measures to take websites’ content wholesale. [...] SerpApi deceptively takes content that Google licenses from others (like images that appear in Knowledge Panels, real-time data in Search features and much more), and then resells it for a fee. In doing so, it willfully disregards the rights and directives of websites and providers whose content appears in Search.
To me this seems... interesting, for sure. I think that Google already set a bad precedent by pulling content from the web directly into its results, and an even worse one by paying websites with user-generated content for said content (while those sites didn't pay the users that actually made the user-generated content, as an additional bitchslap.)
But it seems like at the very least Google is suggesting that SerpApi is effectively trying to "steal" the work Google did, rather than do the same work themselves. Though I wonder if this is really Google pulling up the ladder behind them a bit, given how privileged of a position they are in with regards to web scraping.
It's a tough case. I think that something does need to ultimately be done about "malicious" web scraping that ignores robots.txt, but traditionally that sort of thing did not violate any laws, and I feel somewhat skeptical that it will be found to violate the law today. I mean, didn't LinkedIn try this same thing?
Like GoogleBot?
And yeah, robots.txt is not enforced by any law.
I think this is just about dragging SerpApi through a lengthy legal procedure and fees.
* that's the sound of a ladder being yanked up
Their entire ai model was scraped.
this has to be satire. Is Google not the #1 entity guilty of exactly this?
They abuse this power to scrape your work, summarize it and cut you out as much as possible. Pure value extraction of others' work without equal return. Now intensified with AI
But yeah, you're right. They're not deceptive
nobody is forcing anyone. This is the same argument that people said about google search. Nobody is forcing anyone to use google search, google chrome, or even allow googlebot for scraping.
Thousands of people have switched over to ChatGPT, Brave/Firefox...
Your argument sounds like "I dont like Apple's practices, and I'm forced to buy iPhones. No buddy, if you dont like Apple, dont buy their products"
No, not really. There are alternatives to Apple. Whereas here Google controls the gate to the majority of internet traffic
For many it's "block Google and your business dies"
If you want people to visit your website, limiting yourself to the "thousands" of people who don't use google isn't really an option.
> Your argument sounds like "I dont like Apple's practices, and I'm forced to buy iPhones. No buddy, if you dont like Apple, dont buy their products"
Well, I don't like Apple's or Google's practices, but I basically [1] have to use either iOS or Android.
[1]: yes there are things like GrapheneOS and librem, but those aren't really practical for most people.
And then pretending that they're fighting for other people's copyright is just the cherry on top of the pile of hypocrisy.
> SerpApi’s answer to SearchGuard is to mask the hundreds of millions of automated queries it is sending to Google each day to make them appear as if they are coming from human users. SerpApi’s founder recently described the process as “creating fake browsers using a multitude of IP addresses that Google sees as normal users.”
Then they bend over backwards and do the "but not like that!" crap with their legal team and swing their wealth and influence around to screw over other companies and people, and a vast majority of it just vanishes, gets memory holed, with NDAs and out of court settlements, so you never get to see the full scope of harm they inflict unless you're watching like a hawk and catch the headlines before they get disappeared.
Google needs to be broken up and we need to legislate the dismantling of the current adtech regime, with a privacy and sovereignty respecting digital bill of rights that puts the interests of individual citizens above that of giant corporate blobs and the mass surveillance data industry.
Reddit Accuses 'Data Scraper' Companies of Stealing Its Information
https://news.ycombinator.com/item?id=45695433
Our Response to Reddit, Inc. vs. SerpApi, LLC: Defending the First Amendment
Data wants to be free. They knew that once.
EDIT: Also to be clear I am not saying they can't win legally. I'm sure they can do legal games and could shop around until they were successful. They are in the wrong conceptually.
The biggest joke was all the “hackers” 25 years ago shouting “Don’t be evil like Oracle, Microsoft, Apple or Adobe and charge for your software, be good like Google and just put like a banner ad or something and give it away for free”
You can search Google _for free_ (with all the caveats of that statement). Part of their grievance is that SerpApi resells the scraped data as a paid service.
Lots of Google bot blocking is also circumvented, which they seem to have made a lot of efforts towards in the past year
- robots.txt directives (fwiw)
- You need JS
- If you have no cookie you'll be given a set of JS fingerprints, apparently one set for mobile and one for desktop. You may have to tweak what fingerprints you give back in order to get results custom to user agent etc.
Google was never that bothered about scraping if it was done at a reasonable volume. Against scrapers with pools of millions of IPs and a handle on how to get around the blocking, Google is at the mercy of how polite the scraping is. They're maybe also worried about people reselling data en masse to competitors, i.e. their usual "all your data belongs to us and only us".
I thought the ads counted as payment? That seems to be the logic used to take technical measures against adblockers on YouTube while pushing users towards a paid ad-free subscription, at least.
If viewing ads is payment, then Google isn't a free service. If viewing ads isn't payment, then Google should have no problem with people using adblockers.
Google would like you to click through as it looks better for their stats, but they don't actually care.
Well not through their API which you do need to pay for and is a paid service.
SERP API just assumes everybody wants to be scraped, and doesn't give you a choice.
(whether websites should have such a choice is a different matter entirely).
You know what getting my consent would look like? Google hosting a form where I can tell them PLEASE SCRAPE MY WEBSITE and include it in your search results. That is what consent looks like.
Google has never asked for my consent. Yet they expect others to behave by different rules.
Now where google may have a reasonable case is that google scrapes with the intention of offering the data “for free”. SerpAPI does not.
Just for anybody wondering, they have always had such a form as well. Apart from their general crawling.
I don't think this suit is actually about that, though. I think Google's complaint is that
> SerpApi deceptively takes content that Google licenses from others
In other words, this is just a good old-fashioned licence violation.
Is that true with how they trained Gemini? Doesn't everyone with a foundational model scrape the web relentlessly without regard for robots.txt?
Like if you give a friend a key to your house so they can check on your plants when you're out of town but they throw a rager and trash the place.
That was not a phrase I expected to read on Hacker News! Haven't heard it since I was about 13. I always assumed it was a Scottish phrase.
But yes, I am dating myself with that phrase...
Almost everybody wants to appear in search, so disallowing the entirety of Google is far more costly than, e.g., disallowing OpenAI, which even differentiates between content scraped for training and content accessed to respond to a user request.
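To make the per-crawler distinction concrete, here is a rough sketch with Python's standard `urllib.robotparser`. It assumes OpenAI's published user agent names, `GPTBot` for training crawls and `ChatGPT-User` for user-initiated requests. The robots.txt content and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Map each user agent to whether robots.txt permits it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["Googlebot", "GPTBot", "ChatGPT-User"],
    "https://example.com/article",
)
print(result)
```

This only describes what a compliant crawler would decide for itself. As noted elsewhere in the thread, nothing in the file stops a crawler that chooses to ignore it.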
The short answer is that scraping isn't a CFAA offence but might be a terms and conditions violation, depending on the specifics of the access.
Testimony https://medium.com/@brianwarner/celebritynetworths-statement...
CNW ended up putting up content for fake celebrities after declining Google's request for API usage, to prove that Google was scraping them.
https://blog.cloudflare.com/perplexity-is-using-stealth-unde...
They also started caring about this, probably because they don't want their competitors to get the same data as they have.
The index would just point a local crawler towards hubs of resources, links, feeds, and specialized search engines. Then fresh information would come from the crawler itself. My thinking is that reputable sites don't appear every day, if you update your local index once every few months it is sufficient.
The index could host 1..10 or even 100M stubs, each one touching on a different topic, and concentrating the best entry points on the web for that topic. A local LLM can RAG-search it, and use an agent to crawl from there on. If you solve search this way, without Google, and you also have local code execution sandbox, and local model, you can cut the cord. Search was the missing ingredient.
You can still call regular search engines for discovery. You can build your personalized cache of search stubs using regular LLMs that have search integration, like ChatGPT and Gemini, you only need to do it once per topic.
Imagine this stack: local LLM, local search stub index, and local code execution sandbox - a sovereign stack. You can get some privacy and independence back.
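A toy sketch of what the "search stub index" lookup could look like, just to make the idea concrete. Everything here is an assumption: the `Stub` structure, the example.org URLs, and above all the scoring, which is plain keyword overlap standing in for the RAG-style retrieval described above.

```python
from dataclasses import dataclass


@dataclass
class Stub:
    """One topic entry: keywords to match on and URLs to start crawling from."""
    topic: str
    keywords: set
    entry_points: list


# Hypothetical stubs; a real index would hold far more topics.
INDEX = [
    Stub("python packaging", {"python", "pip", "packaging", "wheel"},
         ["https://example.org/python-packaging-hub"]),
    Stub("web crawling", {"crawler", "robots", "scraping", "crawl"},
         ["https://example.org/crawling-hub"]),
]


def lookup(query, index, top_k=1):
    """Return up to top_k stubs whose keywords overlap the query words."""
    words = set(query.lower().split())
    scored = [(len(words & stub.keywords), stub) for stub in index]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [stub for _, stub in scored[:top_k]]


hits = lookup("how do I respect robots rules in a crawler", INDEX)
print(hits[0].topic, hits[0].entry_points)
```

The local crawler or agent would then start from `entry_points` and fetch fresh pages itself, so the index only needs to know where to begin, not what the pages currently say.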
I imagine you'd get on just fine for short tail queries but the other cases (longer tail, recent queries, things that haven't been crawled) begin to add up.
I certainly did not and find using the content google scraped from my website for money or AI (which they also sell on a token basis) more questionable than some third party offering API access to it.
[1] https://docs.cloud.google.com/generative-ai-app-builder/docs...
Nextgrid•1mo ago
They have a different definition of "licensing" than most people I guess. Aren't site operators complaining about Google using this "licensed" content in AI overviews... not to mention the scraping for AI model training.
The pot is calling the kettle black.
immibis•1mo ago
DDoS remains illegal regardless of robots.txt.
Nextgrid•1mo ago
SerpApi doesn't have that privilege.
skybrian•1mo ago
https://radar.cloudflare.com/ai-insights#ai-user-agents-foun...