Plus, if you run AdSense, Google will ignore crawler rules and visit the page both from Google IPs and from some shady IPs. I wonder if it's the same for sites using Analytics.
In the early days of smartphone use, Google and Facebook uploaded contact lists of every single smartphone user to their servers.
And honestly, I don't blame them. If the summary has the info, why risk going to a possibly ad-filled site?
I can usually tell if the information on a website was written by somebody who knows what they're talking about. (And ads are blocked)
The AI summary, on the other hand, looks exactly the same to me regardless of whether it's correct. So it's only useful if I can verify its correctness with minimal effort.
- What does "unreachable" mean, exactly? A 404 or some more serious error?
- What is a "Diamond Product Expert" and do they speak for the company?
This implication (stopped crawl means your pages are invisible) directly contradicts Google's own documentation[0] that states:
> If other pages point to your page with descriptive text, Google could still index the URL without visiting the page. If you want to block your page from search results, use another method such as password protection or noindex.
What I get from the article is that the big change is Google now treating a missing robots.txt as if it disallowed crawling: you can still get indexed, just not crawled (as per the above).
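To make the distinction concrete (my own sketch, not from the article): a robots.txt Disallow only stops crawling, while actually keeping a page out of results takes a noindex directive on a page Google is allowed to fetch.

    # robots.txt: stops crawling, but the URL can still be indexed
    # from links elsewhere
    User-agent: *
    Disallow: /private/

    <!-- to keep a page out of results, leave it crawlable and add -->
    <meta name="robots" content="noindex">

    # or, for non-HTML files, send it as an HTTP response header
    X-Robots-Tag: noindex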
My cynical take is that this is preparation for a future AI-related lawsuit: everyone explicitly allowing Google (and/or other crawlers) is proof that they're crawling with the website's permission.
Oh, you'd want to appear in Google search results without appearing in Gemini? Tough luck, bro.
[0] https://developers.google.com/search/docs/crawling-indexing/...
I have a feeling there's more to the story than what's in the blog post.
My only thought is that virtually all "serious" sites tend to have robots.txt, and so not having it indicates a high likelihood of spam.
This is the support page: https://support.google.com/webmasters/community-video/360202...
This is the creator's LinkedIn: https://www.linkedin.com/in/iskgti/
He does not work for Google; he's just an SEO somewhere who creates videos and posts his hypotheses in forums.
This is his YouTube account: https://m.youtube.com/@saket_gupta
Nice, high-quality (probably AI-generated) videos, still with no relationship to reality.
Google is a rent-seeking parasitic middleman leeching off productive businesses; let them hang out with their best friends at the US administration.
- AI was the first step (or actually, among the first five steps or so). CHECK.
- Google search has already been ruined. CHECK.
- Now robots.txt is used to weed out "old" websites. CHECK.
They do too much evil. But it is also our fault, because we became WAY too dependent on these mega-corporations.
> Google's crawlers treat all 4xx errors, except 429, as if a valid robots.txt file didn't exist. This means that Google assumes that there are no crawl restrictions.
This is a better source than a random SEO dude with a channel full of AI-generated videos.
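For anyone curious what that rule amounts to in practice, here's a rough Python sketch of the documented behaviour (my own reading of the docs, obviously not Google's actual code; the 429/5xx/unreachable handling in particular is simplified):

    import urllib.error
    import urllib.request
    import urllib.robotparser

    def robots_policy(robots_url, timeout=10):
        """Map the robots.txt fetch result to a crawl policy,
        roughly following the documented behaviour."""
        try:
            with urllib.request.urlopen(robots_url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as e:
            if e.code == 429 or e.code >= 500:
                # 429 / server errors: back off, treat as disallowed for now
                return "disallow-all (temporary)"
            # any other 4xx (404, 403, 401, ...): as if no robots.txt exists
            return "allow-all"
        except OSError:
            # unreachable host / network error: be conservative
            return "disallow-all (temporary)"
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(body.splitlines())
        return parser  # then call parser.can_fetch(user_agent, url) per URL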
It's fairly common for there to be a very long and circuitous route between cause and effect in search, so a bug like this can sometimes be difficult to identify until people start making blog posts about it.
> I don't have a robots.txt right now. It hasn't been there in a long time. Google still shows two results when I search for files on my site though:
The source he links to is another spam channel of the kind we've seen a thousand times on YouTube.
I naively assumed that they would be happy to take in any and all data, but they had a fairly sophisticated algorithm for deciding "we've seen enough, we know what the next page in the sequence is going to look like." They value their bandwidth.
It led to a lot of gaming of how you optimally split content across high-value pages for search terms (the 5 most relevant reviews should go on pages targeting the New York metro, the next 5 most relevant for LA, etc.)
I'm surprised again, honestly. I kind of assumed the AI race meant that Google would go back to hoovering all data at the cost of extra bandwidth, but my assumption clearly doesn't hold. I can't believe I knew all that about Google and still made the same assumption twice.
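To make the "we've seen enough" idea concrete, here's a toy sketch (purely illustrative, nothing to do with Google's actual system): stop following a paginated sequence once several consecutive pages look like near-duplicates of the previous one.

    from difflib import SequenceMatcher

    def crawl_sequence(fetch, urls, cutoff=0.95, patience=3):
        """Toy heuristic: stop fetching a sequence of pages once several
        consecutive ones are near-duplicates of the previous page."""
        previous, dull_streak = None, 0
        for url in urls:
            html = fetch(url)  # fetch() is whatever HTTP client you use
            if previous is not None:
                similarity = SequenceMatcher(None, previous, html).ratio()
                dull_streak = dull_streak + 1 if similarity >= cutoff else 0
                if dull_streak >= patience:
                    break  # "we've seen enough" -- skip the rest of the sequence
            previous = html
            yield url, html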
This Google Support answer comes from another spammer who churns out dozens of nonsense videos and uploads them to YouTube: https://www.youtube.com/watch?v=2LJKNiQJ8LA
This guy is not affiliated with Google in any way; he just spams their help forums.
His own website scores 92 for SEO in Lighthouse, despite his claim to be an "SEO expert".
From the article:
> I don't have a robots.txt right now. It hasn't been there in a long time. Google still shows two results when I search for files on my site though:
Guess why.
I don't know if the claims made here are true, but there really isn't any reason not to have a valid robots.txt available. One could argue that if you want Google to respect robots.txt, then not having one should result in Googlebot not crawling any further.
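For reference, a valid "crawl everything" robots.txt is only two lines (an empty Disallow means nothing is disallowed), so there's little excuse for serving a 404:

    User-agent: *
    Disallow:

    # or, to block everything instead:
    # User-agent: *
    # Disallow: /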
All of that is fast becoming completely irrelevant: people see ads in their favourite TikReels app, find their holiday presents on Temu, and ask their questions of ChatGPT.