I run a website that recently saw unusually high traffic from what appeared to be genuine Googlebot. After analyzing the access patterns, I was able to identify the source.
Background
Someone has been scraping my website extensively using what appears to be authentic Googlebot. I traced the activity back to the person responsible, who told me they are using a commercial API service that can trigger real Googlebot crawls on demand.
Technical Details
I tested the service myself to verify their claims and confirmed that it dispatches genuine Googlebot requests to any URL within 1–2 seconds.
Verified Googlebot IPs (via reverse DNS):
- 66.249.76.65 → crawl-66-249-76-65.googlebot.com
- 192.178.4.87 → crawl-192-178-4-87.googlebot.com
- 2001:4860:4801:002d::0006 → crawl-2001-4860-4801-002d...googlebot.com
- Additional IPs in the 34.96.x.x range → googleusercontent.com (not a googlebot.com hostname)
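For anyone who wants to repeat the check on their own logs: a reverse lookup on its own only shows what the PTR record says, so it is usually paired with a forward lookup that confirms the hostname resolves back to the same IP. Below is a minimal Python sketch of that idea. The function names and the suffix list are my own choices (the suffix comes from the hostnames listed above), and the resolver functions are passed in as parameters so the logic can be tried without network access.

```python
import ipaddress
import socket

# Assumption: only hostnames under googlebot.com count as Googlebot here.
# Extend this tuple if you also want to accept other Google-owned suffixes.
GOOGLEBOT_SUFFIXES = (".googlebot.com",)


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of IPv4/IPv6 addresses that hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    # Strip any IPv6 scope id before parsing, then normalize via ipaddress.
    return {ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos}


def is_forward_confirmed(ip, reverse=reverse_lookup, forward=forward_lookup):
    """True if ip has a googlebot.com PTR name that resolves back to ip."""
    hostname = reverse(ip)
    if not hostname:
        return False
    if not hostname.rstrip(".").lower().endswith(GOOGLEBOT_SUFFIXES):
        return False
    # Compare parsed addresses so differently written IPv6 forms still match.
    return ipaddress.ip_address(ip) in forward(hostname)
```

Calling `is_forward_confirmed("66.249.76.65")` with the default resolvers performs live DNS queries, so the result depends on the network it runs from.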
Request Headers:
- User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- From: googlebot(at)googlebot.com
- Referer: https://www.google.com/
What Makes This Unusual:
- The service returns scraped HTML within 1–2 seconds
- It works for completely fresh URLs that have never been crawled
- All reverse DNS lookups confirm legitimate Google infrastructure
- The requests are triggered on demand via an API call
Verification Offer
I'm happy to validate these claims by having the service trigger a crawl of a unique test URL, so you can verify in your internal logs that it's genuinely Googlebot being dispatched.
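One way such a test could be set up, as a rough Python sketch: generate a URL containing a random token that has never been published anywhere, then look for that token in the access log. The `crawl-test-` path prefix, the `example.com` base URL, and both function names are hypothetical. The log parsing assumes the client IP is the first whitespace-separated field, as in the common and combined log formats.

```python
import secrets


def make_test_url(base_url):
    """Return (url, token); the random token makes the URL effectively unguessable."""
    token = secrets.token_hex(16)  # 32 hex characters
    return f"{base_url.rstrip('/')}/crawl-test-{token}", token


def hit_ips(log_lines, token):
    """Return the client IP (first field) of every log line that mentions token."""
    return [line.split()[0] for line in log_lines if token in line]


# Hypothetical usage: hand the URL to the service, then scan the access log.
url, token = make_test_url("https://example.com")
```

Any IP returned by `hit_ips` for a token that was only ever given to the service can then be checked against reverse and forward DNS.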
Any insights into how this is technically possible?
Thanks!