Scraping via Googlebot – How is it possible?

3•devx_•2mo ago
Hi,

I run a website that recently experienced unusually high traffic from what appeared to be legitimate Googlebot. After investigating the access patterns, I was able to identify the source through some creative analysis.

Background

Someone has been scraping my website extensively using what appears to be authentic Googlebot. I traced the activity back to the person responsible, and they revealed they're using a commercial API service that can trigger real Googlebot crawls on-demand.

Technical Details

I tested the service myself to verify their claims, and confirmed it does indeed dispatch legitimate Googlebot to any URL within 1–2 seconds.

Verified Googlebot IPs (via reverse DNS):

- 66.249.76.65 → crawl-66-249-76-65.googlebot.com

- 192.178.4.87 → crawl-192-178-4-87.googlebot.com

- 2001:4860:4801:002d::0006 → crawl-2001-4860-4801-002d...googlebot.com

- Additional IPs from 34.96.x.x range → googleusercontent.com
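The reverse DNS lookups above are the first half of the usual way to tell real Googlebot from a spoofed one. A PTR record on its own is set by whoever controls the reverse zone for that IP block, so the hostname should also be resolved forward, and the result must include the original address. Below is a minimal Python sketch of that forward-confirmed reverse DNS check using only the standard library. The function names are hypothetical, and the accepted suffixes follow Google's published verification guidance as I understand it; treat that list as an assumption and check the current documentation.

```python
import ipaddress
import socket

# Suffixes accepted for Google crawlers/fetchers (assumption: taken from
# Google's "verify Googlebot" guidance; re-check the current docs).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def hostname_is_google(hostname: str) -> bool:
    """True if a PTR hostname ends in one of the Google suffixes."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def verify_google_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS for a single IPv4/IPv6 address.

    1. Reverse lookup (PTR) of the IP.
    2. Require a Google-owned hostname suffix.
    3. Forward lookup of that hostname; the original IP must be among
       the results, otherwise the PTR record proves nothing.
    Raises ValueError if `ip` is not a valid address.
    """
    addr = ipaddress.ip_address(ip)  # normalizes forms like 002d::0006
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(str(addr))
    except OSError:  # covers socket.herror / socket.gaierror
        return False
    if not hostname_is_google(hostname):
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False
    # sockaddr[0] is the address string for both AF_INET and AF_INET6.
    forward = {ipaddress.ip_address(info[4][0]) for info in infos}
    return addr in forward
```

Comparing `ipaddress` objects rather than raw strings matters for IPv6: a zero-padded address such as the `2001:4860:4801:002d::0006` above and the compressed form returned by DNS compare equal only after normalization.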

Request Headers:

- User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

- From: googlebot(at)googlebot.com

- Referer: https://www.google.com/
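Unlike the source IP, every header in this list is supplied by the client and can be copied verbatim by any scraper, so matching on them only labels a request and proves nothing about who sent it. The user-agent token is still what separates a regular Googlebot crawl from the Search Console tooling discussed in the comments, which per the thread identifies itself as `Google-InspectionTool`. A small sketch of that classification, assuming tokens appear as `Name/version` substrings; the function name and token list are hypothetical.

```python
from typing import Optional

# Product tokens seen in the user agents discussed in this thread
# (hypothetical list; extend as needed). More specific token first.
CRAWLER_TOKENS = ("Google-InspectionTool", "Googlebot")


def crawler_token(user_agent: str) -> Optional[str]:
    """Return the first known Google crawler token in the UA, or None.

    This is only a label: the header is client-controlled and trivially
    spoofed, so it must be paired with an IP/DNS check.
    """
    for token in CRAWLER_TOKENS:
        # Tokens appear as "Name/version", e.g. "Googlebot/2.1".
        if f"{token}/" in user_agent:
            return token
    return None
```

Applied to the User-Agent quoted above, `crawler_token` returns `"Googlebot"`, which is consistent with the poster's reading that these requests did not come through the inspection tools.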

What Makes This Unusual:

- The service returns scraped HTML within 1–2 seconds

- It works for completely fresh URLs that have never been crawled

- All reverse DNS lookups confirm legitimate Google infrastructure

- The requests are triggered on-demand via API call

Verification Offer

I'm happy to validate these claims by having the service trigger a crawl to a unique test URL, so you can verify in your internal logs that it's genuinely Googlebot being dispatched.
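A site owner taking up this offer needs two things: a URL nobody could have guessed, and a way to find requests for it in the access log. A minimal sketch, assuming the server writes the Apache/Nginx "combined" log format; the `/crawl-test/` prefix, the regex, and the function names are hypothetical. The IP of each hit can then be checked with forward-confirmed reverse DNS.

```python
import re
import secrets
from typing import Iterable, Iterator, NamedTuple

# "combined" log format (assumption about your server config):
# IP - user [time] "METHOD path PROTO" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


class Hit(NamedTuple):
    ip: str
    time: str
    path: str
    user_agent: str


def make_test_path(prefix: str = "/crawl-test/") -> str:
    """Unguessable path (128 random bits) no crawler can have seen before."""
    return prefix + secrets.token_hex(16)


def find_hits(lines: Iterable[str], test_path: str) -> Iterator[Hit]:
    """Yield every logged request whose path starts with the test path."""
    for line in lines:
        m = COMBINED.match(line)
        if m and m["path"].startswith(test_path):
            yield Hit(m["ip"], m["time"], m["path"], m["ua"])
```

Usage: generate a path with `make_test_path()`, hand it to the service, then run `find_hits` over the open log file (e.g. a hypothetical `/var/log/nginx/access.log`). Because the path is random, any hit on it can only have come from whoever was given the URL.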

Any insights into how this is technically possible?

Thanks!

Comments

DaveZale•2mo ago
There are blockers for web crawlers. A few dozen were supplied by my neocities.org account, but I had to uncomment them.
devx_•2mo ago
Not sure how this is relevant.
cmckn•2mo ago
Search Console lets you enter a URL for your domain and test-fetch it to see how things look to the bot. Could be some reverse engineering or abuse of that API.
devx_•2mo ago
Correct me if I'm wrong, but I believe you're referring to the Rich Results Test. Fetching through that embeds `Google-InspectionTool` in the user agent, which isn't the case here.
blurrylogic•2mo ago
They've definitely reverse-engineered some internal GCP service that can send GET requests and see the response (the surface area on GCP is massive). I've been trying to do this with no luck. Could you please give me the link to their service? (I won't spread it.) You can reach me at jainamhs05@gmail.com
semking•2mo ago
You can email me at: rxnx8obtw@mozmail.com
