
My website gets more attacks than human visitors

2•tommy2970•10h ago
I run a small self-hosted website on a Raspberry Pi 4B at home. A few weeks ago I started wondering: who actually visits a website in 2026? Not just humans. Everything. So I built a public observability dashboard on top of GoAccess that separates traffic into four categories: human visitors, search engine crawlers, AI retrieval agents, and automated attacks. The numbers from the last 17 days surprised me:

- 4,523 human visits
- 6,409 automated attack attempts
- Thousands of crawler requests from search engines and AI systems

The attacks aren't sophisticated. They're mostly automated scanners probing for .env files, WordPress admin panels, and cloud credentials, hitting every public IP on the internet regardless of what's actually running there.

What I found more interesting was the AI agent behavior. AI retrieval agents (GPTBot, ClaudeBot, PerplexityBot, Amazonbot) behave differently from traditional search crawlers. They hit semantic files aggressively (llms.txt, sitemap.xml, JSON-LD structured data) and seem to index the knowledge graph structure of a site rather than individual pages. Within hours of publishing new content, multiple AI crawlers had already visited, apparently triggered by the sitemap update rather than any external link.

A few observations I didn't expect:

- Combined machine traffic consistently exceeds human traffic
- AI agents discovered new content faster than Google did
- The semantic structure exposed by the site seems almost as important as the content itself
- Even a Pi on a residential ISP receives constant automated scans (roughly 380 attempts/day on average)
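For anyone who wants to try a similar four-way split on their own logs, here is a minimal sketch. It is not the dashboard's actual implementation and does not use GoAccess. It assumes the web server writes the common "combined" access log format, and the marker lists (`AI_AGENTS`, `SEARCH_CRAWLERS`, `ATTACK_PATHS`) are illustrative guesses based on the bots and probes named in this post, not a maintained ruleset.

```python
import re
from collections import Counter

# Hypothetical marker lists, chosen for illustration only.
AI_AGENTS = ("gptbot", "claudebot", "perplexitybot", "amazonbot")
SEARCH_CRAWLERS = ("googlebot", "bingbot", "duckduckbot")
ATTACK_PATHS = (".env", "wp-admin", "wp-login.php", ".aws/credentials")

# Assumes the "combined" log format:
# ip - - [time] "METHOD path proto" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def classify(line):
    """Return one of: attack, ai_agent, search_crawler, human, unparsed."""
    match = LOG_RE.search(line)
    if match is None:
        return "unparsed"
    path = match.group("path").lower()
    user_agent = match.group("ua").lower()
    # Check the requested path first: a probe for /.env counts as an
    # attack no matter what user agent it claims to be.
    if any(marker in path for marker in ATTACK_PATHS):
        return "attack"
    if any(marker in user_agent for marker in AI_AGENTS):
        return "ai_agent"
    if any(marker in user_agent for marker in SEARCH_CRAWLERS):
        return "search_crawler"
    # Everything else is assumed human, which overcounts humans.
    return "human"


def tally(lines):
    """Count log lines per category."""
    return Counter(classify(line) for line in lines)
```

Substring matching like this is crude. User agents are trivially spoofed, and a generic spider with an unrecognized user agent that requests only normal pages would be counted as human here, so the "human" bucket is an upper bound at best.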

I made the dashboard public because I think the machine side of the web is underobserved. The modern web feels less like "users visiting pages" and more like a parallel ecosystem of crawlers, AI agents, and automated systems running continuously alongside human visitors.

Two questions for HN:

- Are others tracking AI agents separately from traditional search crawlers?
- Has anyone else noticed AI retrieval systems indexing semantic structure (JSON-LD, llms.txt) faster than they index page content?

Comments

anenefan•6h ago
I'm curious whether you're just tracking the browser user agent, or using fingerprinting or some other method? For instance, if someone used a tool to spider your site, would that be classed as an attack?