
Show HN: An open source access logs analytics script to block bot attacks

https://github.com/tempesta-tech/webshield
24•krizhanovsky•8h ago
This is a small PoC Python project that analyzes web server access logs to classify and dynamically block bad bots, such as L7 (application-level) DDoS bots, web scrapers and so on.

We'll be happy to gather initial feedback on usability and features, especially from people who have had good or bad experiences with bots.

*Requirements*

The analyzer relies on 3 Tempesta FW specific features, which you can still get with other HTTP servers or accelerators:

1. JA5 client fingerprinting (https://tempesta-tech.com/knowledge-base/Traffic-Filtering-b...). This is an HTTP and TLS layer fingerprinting method, similar to JA4 (https://blog.foxio.io/ja4%2B-network-fingerprinting) and JA3 fingerprints. The latter is also available in Envoy (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extension...) or as an Nginx module (https://github.com/fooinha/nginx-ssl-ja3), so check the documentation for your web server.

2. Access logs are written directly to the ClickHouse analytics database, which can consume large data batches and quickly run analytic queries. For web proxies other than Tempesta FW, you typically need to build a custom pipeline to load access logs into ClickHouse. Such pipelines aren't that rare, though.

3. Ability to block web clients by IP or JA5 hash. IP blocking is probably available in any HTTP proxy.
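Since JA5 itself is Tempesta FW specific, here is a sketch of the widely documented JA3 scheme mentioned in item 1 instead. It joins the TLS ClientHello version, cipher suites, extensions, elliptic curves and point formats into one string, skipping GREASE placeholder values (RFC 8701), and hashes it with MD5. It assumes the numeric ClientHello fields have already been parsed out of the handshake. The function names are hypothetical, and this is not how Tempesta FW computes JA5.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA. JA3 ignores them
# because clients insert them randomly, which would make hashes unstable.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: comma-separated fields, dash-separated decimal values."""

    def _join_values(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        _join_values(ciphers),
        _join_values(extensions),
        _join_values(curves),
        _join_values(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """JA3 fingerprint: the MD5 hex digest of the JA3 string."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()
```

For example, `ja3_string(771, [0x0A0A, 4865, 4866], [0, 23], [29, 23], [0])` yields `"771,4865-4866,0-23,29-23,0"`, with the GREASE cipher 0x0A0A dropped. Clients sharing a TLS stack share a hash, which is what makes fingerprint-level blocking possible even when IPs rotate.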

*How does it work*

This is a daemon, which

1. Learns normal traffic profiles: means and standard deviations of client requests per second, error responses, bytes per second and so on. It also remembers client IPs and fingerprints.

2. When it sees a spike in the z-score (https://en.wikipedia.org/wiki/Standard_score) of traffic characteristics, or when it is triggered manually, it switches to data model search mode.

3. For example, the first model could be the top 100 JA5 HTTP hashes producing the most error responses per second (typical for password crackers). Or it could be the top 1000 IP addresses generating the most requests per second (L7 DDoS). Next, the model is verified.

4. The daemon repeats the query over a sufficiently long window in the past to check whether a large fraction of the clients appear in both query results. If so, the model is bad and we go back to the previous step to try another one. If not, then we have (likely) found a representative query.

5. Transfers the IP addresses or JA5 hashes from the query results into the web proxy blocking configuration and reloads the proxy configuration on the fly.
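Steps 1 to 4 can be sketched in Python as a simplified in-memory model. This is not webshield's actual code: the real daemon runs these aggregations as ClickHouse queries, and step 5 (updating the proxy blocking configuration) is omitted. The class and function names, the model names, and the defaults `n=100` and `max_overlap=0.2` are hypothetical. Each event list is assumed to hold one client key (IP or JA5 hash) per request matching the model, e.g. one entry per error response.

```python
import math
from collections import Counter


class RunningStats:
    """Step 1: learn a traffic profile with Welford's online mean/variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 until there are samples.
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

    def zscore(self, x):
        # Step 2: how many standard deviations x lies above the learned mean.
        # With zero spread there is no scale, so report 0.0 (no spike).
        return (x - self.mean) / self.std if self.std > 0 else 0.0


def top_clients(events, n):
    """Step 3: the top-n client keys by number of matching events."""
    return {key for key, _ in Counter(events).most_common(n)}


def overlap(current, past):
    """Fraction of current suspects that were also top clients in the past."""
    return len(current & past) / len(current) if current else 0.0


def select_suspects(models, current_events, past_events, n=100, max_overlap=0.2):
    """Steps 3-4: try candidate models in order and return (model, suspects)
    for the first one whose top clients mostly did not appear in the past
    window. A high overlap means the model also flags normal clients."""
    for model in models:
        suspects = top_clients(current_events[model], n)
        baseline = top_clients(past_events[model], n)
        if suspects and overlap(suspects, baseline) <= max_overlap:
            return model, suspects
    return None, set()  # no representative model found
```

For instance, after feeding requests-per-second samples 10, 12, 11, 9, 10, 12, 11, 9 into `RunningStats`, the mean is 10.5 and `zscore(50)` is roughly 35, far above a typical threshold such as 3, which would start the model search. A model is rejected when its current top clients are also the usual top clients, since blocking them would hit legitimate users.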

Comments

imiric•5h ago
Thanks for sharing!

The heuristics you use are interesting, but this will likely only be a hindrance to lazy bot creators. TLS fingerprints can be spoofed relatively easily, and most bots rotate their IPs and signals to avoid detection. With ML tools becoming more accessible, it's only a matter of time until bots are able to mimic human traffic well enough, both on the protocol and application level. They probably exist already, even if the cost is prohibitively high for most attackers, but that will go down.

Theoretically, deploying ML-based defenses is the only viable path forward, but even that will become infeasible. As the amount of internet traffic generated by bots surpasses the current ~50%, you can't realistically block half the internet.

So, ultimately, I think allow lists are the only option if we want to have a usable internet for humans. We need a secure and user-friendly way to identify trusted clients, which, unfortunately, is ripe to be exploited by companies and governments. All proposed device attestation and identity services I've seen make me uneasy. This needs to be a standard built into the internet, based on modern open cryptography, and not controlled by a single company or government.

I suppose it already exists with TLS client authentication, but that is highly impractical to deploy. Is there an ACME protocol for clients? ... Huh, Let's Encrypt did support issuing client certs, but they dropped it[1].

[1]: https://news.ycombinator.com/item?id=44018400

monster_truck•2h ago
This is not realistically useful

FSF announces Librephone project

https://www.fsf.org/news/librephone-project
419•g-b-r•4h ago•159 comments

Disk Prices

https://diskprices.com/?locale=us
49•bookofjoe•2h ago•15 comments

New England's last coal plant has stopped operating, according to its owners

https://www.nhpr.org/nh-news/2025-10-06/new-englands-last-coal-plant-has-stopped-operating-accord...
64•toomuchtodo•3h ago•33 comments

Beliefs that are true for regular software but false when applied to AI

https://boydkane.com/essays/boss
277•beyarkay•9h ago•216 comments

Why The Pentagon run the best schools and the safest nuclear program

https://www.governance.fyi/p/the-pentagons-best-schools-and-safest
27•guardianbob•2h ago•13 comments

How bad can a $2.97 ADC be?

https://excamera.substack.com/p/how-bad-can-a-297-adc-be
205•jamesbowman•11h ago•113 comments

Can We Know Whether a Profiler Is Accurate?

https://stefan-marr.de/2025/10/can-we-know-whether-a-profiler-is-accurate/
16•todsacerdoti•2h ago•2 comments

Hacking the Humane AI Pin

https://writings.agg.im/posts/hacking_ai_pin/
94•agg23•6d ago•21 comments

Interviewing Intel's Chief Architect of x86 Cores

https://chipsandcheese.com/p/interviewing-intels-chief-architect
24•ryandotsmith•5d ago•0 comments

How AI hears accents: An audible visualization of accent clusters

https://accent-explorer.boldvoice.com/
180•ilyausorov•12h ago•70 comments

Nvidia DGX Spark: great hardware, early days for the ecosystem

https://simonwillison.net/2025/Oct/14/nvidia-dgx-spark/
21•GavinAnderegg•3h ago•3 comments

Unpacking Cloudflare Workers CPU Performance Benchmarks

https://blog.cloudflare.com/unpacking-cloudflare-workers-cpu-performance-benchmarks/
145•makepanic•7h ago•20 comments

Surveillance data challenges what we thought we knew about location tracking

https://www.lighthousereports.com/investigation/surveillance-secrets/
336•_tk_•7h ago•79 comments

How to turn liquid glass into a solid interface

https://tidbits.com/2025/10/09/how-to-turn-liquid-glass-into-a-solid-interface/
93•tambourine_man•8h ago•70 comments

Printing Petscii Faster

https://retrogamecoders.com/printing-petscii-faster/
7•ibobev•4d ago•0 comments

Beating the L1 cache with value speculation (2021)

https://mazzo.li/posts/value-speculation.html
22•shoo•4d ago•7 comments

SmolBSD – build your own minimal BSD system

https://smolbsd.org
148•birdculture•10h ago•11 comments

GrapheneOS is ready to break free from Pixels

https://www.androidauthority.com/graphene-os-major-android-oem-partnership-3606853/
207•MaximilianEmel•5h ago•86 comments

What Americans die from vs. what the news reports on

https://ourworldindata.org/does-the-news-reflect-what-we-die-from
453•alphabetatango•9h ago•251 comments

A 12,000-year-old obelisk with a human face was found in Karahan Tepe

https://www.trthaber.com/foto-galeri/karahantepede-12-bin-yil-oncesine-ait-insan-yuzlu-dikili-tas...
271•fatihpense•1w ago•110 comments

Astronomers 'image' a mysterious dark object in the distant Universe

https://www.mpg.de/25518363/1007-asph-astronomers-image-a-mysterious-dark-object-in-the-distant-u...
205•b2ccb2•13h ago•107 comments

CSS for Styling a Markdown Post

https://webdev.bryanhogan.com/miscellaneous/styling-markdown/
20•bryanhogan•1w ago•5 comments

Ally Petitt: Youngest OSCP at 16yo. Over 11 CVEs by 18

https://ally-petitt.com/en/posts/2024-05-07_how-i-became-a-hacker-before-i-finished-high-school/
34•nullbyte808•4h ago•6 comments

ADS-B Exposed

https://adsb.exposed/
289•keepamovin•17h ago•73 comments

AI and Home-Cooked Software

https://mrkaran.dev/posts/ai-home-cooked-software/
41•todsacerdoti•1w ago•24 comments

Preparing for AI's economic impact: exploring policy responses

https://www.anthropic.com/research/economic-policy-responses
30•grantpitt•9h ago•29 comments

Show HN: Metorial (YC F25) – Vercel for MCP

https://github.com/metorial/metorial
47•tobihrbr•13h ago•18 comments

Zoo of array languages

https://ktye.github.io/
151•mpweiher•17h ago•46 comments

AppLovin nonconsensual installs

https://www.benedelman.org/applovin-nonconsensual-installs/
144•jhap•7h ago•49 comments

Beyond the SQLite single-writer limitation with concurrent writes

https://turso.tech/blog/beyond-the-single-writer-limitation-with-tursos-concurrent-writes
61•syrusakbary•1w ago•55 comments