Show HN: An open source access logs analytics script to block bot attacks

https://github.com/tempesta-tech/webshield
15•krizhanovsky•4h ago
This is a small PoC Python project that analyzes web server access logs to classify and dynamically block bad bots, such as L7 (application-level) DDoS bots, web scrapers and so on.

We'd be happy to gather initial feedback on usability and features, especially from people who have had good or bad experiences with bots.

*Requirements*

The analyzer relies on 3 Tempesta FW-specific features, which you can still get with other HTTP servers or accelerators:

1. JA5 client fingerprinting (https://tempesta-tech.com/knowledge-base/Traffic-Filtering-b...). This is HTTP and TLS layer fingerprinting, similar to JA4 (https://blog.foxio.io/ja4%2B-network-fingerprinting) and JA3 fingerprints. The latter is also available in Envoy (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extension...) or as an Nginx module (https://github.com/fooinha/nginx-ssl-ja3), so check the documentation for your web server.

2. Access logs are written directly to the ClickHouse analytics database, which can consume large data batches and quickly run analytic queries. For web proxies other than Tempesta FW, you typically need to build a custom pipeline to load access logs into ClickHouse. Such pipelines aren't so rare though.

3. Ability to block web clients by IP or JA5 hashes. IP blocking is probably available in any HTTP proxy.
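
For proxies that only write text logs, the first stage of the custom pipeline mentioned in point 2 might look roughly like the sketch below. It assumes the common/combined log format and hypothetical names (`LINE_RE`, `parse_line`, `batches`); it is not taken from the project. It only parses and batches rows, so the actual bulk insert into ClickHouse is left to whatever client you use, and a standard log line carries no JA5 hash, which would have to come from the proxy itself.

```python
import re
from itertools import islice

# Prefix of the common/combined log format:
# ip ident user [time] "method uri protocol" status bytes
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Turn one access log line into a dict, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    row = m.groupdict()
    row["status"] = int(row["status"])
    row["bytes"] = 0 if row["bytes"] == "-" else int(row["bytes"])
    return row


def batches(lines, size=10_000):
    """Yield lists of up to `size` parsed rows, skipping unparseable lines."""
    rows = (row for row in map(parse_line, lines) if row is not None)
    while True:
        batch = list(islice(rows, size))
        if not batch:
            return
        yield batch
```

Each yielded batch is a list of plain dicts, which suits an analytics database that prefers a few large inserts over many small ones.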

*How does it work*

This is a daemon, which

1. Learns normal traffic profiles: means and standard deviations for client requests per second, error responses, bytes per second and so on. It also remembers client IPs and fingerprints.

2. Waits for a spike in z-score (https://en.wikipedia.org/wiki/Standard_score) for traffic characteristics, or for a manual trigger, then goes into data model search mode.

3. For example, the first model could be the top 100 JA5 HTTP hashes that produce the most error responses per second (typical for password crackers). Or it could be the top 1000 IP addresses generating the most requests per second (L7 DDoS). Next, the model is verified.

4. The daemon repeats the query over a sufficiently long period of past history to see whether a huge fraction of the clients appear in both query results. If yes, then the model is bad and we go back to the previous step to try another one. If not, then we have (likely) found the representative query.

5. Transfers the IP addresses or JA5 hashes from the query results into the web proxy blocking configuration and reloads the proxy configuration (on-the-fly).
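
To make steps 1 to 4 more concrete, here is a minimal in-memory sketch of the same idea. It is not the project's code: the real daemon runs its queries against ClickHouse, while this sketch works on plain Python lists. The function names, the record fields (`ip`, `ja5`, `status`), the z-score threshold of 3 and the 20% overlap limit are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def learn_profile(samples):
    """Step 1: return (mean, standard deviation) of a baseline metric."""
    return mean(samples), pstdev(samples)


def z_score(value, profile):
    """Standard score of a new observation against the learned profile."""
    mu, sigma = profile
    if sigma == 0:
        # Degenerate baseline: treat any increase as an unbounded spike.
        return float("inf") if value > mu else 0.0
    return (value - mu) / sigma


def is_spike(value, profile, threshold=3.0):
    """Step 2: flag the observation when its z-score exceeds the threshold."""
    return z_score(value, profile) > threshold


def top_clients(records, key, n, only_errors=False):
    """Step 3: stand-in for a top-N query, grouped by `key` ("ip" or "ja5")."""
    counts = Counter(
        r[key] for r in records
        if not only_errors or r["status"] >= 400
    )
    return [client for client, _ in counts.most_common(n)]


def model_is_representative(current, historical, max_overlap=0.2):
    """Step 4: reject the model if too many clients also appear in history."""
    current = set(current)
    if not current:
        return False
    overlap = len(current & set(historical)) / len(current)
    return overlap <= max_overlap


baseline_rps = [100, 110, 90, 105, 95]  # requests per second, normal traffic
profile = learn_profile(baseline_rps)
print(is_spike(400, profile))  # far above the baseline
print(is_spike(104, profile))  # within normal variation
```

A model that passes `model_is_representative` yields the list of IPs or hashes that step 5 would hand to the proxy's blocking configuration.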

Comments

imiric•27m ago
Thanks for sharing!

The heuristics you use are interesting, but this will likely only be a hindrance to lazy bot creators. TLS fingerprints can be spoofed relatively easily, and most bots rotate their IPs and signals to avoid detection. With ML tools becoming more accessible, it's only a matter of time until bots are able to mimic human traffic well enough, both on the protocol and application level. They probably exist already, even if the cost is prohibitively high for most attackers, but that will go down.

Theoretically, deploying ML-based defenses is the only viable path forward, but even that will become infeasible. As the amount of internet traffic generated by bots surpasses the current ~50%, you can't realistically block half the internet.

So, ultimately, I think allow lists are the only option if we want to have a usable internet for humans. We need a secure and user-friendly way to identify trusted clients, which, unfortunately, is ripe to be exploited by companies and governments. All proposed device attestation and identity services I've seen make me uneasy. This needs to be a standard built into the internet, based on modern open cryptography, and not controlled by a single company or government.

I suppose it already exists with TLS client authentication, but that is highly impractical to deploy. Is there an ACME protocol for clients? ... Huh, Let's Encrypt did support issuing client certs, but they dropped it[1].

[1]: https://news.ycombinator.com/item?id=44018400
