Most open-source L7 DDoS mitigation and bot-protection approaches rely on challenges (e.g., CAPTCHA or JavaScript proof-of-work) or static rules based on the User-Agent, Referer, or client geolocation. These techniques are increasingly ineffective, as they are easily bypassed by modern open-source impersonation libraries and paid cloud proxy networks.
We explore a different approach: classifying HTTP client requests in near real time using ClickHouse as the primary analytics backend.
We collect access logs directly from Tempesta FW (https://github.com/tempesta-tech/tempesta), a high-performance open-source hybrid of an HTTP reverse proxy and a firewall. Tempesta FW implements zero-copy per-CPU log shipping into ClickHouse, so the dataset growth rate is limited only by ClickHouse bulk ingestion performance - which is very high.
* periodically executes analytic queries to detect spikes in traffic (requests or bytes per second), response delays, surges in HTTP error codes, and other anomalies;
* upon detecting a spike, classifies the clients and validates the current model;
* if the model is validated, automatically blocks malicious clients by IP, TLS fingerprints, or HTTP fingerprints.
To simplify and accelerate classification — whether automatic or manual — we introduced a new TLS fingerprinting method.
WebShield is a small and simple daemon, yet it is effective against multi-thousand-IP botnets.
krizhanovsky•50m ago
We explore a different approach: classifying HTTP client requests in near real time using ClickHouse as the primary analytics backend.
We collect access logs directly from Tempesta FW (https://github.com/tempesta-tech/tempesta), a high-performance open-source hybrid of an HTTP reverse proxy and a firewall. Tempesta FW implements zero-copy per-CPU log shipping into ClickHouse, so the dataset growth rate is limited only by ClickHouse bulk ingestion performance - which is very high.
WebShield (https://github.com/tempesta-tech/webshield/), a small open-source Python daemon:
* periodically executes analytic queries to detect spikes in traffic (requests or bytes per second), response delays, surges in HTTP error codes, and other anomalies;
* upon detecting a spike, classifies the clients and validates the current model;
* if the model is validated, automatically blocks malicious clients by IP, TLS fingerprints, or HTTP fingerprints.
To simplify and accelerate classification — whether automatic or manual — we introduced a new TLS fingerprinting method.
WebShield is a small and simple daemon, yet it is effective against multi-thousand-IP botnets.