So when an LLM vendor that shall remain nameless had a model start misidentifying itself while the website was complaining about load... I decided to get to the bottom of it.
eBPF cuts through TLS obfuscation like a bunker buster bomb through a ventilation shaft, or was it... well, you know what I mean.
OutOfHere•6h ago
halb•5h ago
These kinds of fingerprinting solutions are widely used, and their goal is not to directly detect or block bots, especially harmless scrapers. They just provide an additional data point that can be used to track patterns in website traffic and eventually block fraud or automated attacks: that kind of bot.
OutOfHere•5h ago
halb•4h ago
Unfortunately there are bad people out there, and they know how to write code. Take a look at popular websites like TikTok, Amazon, or Facebook. They are inundated with fraudulent requests whose goal is to use their services in ways that are harmful to others or straight-up illegal, from spam to money laundering. On social media, bots impersonate people in an attempt to influence public discourse and undermine democracies.
Retr0id•4h ago
OutOfHere•22m ago
konsalexee•5h ago
aorth•5h ago
OutOfHere•5h ago
What is the actual problem with serving users? You mentioned incredible load. I would stop using inefficient PHP or JavaScript or Ruby for web servers. I would use Go or Rust or a comparable efficient server with native concurrency. Survival always requires adaptation.
How do you know that the alleged proxies belong to the same scrapers? I would look carefully at the IP chain in the X-Forwarded-For (XFF) header and rate-limit the subnets that the addresses in that chain belong to.
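To make the XFF idea concrete, here is a minimal sketch rather than a production limiter. It assumes a /24 grouping for IPv4 and /48 for IPv6, a sliding window, and hypothetical names (`subnets_from_xff`, `SubnetRateLimiter`); none of these come from the thread. Keep in mind that the leftmost XFF entries are supplied by the client and can be spoofed, so only hops appended by proxies you control are trustworthy.

```python
import ipaddress
import time
from collections import defaultdict, deque


def subnets_from_xff(xff_header, remote_addr, v4_prefix=24, v6_prefix=48):
    """Return the set of subnets covering every parseable hop in the chain.

    The prefix lengths are illustrative assumptions, not recommendations.
    """
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    hops.append(remote_addr)  # the directly connected peer is the last hop
    subnets = set()
    for hop in hops:
        try:
            ip = ipaddress.ip_address(hop)
        except ValueError:
            continue  # skip junk entries such as "unknown"
        prefix = v4_prefix if ip.version == 4 else v6_prefix
        subnets.add(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
    return subnets


class SubnetRateLimiter:
    """Sliding-window limiter keyed by subnet instead of by single IP."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # subnet -> timestamps of recent hits

    def allow(self, xff_header, remote_addr, now=None):
        now = time.monotonic() if now is None else now
        subnets = subnets_from_xff(xff_header, remote_addr)
        # Forget hits that have aged out of the window.
        for net in subnets:
            q = self.hits[net]
            while q and now - q[0] >= self.window_seconds:
                q.popleft()
        # Reject if any subnet in the chain is already at its limit.
        if any(len(self.hits[net]) >= self.max_requests for net in subnets):
            return False
        for net in subnets:
            self.hits[net].append(now)
        return True
```

With `SubnetRateLimiter(max_requests=2, window_seconds=60)`, a third request within a minute from any address in the same /24 is refused, regardless of which individual IP in that subnet it arrives from.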
Another way is to require authentication for expensive endpoints.
immibis•4h ago
OutOfHere•26m ago
jeffbee•4h ago
TCP fingerprinting is so powerful that sending SMTP temporary failure codes to Windows boxes stops most spam. You can't ignore that kind of utility.
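As a rough illustration of the idea, and not how any real fingerprinting tool works, the sketch below guesses the sender's initial TTL from the TTL observed on the incoming packet and answers likely-Windows hosts with an SMTP temporary failure. It relies on one assumption: that an initial TTL of 128 suggests a default Windows TCP/IP stack, while 64 is typical of Linux and macOS. Real fingerprinting combines many more signals (window size, TCP option order, and so on), and the function names here are hypothetical.

```python
def guess_initial_ttl(observed_ttl):
    """Round the observed TTL up to the nearest common initial value."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial in (64, 128, 255):
        if observed_ttl <= initial:
            return initial


def should_tempfail(observed_ttl):
    """Assumption: initial TTL 128 means a Windows stack (crude heuristic)."""
    return guess_initial_ttl(observed_ttl) == 128


def smtp_reply(observed_ttl):
    # A 4xx reply asks the sender to retry later. Real mail servers queue
    # and retry; much spam-sending malware simply gives up.
    if should_tempfail(observed_ttl):
        return "451 4.7.1 Please try again later"
    return "250 2.0.0 OK"
```

A packet arriving with TTL 117 would be classified as probably Windows (initial TTL 128, 11 hops away) and tempfailed, while one arriving with TTL 52 would be accepted.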
OutOfHere•25m ago
You can stop spam, but you will also stop regular users, and that is the problem. Your classifier is not as accurate as you think.
If you don't want to put something online, then don't put it online!
ghotli•4h ago
I found this article useful and insightful. I don't have a bot problem at present, but I have an adjacent problem and found this context useful for an ongoing investigation.