Implementing fast TCP fingerprinting with eBPF

https://halb.it/posts/ebpf-fingerprinting-1/

51•halb•9h ago

Comments

OutOfHere•6h ago

More useless and harmful anti-bot nonsense, probably with many false detections, when a simple and neutral rate-limiting 429 does the job.

halb•5h ago

I guess the blame is on me here for providing only very brief context on the topic, which makes it sound like this is just an anti-scraping solution.

These kinds of fingerprinting solutions are widely used, and their goal is not to directly detect or block bots, especially harmless scrapers. They just provide an additional data point that can be used to track patterns in website traffic and eventually block fraud or automated attacks. That is the kind of bot they target.

OutOfHere•5h ago

If it's making a legitimate request, it's not an automated attack. If it's exceeding its usage quota, that's a simple problem that doesn't require eBPF.

halb•4h ago

What kind of websites do you have in mind when I talk about fraud patterns? Not everything is a static website, and I absolutely agree with you on that point: if your static website is struggling under the load of a scraper, there is something deeply wrong with your architecture. We live in wonderful times; Nginx on my 2015 laptop can gracefully handle 10k requests per second before I even activate rate limiting.

Unfortunately there are bad people out there, and they know how to write code. Take a look at popular websites like TikTok, Amazon, or Facebook. They are inundated by fraudulent requests whose goal is to use their services in a way that is harmful to others, or straight up illegal, from spam to money laundering. On social media, bots impersonate people in an attempt to influence public discourse and undermine democracies.

Retr0id•4h ago

This is an overly simplistic view that does not reflect reality in 2025.

OutOfHere•22m ago

The simple reality is that if you don't want to put something online, then don't put it online. If something should be behind locked doors, then put it behind locked doors. Don't do the dance of promising to have something online, then stop legitimate users when they request it. That's basically what a lot of "spam blockers" do -- they block a ton of legitimate use as well.

konsalexee•5h ago

Sure, but it's a nice exploration of layer 4 detection.

aorth•5h ago

Why is it useless and harmful? Many of us are struggling—without massive budgets or engineering teams—to keep services up due to incredible load from scrapers in recent years. We do use rate limiting, but scrapers circumvent it with residential proxies and brute force. I often see concurrent requests from hundreds or thousands of IPs in one data center. Who do these people think they are?

OutOfHere•5h ago

It is harmful because innocent users routinely get caught in your dragnet. And why even have a public website if the goal is not to serve it?

What is the actual problem with serving users? You mentioned incredible load. I would stop using inefficient PHP or JavaScript or Ruby for web servers. I would use Go or Rust or a comparable efficient server with native concurrency. Survival always requires adaptation.

How do you know that the alleged proxies belong to the same scrapers? I would look carefully at the values contained in the IP chain as determined by XFF to know which subnets to rate-limit as per their membership in the XFF.

Another way is to require authentication for expensive endpoints.

immibis•4h ago

Residential proxy users are paying on the order of $5 per gigabyte, so send them really big files once detected. Or "click here to load the page properly" followed by a trickle of garbage data.

OutOfHere•26m ago

There is no real way to confidently tell if someone is using a residential proxy.

jeffbee•4h ago

Guy who has never operated anything, right here ^

TCP fingerprinting is so powerful that sending SMTP temporary failure codes to Windows boxes stops most spam. You can't ignore that kind of utility.

OutOfHere•25m ago

Please. Save your assumptions.

You can stop spam, but you will also stop regular users, and that is the problem. Your classifier is not as powerfully accurate as you think.

If you don't want to put something online, then don't put it online!

ghotli•4h ago

I downvoted you due to the way you're communicating in this thread. Be kind, rewind. Review the guidelines here perhaps since your account is only a little over a year old.

I found this article useful and insightful. I don't have a bot problem at present; I have an adjacent problem and found this context useful for an ongoing investigation.

b0a04gl•3h ago

Why does fingerprinting always happen right at connection start? It usually gives clean metadata during the TCP SYN. But what about components like static proxies, load balancers, or mobile networks? All of these can shift stack behavior midstream, which could make this approach obsolete.
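
To make the "metadata during the TCP SYN" concrete, here is a minimal sketch of turning a raw SYN header into a fingerprint string. It is an assumption-laden illustration, not the article's implementation: the article works in eBPF inside the kernel, while this parses header bytes in user space, and the `window:options:mss:wscale` format and the function name are made up for the example (loosely in the spirit of p0f-style signatures).

```python
import struct

# Short labels for common TCP option kinds (labels are this sketch's own).
OPTION_NAMES = {0: "eol", 1: "nop", 2: "mss", 3: "ws", 4: "sok", 8: "ts"}


def syn_fingerprint(tcp_header: bytes) -> str:
    """Build 'window:option-layout:mss:wscale' from a raw TCP SYN header."""
    if len(tcp_header) < 20:
        raise ValueError("truncated TCP header")
    (_src, _dst, _seq, _ack, off_flags, window, _csum, _urg) = struct.unpack(
        "!HHIIHHHH", tcp_header[:20]
    )
    data_offset = (off_flags >> 12) * 4  # header length in bytes
    flags = off_flags & 0x01FF
    if not flags & 0x02 or flags & 0x10:  # need SYN set and ACK clear
        raise ValueError("not an initial SYN")

    options = tcp_header[20:data_offset]
    layout = []
    mss = 0
    wscale = None
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # end of option list
            layout.append("eol")
            break
        if kind == 1:  # single-byte padding
            layout.append("nop")
            i += 1
            continue
        if i + 1 >= len(options):
            break  # option kind without a length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed option, stop parsing
        body = options[i + 2:i + length]
        if kind == 2 and length == 4:
            mss = struct.unpack("!H", body)[0]
        elif kind == 3 and length == 3:
            wscale = body[0]
        layout.append(OPTION_NAMES.get(kind, f"?{kind}"))
        i += length

    scale = "*" if wscale is None else str(wscale)
    return f"{window}:{','.join(layout)}:{mss}:{scale}"
```

The order of the options matters as much as their values, since different network stacks emit them in different orders. A middlebox that terminates TCP would produce its own SYN, so the fingerprint would describe the proxy rather than the original client.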

halb•3h ago

This is a good point. I guess that if you have the luxury of controlling the front-end side of the web application, you can implement a system that polls the server routinely. Over time this will give you a clearer picture. You'll notice that most real-world fingerprinting systems run in part on the JavaScript side, which enables all sorts of tricks.

benreesman•1h ago

I have work reasons for needing to learn a lot about kernel-level networking primitives (it turns out tcpdump and eBPF are compatible with almost anything, no "but boss, foobar is only compatible with bizbazz 7 or above!").

So when an LLM vendor that shall remain nameless had a model start misidentifying itself while the website was complaining about load... I decided to get to the bottom of it.

eBPF cuts through TLS obfuscation like a bunker buster bomb through a ventilation shaft, or was it... well, you know what I mean.
