Show HN: Aroma: Every TCP Proxy Is Detectable with RTT Fingerprinting

86•Sakura-sx•1mo ago

TL;DR explanation (go to https://github.com/Sakura-sx/Aroma?tab=readme-ov-file#tldr-e... if you want the formatted version)

This is done by measuring the minimum TCP RTT (client.socket.tcpi_min_rtt) seen and the smoothed TCP RTT (client.socket.tcpi_rtt). I am getting this data through Fastly Custom VCL; Fastly gets it from the Linux kernel (struct tcp_info -> tcpi_min_rtt and tcpi_rtt). I am using Fastly for the demo since they have PoPs all around the world and they expose TCP socket data to me.

The score is calculated by doing tcpi_min_rtt/tcpi_rtt. It's simple but it's what worked best for this with the data Fastly gives me. Based on my testing, 1-0.7 is normal, 0.7-0.3 is normal if the connection is somewhat unstable (WiFi, mobile data, satellite...), 0.3-0.1 is low and may be a proxy, anything lower than 0.1 is flagged as TCP proxy by the current code.
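The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the division and the bands quoted in the post, not the project's actual VCL; the function names and the handling of values that fall exactly on a band boundary are assumptions.

```python
def aroma_score(min_rtt_us: int, smoothed_rtt_us: int) -> float:
    """Ratio of the minimum TCP RTT to the smoothed TCP RTT (same unit for both)."""
    if smoothed_rtt_us <= 0:
        raise ValueError("smoothed RTT must be positive")
    return min_rtt_us / smoothed_rtt_us


def classify(score: float) -> str:
    """Map a score onto the bands described in the post.

    Which band an exact boundary value (0.1, 0.3, 0.7) belongs to is an assumption.
    """
    if score < 0.1:
        return "flagged as TCP proxy"
    if score < 0.3:
        return "low, may be a proxy"
    if score < 0.7:
        return "normal for an unstable connection"
    return "normal"


# Example: a 5 ms minimum RTT against a 100 ms smoothed RTT gives 0.05.
print(classify(aroma_score(5_000, 100_000)))
```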

Comments

Sakura-sx•1mo ago

Also, something I haven't included in the README is that, apart from testing with Tor, WARP and some other proxies, I did some testing with the free one-week trial of Bright Data's residential proxies, and it does detect them too!!!

KomoD•1mo ago

curl -x http://xxxxx:xxxxx@geo.iproyal.com:11202 -L https://aroma.global.ssl.fastly.net/

<html><body><h1>You don't seem to be using a TCP Proxy!</h1><p>(If you are using a VPN or any other kind of proxy that is not a TCP Proxy, this will not detect it)</p></body></html>

Sakura-sx•1mo ago

That's strange, could you try with "https://aroma.global.ssl.fastly.net/score"?

ranger_danger•1mo ago

I got 0.295 with a mptcp'd proxy

bandie91•1mo ago

pardon my ignorance, but isn't it an HTTP proxy, not a TCP one? ... or does it count as a TCP proxy because HTTPS upstream traffic goes through a "CONNECT" request?

JDye•1mo ago

A request to a HTTPS target through a proxy will use a CONNECT request to establish a tunnel to the target.

This tunnel operates at layer 4: the client sends TCP segments to the proxy, which unpacks them and then repacks the payload into new segments to send to the end target. These new TCP segments will contain the timestamp of when they were created.

The HTTP request sent through those segments is unmodified, meaning it will contain the original timestamp from the client machine.

The newer timestamp on the TCP segments means there is a mismatch between the TCP RTT and the HTTP RTT.

agentifysh•1mo ago

so will this detect residential proxies? how is that being done? I am getting hammered and it's all legitimate normal ISP traffic.

Sakura-sx•1mo ago

It's done by checking the difference between the initial TCP RTT and the subsequent TCP RTTs, both of which can be retrieved from the Linux Kernel easily without the need for PCAPing. There is more info about how it is done on the README
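For anyone not on Fastly, here is a rough Python sketch of reading the same two fields straight from a connected socket on Linux with the TCP_INFO socket option. The byte offsets are an assumption based on the layout of struct tcp_info in the kernel's linux/tcp.h (check them against your own headers), and the helper names are made up for illustration.

```python
import socket
import struct
from typing import Tuple

# Assumed byte offsets of the two fields inside struct tcp_info.
# Older kernels return a shorter struct that has no tcpi_min_rtt at all.
TCPI_RTT_OFFSET = 68
TCPI_MIN_RTT_OFFSET = 148
TCP_INFO_BUF_LEN = 256  # ask for more than we need; the kernel returns what it has


def parse_rtts(tcp_info: bytes) -> Tuple[int, int]:
    """Return (tcpi_rtt, tcpi_min_rtt) in microseconds from a raw tcp_info buffer."""
    if len(tcp_info) < TCPI_MIN_RTT_OFFSET + 4:
        raise ValueError("tcp_info buffer too short to contain tcpi_min_rtt")
    (rtt,) = struct.unpack_from("=I", tcp_info, TCPI_RTT_OFFSET)
    (min_rtt,) = struct.unpack_from("=I", tcp_info, TCPI_MIN_RTT_OFFSET)
    return rtt, min_rtt


def socket_rtts(conn: socket.socket) -> Tuple[int, int]:
    """Read both RTT fields for a connected TCP socket (Linux only)."""
    raw = conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, TCP_INFO_BUF_LEN)
    return parse_rtts(raw)
```

On a server you would call socket_rtts() on the accepted connection after some data has been exchanged, so the kernel has RTT samples to report.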

JDye•1mo ago

To answer your first question, in my tests it's around 50% of requests making it through.

Sakura-sx•1mo ago

Are you using a proxy? If you aren't that would be concerning, since false positives are way worse than false negatives.

If you are then it means the score is sometimes a bit lower and sometimes a bit higher than 0.1, which is the threshold for getting blocked.

If you want to know the exact score, you can check https://aroma.global.ssl.fastly.net/score

It's set at a low threshold since I want to avoid blocking regular users at all costs. I think the detection can be improved a lot by using more data rather than a single division to calculate the score; in this case it's a somewhat simple PoC.

Thanks for taking the time to test it, I really appreciate it!

JDye•1mo ago

I'm testing using our residential proxies.

It's a super cool tool, I've been wondering about an open source tool doing this since reading about the technique in one of Nikolai Tschacher's blog posts years ago (https://incolumitas.com/pages/about/).

There's a few ways to work around this, but I think it's one of the best signals available to detect low-effort/common proxy providers.

Sakura-sx•1mo ago

Oh I haven't seen that before, it's really cool, thank you for showing me that!

I want to clarify that the approaches are a bit different: they use IP intelligence too, and they use WebSockets, which this approach doesn't. That is a really good idea, and I have to admit I didn't think of it, but sadly it's not really possible to do with Fastly.

Another big difference is that this could work with any TCP application, not only HTTP, and if you do it with HTTP/S you can know if it's a proxy or not on a request basis and totally passively, without adding any delay or changing the code of the app.

But yeah, it's a really cool demo, thanks again!

Manouchehri•1mo ago

Would you be open to offering MASQUE proxying? I started to add support to GOST and have been testing with Bright Data (only for UDP sadly, not TCP), but would love to see others add support so I could test with more than just 1 vendor.

https://github.com/go-gost/x/pull/75

https://github.com/go-gost/x/pull/76

soldthat•1mo ago

Neat demo. The unsettling part is how little signal you actually need: big CDNs and fraud teams already run much richer timing models than a simple min_rtt / rtt ratio. You can’t spoof away the speed of light, only add latency or jitter, and that itself becomes a fingerprint once you have enough traffic and a few global PoPs to compare from. So this doesn’t magically break L3 VPNs, but anyone relying on “just stick a TCP proxy in front and I’m anonymous/in-region” has been living with a pretty outdated threat model.

Sakura-sx•1mo ago

Thank you! There are other ways of detecting L3 VPNs, but I wanted to start with proxies since they do most of the damage.

ericpauley•1mo ago

Every TCP proxy (that doesn't thwart this) is detectable :)

Countermeasure: pick some min-RTT >= the actual client RTT (you can do this as a TCP proxy by measuring client ping). Measure server RTT and artificially delay responses to be >= min-RTT. This will require an added delay during the handshake and ACKs, but no added delay for the response payloads.

Counter-countermeasure: the above may lead to TCP message types that don't make sense given a traditional TCP client state machine (e.g., delayed ACK would bundle ACK and PUSH, but the system shows separate/simultaneous ACK and PUSH packets). Counter-counter-countermeasure is left to the reader.
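The arithmetic behind the countermeasure can be sketched as follows. This only computes how much delay a proxy would add; it is not a working proxy, and the function and parameter names are hypothetical.

```python
def extra_ack_delay_ms(target_min_rtt_ms: float, proxy_to_server_rtt_ms: float) -> float:
    """Delay the proxy would add to handshake packets and ACKs so that the
    server observes a TCP RTT of at least target_min_rtt_ms."""
    return max(0.0, target_min_rtt_ms - proxy_to_server_rtt_ms)


# Assumed example: the client is 100 ms from the proxy and the proxy is 20 ms
# from the server, so one possible target is the full 120 ms path.
client_to_proxy_ms = 100.0
proxy_to_server_ms = 20.0
target_ms = client_to_proxy_ms + proxy_to_server_ms

print(extra_ack_delay_ms(target_ms, proxy_to_server_ms))
```

Response payloads need no added delay because they already wait on the real client round trip.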

Sakura-sx•1mo ago

I think you could also compare with TLS handshake timings, delay for client hello among other things. And you could also compare it with HTTP RTT, not to mention that you can do TCP fingerprinting and compare it with the TLS and HTTP fingerprint of the browser, you can also measure the IP TTL and ping, among many other things... What I mean is that there are a ton of things that can be done on both sides, but any company with enough people working at this and enough servers will surely make something miles away from my proof of concept, and they also have a lot of traffic to know what's baseline data and what isn't.

It's a complex but fun world we live in hehe

Bender•1mo ago

I like this. I could see this being extra useful for people not using CDN's if they could easily plug it into nginx, haproxy and such. Currently for proxies I look for the proxy headers and also use a list of known proxy IP's but that is obviously nowhere near as complete as what you built. It might also be interesting to test assorted configurations of SSH forwards and MitM TLS caching proxies such as Squid SSL Bump.

Sakura-sx•1mo ago

I guess for this to work best you'd build your own CDN and have as many servers as possible. I have always dreamed of an Open Source CDN managed by a nonprofit and dedicated to offering CDN services for free or for a reasonable cost.

If you did the timings by comparing to other protocols, like TLS or HTTP you could do this with a single server, but that's a bit more complex than doing it on the same protocol since you have to account for more stuff, but it could be done, at the end of the day, my idea with Aroma was mostly to prove that it's possible, thanks for the feedback btw!

vlovich123•1mo ago

This feels like something that’s a neat claim and will work against simple setups, but less accurate for more complicated scenarios (eg Tor). Then you’re really just relying on how accurate your knowledge of the proxies are.

Also, the readme has slightly incorrect logic I think:

> According to Special Relativity, information cannot travel faster than the speed of light. Therefore, if the round trip time (RTT) is 4ms, it's physically impossible for them to be farther than 2 light milliseconds away, which is approximately 600 kilometers.

It calls out the 33% for fiber but ignores that there’s not a straightline path between two points on the network and there could be wireless, cable, and DSL links somewhere on that hop.

Also, the controlled variable here is latency, not distance. Thus you can always increase latency through buffering, and therefore you could be made to appear further away than you are. And that buffering need not even be intentional: your perceived distance estimate will vary based on queuing delays in intermediaries depending on time of day (itself a fingerprint if you incorporate time-aware measurements, but a source of error if you don’t).

Fingerprinting is hard and I dislike the framing that it’s absolutely impossible to mask or that there are no false positive and false negative error rates with the fingerprint.

Sakura-sx•1mo ago

About the straightline path I did think of that but apparently I forgot to address it when writing the README :p

The point I was trying to make is that if the RTT is low enough you can know the connection is being made from close, it's an upper bound, and making some assumptions you can get it lower, so it's not a way of knowing the exact distance but rather the max distance the connection can be made from. If someone is in Spain but they can't be more than 400km from Australia, something went terribly wrong somewhere hehe
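As a quick worked example of that upper bound, here is the calculation in Python. The 2/3 velocity factor for fiber is an assumption (a commonly quoted approximation), and the function name is just for illustration. Real paths are longer than a straight line, so the true distance can only be smaller than this bound.

```python
SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # kilometres travelled per millisecond in vacuum


def max_distance_km(rtt_ms: float, velocity_factor: float = 1.0) -> float:
    """Upper bound on the one-way distance implied by a round-trip time.

    velocity_factor scales the speed of light for the medium (1.0 = vacuum).
    """
    if rtt_ms < 0 or not 0 < velocity_factor <= 1:
        raise ValueError("rtt_ms must be >= 0 and velocity_factor in (0, 1]")
    one_way_ms = rtt_ms / 2
    return one_way_ms * SPEED_OF_LIGHT_KM_PER_MS * velocity_factor


print(round(max_distance_km(4)))         # 600 (vacuum, the README's figure)
print(round(max_distance_km(4, 2 / 3)))  # 400 (assumed fiber velocity factor)
```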

In hindsight I think the issue with my explanation is that I was trying to explain the differences when fingerprinting two different protocols, but ended up going for a TCP-only approach since Fastly wouldn't expose to me the data I needed for the TLS and HTTP RTT. In theory, fingerprinting with the RTT difference between protocols, where one protocol is proxied and the other isn't, is impossible to bypass, but this is only the theory.

I think I will edit the README in the future since I don't like how it turned out too much. Thanks for the feedback!

By the way, it detects Tor, I tested it ;D

AnthonyMouse•1mo ago

> But in theory fingerprinting with protocol RTT difference where one protocol is proxied and the other [isn't] is impossible to bypass, but this is only the theory.

Alice wants you to think she's in New York when she's really in Taipei, so she gets a VM in New York and runs a browser in it via RDP. How are you detecting this?

Sakura-sx•1mo ago

I am not detecting that, I am just detecting L4 proxies for now sob

kees99•1mo ago

Very clever, I like it.

When deployed on a popular server, one bit of "IP intelligence" this detector itself can gather is keep database of lowest-seen RTT per given source IP, maybe with some filtering - to cut out "faster-than-light" datapoints, gracefully update when actual network topology changes, etc.

That would establish a baseline, and from there, additional end-to-end RTT should become much more visible.
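A minimal sketch of that per-IP baseline, assuming an in-memory dictionary and a fixed plausibility floor. The class name, the floor value and the example addresses are all hypothetical, and it does not handle the "gracefully update when topology changes" part, which would need some form of expiry.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RttBaseline:
    """Tracks the lowest plausible RTT seen per source IP, in microseconds."""

    floor_us: int = 100  # hypothetical cutoff for "faster-than-light" samples
    lowest_us: Dict[str, int] = field(default_factory=dict)

    def observe(self, ip: str, rtt_us: int) -> None:
        """Record a sample, ignoring implausibly small values."""
        if rtt_us < self.floor_us:
            return
        best = self.lowest_us.get(ip)
        if best is None or rtt_us < best:
            self.lowest_us[ip] = rtt_us

    def excess_us(self, ip: str, rtt_us: int) -> Optional[int]:
        """How far a new sample sits above the baseline, or None for an unseen IP."""
        best = self.lowest_us.get(ip)
        return None if best is None else rtt_us - best


baseline = RttBaseline()
baseline.observe("203.0.113.7", 20_000)
baseline.observe("203.0.113.7", 50)      # dropped: below the floor
baseline.observe("203.0.113.7", 18_000)
print(baseline.excess_us("203.0.113.7", 78_000))
```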

Sakura-sx•1mo ago

First of all, thanks!

I imagine any big CDN implementing something like this could keep a database of all of this, combined with the old kind of IP intelligence and collecting not only RTT on other protocols like TLS, HTTP, IP (aka ping, and traceroutes too), TCP fingerprint, TLS fingerprint, HTTP fingerprint...

And with algorithms that combine and compare all these data points, I think very accurate models of the proxy could be made. And for things like credit card fraud this could be quite useful.

moreati•1mo ago

Why would one want this? Are there particular situation(s) where it's desirable to detect a TCP proxy? Does the presence of a TCP proxy indicate some adversarial behaviour? E.g. surveillance, censorship, a particular attack?

userbinator•1mo ago

Surveillance, on the part of those who want to do this fingerprinting.

dlenski•1mo ago

Came here to ask the same thing. Why do I _care_ if connections to my server come from a TCP proxy? Particularly when a VPN is _not_ observable in a similar way?

Is there some class of bad actors who extensively use TCP proxies and not only _don't_ use VPNs, but would incur large costs in switching to them?

JDye•1mo ago

Web scrapers maybe aren't "bad actors", but many sites don't want them. They'll use tons of TCP proxies which route them through a rotating pool of end user devices (mobiles, routers, etc...). It's not really possible to block these IPs as you'd also be blocking legitimate customers, so other ways to detect and block are required.

dlenski•1mo ago

Can't/won't these scrapers just switch to using VPNs or sshuttle or basically anything else that doesn't leak timing info about termination of TCP vs HTTP?

JDye•1mo ago

Not really. You can have 100,000 IPs from proxies or use VPNs and have only 5 egress IPs.

Anybody who wants to stop the scraper could get browser fingerprints, cross reference similar ones with those IPs and quite safely ban them, as it's highly likely they're not a legitimate customer.

It's a lot harder to do it for the 100k IPs because those IPs will also have legitimate customer traffic on them and it's a lot more likely the browser fingerprint could just be legitimate.

The risk of false positives (blocking real people) is usually higher than just allowing the scrapers, and the incentives of a lot of sites aren't aligned with stopping scrapers anyway. Think ecommerce: do they _really_ care if the product is being sold to scalpers or real customers? If anything, that behaviour can raise perception of their brand, increase demand, increase prices.

This tool should have fewer false positives than most, so maybe it will see more adoption than others (TCP fingerprinting for example), but I don't think this is going to affect anyone doing scraping seriously/at scale.

dlenski•1mo ago

> Not really. You can have 100,000 IPs from proxies or use VPNs and have only 5 egress IPs.

Why…?

If I can run a proxy exit node on 100k residential IPs, why can't I run a VPN server on 100k residential IPs?

There is no additional technical complexity or resource consumption from the VPN server compared to the proxy server.

benmmurphy•1mo ago

for phones it's a bit difficult because i don't think you can egress IP traffic without root or jailbreak on iphone and iOS. but i guess on desktop this should be possible

JDye•1mo ago

I don't mean that you can't do it, just that there is no company offering it so right now those are the only two options.

It's something we're experimenting with currently. the other commenter is right about apple products, but on android, desktop, etc... it's pretty easy.

viraptor•1mo ago

Just in case someone tries to use it to make some kind of judgement about the traffic - there's a whole world behind legit or enforced proxies. Especially corporate environments will often tunnel all the traffic for compliance and audit reasons.

Sakura-sx•1mo ago

Yes, it's important to keep this in mind, thanks for your comment!

Rasbora•1mo ago

This is the core concept of how proxies are detected via services like https://layer3intel.com/tripwire or https://spur.us/monocle/

The difference in min TCP RTT and min RTT to respond to a websocket payload is a dead giveaway that there's a middlebox terminating TCP somewhere along the path. You can bypass this by sourcing your request within 30ms of wherever TCP is being terminated; anything under that threshold could be caused by regular noise and isn't a reliable fingerprint. Due to how many gateways there are between you and a residential proxy exit node, this makes fingerprinting them extremely easy.
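The comparison described above boils down to a single subtraction. Here is a hedged Python sketch; the 30 ms default comes from this comment, while the function and parameter names are assumptions and not how any of the named services implement it.

```python
NOISE_THRESHOLD_MS = 30.0  # the figure quoted above; below this, noise can explain the gap


def tcp_terminated_midpath(min_tcp_rtt_ms: float,
                           min_app_rtt_ms: float,
                           threshold_ms: float = NOISE_THRESHOLD_MS) -> bool:
    """True when the application-layer RTT (e.g. a websocket echo) exceeds the
    TCP RTT by more than the noise threshold, which suggests TCP is being
    terminated by a middlebox closer to the server than the real client."""
    return (min_app_rtt_ms - min_tcp_rtt_ms) > threshold_ms


# A 12 ms TCP RTT with a 95 ms websocket RTT looks proxied; 40 ms vs 55 ms does not.
print(tcp_terminated_midpath(12.0, 95.0))
print(tcp_terminated_midpath(40.0, 55.0))
```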

I expect it won't be long until someone deploys the first proxy service that handles the initial CONNECT payload in the kernel before offloading packet forwarding to an eBPF script that will proxy packets between hosts at layer 3, making this fingerprinting technique obsolete. The cat and mouse game continues.

dlenski•1mo ago

> I expect it won't be long until someone deploys the first proxy service that handles the initial CONNECT payload in the kernel before offloading packet forwarding to an eBPF script that will proxy packets between hosts at layer 3, making this fingerprinting technique obsolete.

https://github.com/sshuttle/sshuttle basically works like this. I've used it for many years. I don't think it'll be possible to detect using this technique.

benmmurphy•1mo ago

sshuttle as described sounds like a normal CONNECT proxy which this is able to detect: https://sshuttle.readthedocs.io/en/stable/how-it-works.html

it's similar to a CONNECT or SOCKS proxy, except it is using SSH as a transport layer instead of TCP as a transport layer and it's doing it transparently, without applications having to be written to use the proxy. but if you are just converting TCP packets into a datastream and then sending them somewhere else where you convert them back to TCP packets, then this is what this TCP RTT strategy is fundamentally meant to detect. i suspect the TCP-only RTT thing works because of the delayed ack behaviour of most operating systems, and this will still happen with sshuttle unless you are explicitly using quick-ack. also, quick-ack just works around the TCP-RTT issue and not the differences in timing between TCP and TLS or other higher protocols. i think if you are testing for other RTT differences then quick-ack would make them more obvious.

on the server side sshuttle just uses normal tcp sockets and nothing magic (https://github.com/sshuttle/sshuttle/blob/master/sshuttle/ss...)

also, if you have an sshuttle proxy this site cannot detect, it may be due to how close the server is to the client. i have a CONNECT based proxy it is able to detect around 5% of the time (maybe only that often due to a bug) but this is because there is probably less than 10ms latency between the proxy and the client and probably around 50ms latency between the proxy and the server for some reason (?).

29athrowaway•1mo ago

If you like this then you will probably like "The Cuckoo's Egg: Tracking a Spy Through the Maze of Computer Espionage", a 1989 book by Clifford Stoll.

Also available as audiobook, and a documentary ("The KGB, The computer and Me"). https://www.youtube.com/watch?v=Xe5AE-qYan8

Sakura-sx•1mo ago

Thank you! Will check it out!

userbinator•1mo ago

The minimal explanation is that TCP is "turned around" at a dumb proxy, but upper-layer protocols may go further before being turned around. Which is trivially avoidable by delaying the TCP response with the same timing as the upper-layer protocol (and doing so to the protocol above that, etc.)

Sakura-sx•1mo ago

The issue is that if HTTP takes an extra 50ms compared to TCP, for example, and you increase TCP by 50ms, now HTTP takes 100ms more. Basically it is always more no matter how much you increase it.

userbinator•1mo ago

Not if you receive the HTTP request from the client first, before any interaction with the end-host.

JDye•1mo ago

If the proxy can "see" the requests, then this isn't an issue because the headers can trivially be modified.

The problem is that the proxies which are targets of identification - think proxies for large scale web scraping which use CONNECT tunnels - don't get to "see" the request.

jeroenhd•1mo ago

Do raw TCP proxies still get used often? I'd imagine most proxies you'd want to detect are full HTTP proxies and this formula won't detect those.

I suppose it's possible botnets ("residential proxies") may get detected this way if they're using SOCKS to forward requests?

Still, this looks like an interesting signal to add to a system like Anubis to increase the difficulty for suspicious traffic sources.

This does very reliably detect Tor traffic, though you can just download a list of exit nodes if that's what you want.

JDye•1mo ago

The most common method of proxying with residential proxies is still CONNECT tunnels and from my tests it catches a resi-proxy about 50% of the time. More with tuning of the score thresholds.

Sakura-sx•1mo ago

I think for stealth, TCP proxies are more common since you can use your own TLS fingerprints and all of that. With something like an HTTP proxy you'd need to set up your requests to match the TLS fingerprint that the proxy is using, although I guess the proxy could make the TLS look the same? There are other ways of detecting HTTP proxies, for example comparing with the RTT of websockets or something like that. The idea is that there will always be at least one thing whose RTT is measured from the proxy and at least one thing whose RTT is measured from the client and must go through the proxy; you measure the difference between the two and there you have it.

Manouchehri•1mo ago

Would a similar technique work for tunnels through QUIC?

JDye•1mo ago

I mentioned this in a podcast recently; fingerprinting of proxy servers using QUIC is a lot harder, as UDP doesn't have enough headers to allow for unique characteristics like TCP does.

There's no way to include a timestamp in a UDP datagram, so all timestamps received would be from the client machine.

Manouchehri•1mo ago

Interesting!

So far I've only seen Bright Data (among the large players) offer UDP proxying over QUIC/HTTP3, but that's pretty limiting since less than half of sites have HTTP/3 enabled to begin with.

JDye•1mo ago

Bright Data offer H3/QUIC but only in beta and you have to contact their sales team as far as I'm aware.

We (PingProxies) might be the only company to offer H3 to the proxy/QUIC to the target using the CONNECT-UDP method publicly. Although, it is in beta/unstable until I merge my changes into Rust's H3 library.

If you wanna play around with it, email me and I'll get you some credit. I think there's potential for stealth since outdated proxy clients/servers mean automated actors never use H3.

The proxy industry is full of another 100 companies saying they offer H3/QUIC, when they mean UDP proxying using SOCKS. I suppose the knowledge gap and what customers care about (protocol to end target) is very different to what I care about (being right/protocol to the proxy server).

Manouchehri•1mo ago

> Bright Data offer H3/QUIC but only in beta and you have to contact their sales team as far as I'm aware.

That's what I thought too, but it's working for me. (I've sent a lot of tickets, maybe they've put our account as something special without telling me, but doubt it.)

> If you wanna play around with it, email me and I'll get you some credit.

Done, emailed! :) Thanks!

> The proxy industry is full of another 100 companies saying they offer H3/QUIC, when they mean UDP proxying using SOCKS.

Out of the large players I've tested, none actually seem to even support SOCKS5's UDP ASSOCIATE. (I have not tested PingProxies yet.)

> I suppose the knowledge gap and what customers care about (protocol to end target) is very different to what I care about (being right/protocol to the proxy server).

I think there's a knowledge gap between the people making the sales landing pages, and the folks who actually run/maintain the proxy servers. There's some large vendors that advertise UDP support (for residential and/or mobile proxies) that I have yet to actually see working.

jedisct1•1mo ago

To write Fastly VCL code, I strongly recommend XVCL https://dip-proto.github.io/xvcl/

It makes VCL so much easier and readable.
