I'm the technical founder of Zyro. I built this because I kept seeing my "Direct" traffic bucket inflate while my ad spend efficiency dropped. I suspected the traffic wasn't actually "Direct," but that standard analytics tools (GA4) were losing the referrer data from newer sources and misclassifying those visits.
I spent the last year building a custom detection engine to "unmask" this traffic.
The Tech Stack & Implementation:
The Core Problem: Most AI tools (ChatGPT, Perplexity, Claude) and "dark social" apps don't pass standard referrer headers.
The Solution: I built a TrafficSourceDetector that parses over 50 tracking parameters (like ttclid, gbraid, and AI-specific signatures) that default analytics configs usually sanitize or drop.
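For a sense of what that looks like, here's a minimal, self-contained C# sketch of the classification step. The click-ID map and the AI hostname list are a small illustrative slice, not the real 50+ parameter table, and the names are placeholders rather than the production code:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Minimal sketch of the classification step. The click-ID map and the AI hostname
    // list below are illustrative, not the full parameter set.
    public static class TrafficSourceDetectorSketch
    {
        // Click IDs that identify the ad platform even when no referrer is sent.
        static readonly Dictionary<string, string> ClickIdParams = new Dictionary<string, string>
        {
            { "ttclid", "TikTok Ads" },
            { "gclid", "Google Ads" },
            { "gbraid", "Google Ads (iOS)" },
            { "wbraid", "Google Ads (web-to-app)" },
            { "fbclid", "Meta Ads" },
            { "msclkid", "Microsoft Ads" }
        };

        // Hostnames that suggest an AI assistant sent the visitor, via referrer or utm_source.
        static readonly string[] AiSignatures = { "chatgpt.com", "chat.openai.com", "perplexity.ai", "claude.ai" };

        public static string Classify(Uri landingUrl, string referrer)
        {
            var query = ParseQuery(landingUrl.Query);

            foreach (var entry in ClickIdParams)
                if (query.ContainsKey(entry.Key)) return entry.Value;

            string utmSource;
            if (query.TryGetValue("utm_source", out utmSource) && MatchesAi(utmSource))
                return "AI Assistant";

            if (!string.IsNullOrEmpty(referrer) && MatchesAi(referrer))
                return "AI Assistant";

            return string.IsNullOrEmpty(referrer) ? "Direct (unresolved)" : "Referral";
        }

        static bool MatchesAi(string value) =>
            AiSignatures.Any(sig => value.IndexOf(sig, StringComparison.OrdinalIgnoreCase) >= 0);

        // Tolerant query parsing: last value wins on duplicate keys.
        static Dictionary<string, string> ParseQuery(string query)
        {
            var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (var pair in query.TrimStart('?').Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
            {
                var parts = pair.Split(new[] { '=' }, 2);
                result[Uri.UnescapeDataString(parts[0])] = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : "";
            }
            return result;
        }
    }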
Data Handling: One major headache was long URLs getting truncated in length-limited string columns. I migrated the schema to NVARCHAR(MAX) in SQL Server so the massive, parameter-heavy URLs modern ad platforms generate are stored without data loss.
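The migration itself is a one-line ALTER; here it is wrapped in a throwaway ADO.NET script (assuming the Microsoft.Data.SqlClient package; the connection string, table, and column names are placeholders, not the actual Zyro schema):

    using Microsoft.Data.SqlClient;

    // Widen the landing-URL column so parameter-heavy URLs are stored intact.
    // Placeholder connection string, table, and column names.
    var connectionString = "Server=.;Database=Zyro;Integrated Security=true;TrustServerCertificate=true";

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var command = new SqlCommand(
            "ALTER TABLE dbo.Visits ALTER COLUMN LandingUrl NVARCHAR(MAX) NOT NULL;", connection))
        {
            command.ExecuteNonQuery();
        }
    }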
Optimization Logic: Instead of frequentist A/B testing (which keeps sending traffic to losing variants until the test ends), I implemented a Multi-Armed Bandit (Thompson Sampling). It updates in real time and automatically shifts traffic toward the better-performing variant.
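Since people usually ask how the bandit works: each variant keeps a Beta(successes + 1, failures + 1) posterior over its conversion rate; on every request we draw one sample per variant and serve the highest draw. Here's a stripped-down, in-memory sketch, not the production code (the class shape and the Gamma/Box-Muller samplers are mine for illustration):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Thompson Sampling over conversion rate: each variant keeps a Beta(successes+1, failures+1)
    // posterior; per visitor we sample each posterior once and serve the highest draw.
    public class ThompsonSamplingBandit
    {
        class Arm { public int Successes; public int Failures; }

        readonly Dictionary<string, Arm> _arms;
        readonly Random _rng = new Random();

        public ThompsonSamplingBandit(IEnumerable<string> variantIds) =>
            _arms = variantIds.ToDictionary(id => id, _ => new Arm());

        // Called per visitor: routes traffic toward better-performing variants
        // while still exploring uncertain ones.
        public string ChooseVariant() =>
            _arms.OrderByDescending(a => SampleBeta(a.Value.Successes + 1, a.Value.Failures + 1))
                 .First().Key;

        // Called once the visit's outcome is known (converted or not).
        public void RecordOutcome(string variantId, bool converted)
        {
            if (converted) _arms[variantId].Successes++;
            else _arms[variantId].Failures++;
        }

        double SampleBeta(double a, double b)
        {
            double x = SampleGamma(a), y = SampleGamma(b);
            return x / (x + y);
        }

        // Marsaglia-Tsang gamma sampler (scale = 1).
        double SampleGamma(double shape)
        {
            if (shape < 1)
                return SampleGamma(shape + 1) * Math.Pow(1.0 - _rng.NextDouble(), 1.0 / shape);

            double d = shape - 1.0 / 3.0, c = 1.0 / Math.Sqrt(9.0 * d);
            while (true)
            {
                double x, v;
                do { x = SampleStandardNormal(); v = 1.0 + c * x; } while (v <= 0);
                v = v * v * v;
                double u = _rng.NextDouble();
                if (u < 1.0 - 0.0331 * x * x * x * x) return d * v;
                if (Math.Log(u) < 0.5 * x * x + d * (1.0 - v + Math.Log(v))) return d * v;
            }
        }

        // Box-Muller transform for a standard normal draw.
        double SampleStandardNormal()
        {
            double u1 = 1.0 - _rng.NextDouble(), u2 = _rng.NextDouble();
            return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        }
    }

Usage in this sketch is just ChooseVariant() at request time and RecordOutcome(variantId, converted) when the goal fires (or doesn't).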
Latency: To keep the variant-swap "flicker" from hurting UX, I moved the geolocation logic (MaxMind) to a local instance instead of an external API call, keeping decision latency near 0ms.
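Concretely, a lookup against a local MaxMind database looks roughly like this (assuming the official MaxMind.GeoIP2 NuGet package and a downloaded GeoLite2 .mmdb file; the path and IP are placeholders):

    using System;
    using MaxMind.GeoIP2;
    using MaxMind.GeoIP2.Exceptions;

    // Reading from a local .mmdb file keeps the lookup in-process (microseconds)
    // instead of a network round-trip to a geolocation API.
    // The reader is expensive to construct, so create it once and reuse it.
    using var reader = new DatabaseReader("/data/GeoLite2-Country.mmdb");

    try
    {
        var response = reader.Country("203.0.113.42"); // visitor IP (example address)
        Console.WriteLine(response.Country.IsoCode);   // e.g. "US"
    }
    catch (AddressNotFoundException)
    {
        Console.WriteLine("Unknown location");         // private or unlisted IPs
    }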
Scanning: The visual editor uses HtmlAgilityPack to parse the DOM and identify "testable" elements (headlines, buttons) automatically.
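The scan itself is conceptually simple. A trimmed-down version (the sample markup and the XPath heuristic are simplified stand-ins for the real element-detection rules):

    using System;
    using HtmlAgilityPack;

    // Parse the page and pull out candidate elements for testing:
    // headlines, buttons, and links that look like calls to action.
    var html = "<html><body><h1>Launch faster</h1>" +
               "<button class=\"btn\">Start free trial</button></body></html>"; // sample markup

    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    var candidates = doc.DocumentNode.SelectNodes(
        "//h1 | //h2 | //button | //a[contains(@class, 'btn') or contains(@class, 'cta')]");

    if (candidates != null) // SelectNodes returns null when nothing matches
    {
        foreach (var node in candidates)
            Console.WriteLine(node.Name + ": " + node.InnerText.Trim() + "  (" + node.XPath + ")");
    }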
The tool is currently live. I'm looking for feedback on the detection logic—specifically if anyone else is seeing massive "Direct" traffic that turns out to be AI scrapers/users.
Happy to answer questions about the bandit algorithm or the SQL architecture!