I realize that this is probably the only way it could work but it is not clear to me that tracking by IP address (even over a single session and shredding the data once a day) is any better from a GDPR standpoint.
That's not completely true. Recital 26 of the GDPR describes anonymous information as
> “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.”
Hashing does not meet this threshold. If the same IP address is hashed using the same method, the result will always be the same, meaning it can be matched. Hashing is therefore considered pseudonymization, and under GDPR, pseudonymized data is still considered personal data.
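A minimal sketch of why a plain hash stays matchable, assuming unsalted SHA-256 and a documentation-only address range (`hash_ip` and `reverse_by_enumeration` are hypothetical helper names, not taken from any of the tools discussed):

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """Unsalted SHA-256 of an IP address string."""
    return hashlib.sha256(ip.encode("utf-8")).hexdigest()


def reverse_by_enumeration(target_hash: str, network: str) -> Optional[str]:
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr)) == target_hash:
            return str(addr)
    return None


stored = hash_ip("192.0.2.57")            # what an analytics table might hold
same_again = hash_ip("192.0.2.57")        # same input, same method, same output
recovered = reverse_by_enumeration(stored, "192.0.2.0/24")
not_found = reverse_by_enumeration(hash_ip("10.0.0.1"), "192.0.2.0/24")
```

Because the candidate space is small and the function is deterministic, the stored value can be linked back to the address by simply trying candidates.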
Moreover, the act of anonymization itself is a form of processing and therefore falls under the scope of GDPR. So even attempting to anonymize personal data doesn't remove GDPR obligations for the anonymization itself.
> If the same IP address is hashed using the same method, the result will always be the same, meaning it can be matched.
The way people get around this is by using an ephemeral salt that is deleted regularly, e.g. daily. Once the salt has been deleted, it'd be impossible to reverse the hash, as the salt would be lost.
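A rough sketch of the ephemeral-salt idea, assuming the salt lives only in memory and is replaced when the calendar day changes. `DailySaltHasher` is a hypothetical name and this is not the implementation used by any particular product:

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailySaltHasher:
    """Holds one random salt in memory and replaces it when the day changes."""

    def __init__(self) -> None:
        self._salt = b""
        self._salt_day = None

    def _salt_for(self, day: date) -> bytes:
        if day != self._salt_day:
            # Overwrite the previous salt; it is never written to disk.
            self._salt = secrets.token_bytes(32)
            self._salt_day = day
        return self._salt

    def visitor_id(self, ip: str, user_agent: str, day: date) -> str:
        message = f"{ip}|{user_agent}".encode("utf-8")
        return hmac.new(self._salt_for(day), message, hashlib.sha256).hexdigest()


hasher = DailySaltHasher()
monday, tuesday = date(2025, 5, 5), date(2025, 5, 6)
first = hasher.visitor_id("192.0.2.57", "ExampleBrowser/1.0", monday)
again = hasher.visitor_id("192.0.2.57", "ExampleBrowser/1.0", monday)
next_day = hasher.visitor_id("192.0.2.57", "ExampleBrowser/1.0", tuesday)
```

Within one day the IDs still match each other, which is the linkability concern raised above; only after rotation does the old mapping become unrecoverable.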
Edit: Found more discussion here: https://github.com/plausible/analytics/discussions/1963#disc...
> To summarize, I believe the EDPB has made their position very clear on this in their 2023 guidelines: Plausible's fingerprinting is subject to Article 5(3) of the ePD. Plausible has made their position very clear in their blog post, leaning in the other direction. Until this is tried out in court, I don't believe that there will be any definitive answer.
This seems incompatible with ePD.
People seem to occasionally post cool new solutions, though it doesn't seem like Matomo has gotten that much attention, despite being a pretty strong alternative to Google Analytics (I haven't had that many issues while self-hosting it either).
Does geographic grouping data depend on the IP address? If so I suppose it would need to be extracted first before hashing the IP, and I wonder how much that weakens the anonymization.
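To make the ordering concrete, here is a sketch that resolves a coarse region first and then keeps only a salted hash of the IP. The lookup table is a made-up stand-in for a real GeoIP database, using documentation-only ranges and placeholder region names:

```python
import hashlib
import ipaddress

# Hypothetical lookup table: documentation ranges mapped to placeholder regions.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "REGION-A",
    ipaddress.ip_network("198.51.100.0/24"): "REGION-B",
}


def coarse_region(ip: str) -> str:
    """Return the placeholder region for an address, or UNKNOWN."""
    addr = ipaddress.ip_address(ip)
    for network, region in GEO_TABLE.items():
        if addr in network:
            return region
    return "UNKNOWN"


def to_event(ip: str, salt: bytes) -> dict:
    """Resolve the region while the raw IP is available, then store only the hash."""
    region = coarse_region(ip)
    digest = hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()
    return {"visitor": digest, "region": region}


event = to_event("198.51.100.23", salt=b"example-salt")
unknown = to_event("203.0.113.9", salt=b"example-salt")
```

The stored region narrows the set of addresses that could have produced a given hash, so the coarser the geographic bucket, the less it gives away.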
If a user can say "here's my IP address, what data do you have on me?" and you can answer that question, then that's personal data under GDPR. It's pseudonymized, but not anonymized, and pseudonymous data is personal data.
In practice, small fries are not an enforcement priority. Regulators in most countries are not well-funded so they have to be frugal with their enforcement actions.
The EU is currently reviewing an option to relax GDPR requirements for smaller businesses. Not remove GDPR requirements, just streamline some of the process overhead.
https://www.reddit.com/r/selfhosted/comments/1kgytl4/i_built...
So, let's not bother with it. I can say all IP addresses are located on Earth and someone would be offended because now we are invading their privacy by knowing which planet they are from. GDPR is not clear on IP addresses or IP-address-derived metadata. There is no case law for it, nor an accepted methodology, and everyone is speculating about the consequences; it is mostly just opinions from IANALs. GDPR is astrology for non-enterprise companies.
There is: see C-582/14, which concludes that IP addresses, even dynamic ones, are personal data.
This is sort of like assuming everyone who is taking photos at a tourist attraction is doing so to show off their holiday for social status.
If your site or content is truly valuable, it is a public good to monitor, analyze, and improve upon its reach and usability.
Maybe you're writing for an audience and you want to see what resonates most with them.
Sometimes popularity is a good thing to measure, not for your ego, but by how much you are helping others.
It is sad when people assume metrics are about vanity, rather than about how much we're helping others.
In my experience, when analytics and the related ads tracking tools break, Marketing departments are revealed to be much more important than generally believed in the business.
What I am trying to say is that you can still do analytics, even pretty advanced stuff with some more elaborate scripting, if you want. The only thing you need is the access log.
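As a small illustration of log-only analytics, the sketch below counts successful GET requests per path. It assumes the common "combined" access log layout; the sample lines are invented and the regex would need adjusting for a custom log format:

```python
import re
from collections import Counter

# Assumed layout: ip ident user [time] "METHOD path proto" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def page_views(lines):
    """Count successful (2xx) GET requests per path; skip lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        if match["method"] == "GET" and match["status"].startswith("2"):
            counts[match["path"]] += 1
    return counts


sample = [
    '192.0.2.1 - - [10/May/2025:10:00:00 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
    '192.0.2.2 - - [10/May/2025:10:00:05 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
    '192.0.2.2 - - [10/May/2025:10:00:09 +0000] "GET /missing HTTP/1.1" 404 153 "-" "ExampleBrowser/1.0"',
    '192.0.2.3 - - [10/May/2025:10:01:00 +0000] "POST /contact HTTP/1.1" 200 12 "-" "ExampleBrowser/1.0"',
]
views = page_views(sample)
```

Referrers, status-code breakdowns and per-day buckets can be added the same way, since the other fields are already captured by the pattern.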
Something which has been largely forgotten ever since tools like Urchin became a thing :)
Not a single line of tracking or analytics on the front end, we just tracked everything we cared about at the server level.
Urchin was acquired by Google and was ultimately sunset in favor of Google Analytics. It supported local and hybrid analytics models, the latter arguably evolved into Google Analytics.
<source: did fancy things with logs over the last 25 years, including running multiple tools on the same site in parallel to do comparisons (Analog, AWStats, Urchin, GA, Omniture, homegrown, etc...)>
For example, in the EU, you need user consent to use server logs that include IP addresses for analytics. You also need to provide post-consent opt-outs and privacy statements and audit logs, and all of a sudden you're building another analytics tool.
It objectively won't.
Analytics tell you where your website isn't working, so you can fix it. Buttons you thought were obvious that users are blind to. Pages where nobody scrolls because they didn't realize there was more content. Figuring out where users get stuck because they don't understand the navigation you designed. Etc etc etc.
If you have a hobby website, then sure maybe analytics don't matter. But the idea that sites work better without analytics makes as much sense as saying you'll see better when you wear dark sunglasses.
This project here looks interesting, but is quite new. Let's see how it evolves in the future.
Analytics only work if the agent runs JS. CF on the other hand counts file fetches, which can't be circumvented.
There's always a baseline of bot traffic.
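One crude way to estimate that baseline from server-side counts is a user-agent heuristic. The marker list below is an illustrative assumption, not a maintained bot list, and bots that spoof a browser user agent will pass straight through it:

```python
# Illustrative markers only; real bot lists are much longer and change often.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")


def looks_like_bot(user_agent: str) -> bool:
    """Flag empty agents or agents containing a known automation marker."""
    ua = user_agent.strip().lower()
    return ua in ("", "-") or any(marker in ua for marker in BOT_MARKERS)


def split_traffic(user_agents):
    """Partition user agents into (probably human, probably bot)."""
    humans = [ua for ua in user_agents if not looks_like_bot(ua)]
    bots = [ua for ua in user_agents if looks_like_bot(ua)]
    return humans, bots


agents = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "ExampleBot/2.1 (+http://example.com/bot)",
    "curl/8.0.1",
    "-",
]
humans, bots = split_traffic(agents)
```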
It's been that way for a few years; real users are on mobile apps and social media now.
The percentage of internet users who are "surfing" the web is dwindling and will likely keep shrinking in the near future.
How do products like rybbit.io stay competitive without a similar free tier or major differentiation? Is rybbit generating revenue for its hosted plan?
I think the instinct to distrust big companies is at least partly because many of them have already proven not to be good stewards of data which when combined with their scale has more worrisome implications.
With a smaller/newer player, at least there’s some hope that they’re not capable of the same harms at a smaller scale, and in some cases may market themselves specifically as a more private alternative.
Whether or not this turns out to be true in practice and over the long run is another thing.
I'm choking on the irony
It's okay, but I probably wouldn't choose it again. The ease of setting up Dashboards and Panels is great at first, but you pay for it with a low ceiling of what you can do (without building around it) and a "we trust everyone" approach to security.
I've never used google analytics before. What's the marginal value over statsd?
Also the primary repo is not FOSS, and that "100% FOSS" repo is buried in yet another footnote [2].
Plausible follows in PH's footsteps but is not fully faithful to open source. If you want to self host, you won’t have the same set of features as their SaaS and need to rely on long term releases for their "community edition" [3]
On "Ahrefs", is there even an open source version of their product? I couldn’t easily find it (on mobile). [4]
Maybe I’ll take a look at others you mentioned later but if rybbit can remain faithful to their FOSS roots then I think there’s a real chance of it becoming huge.
For those that don’t want to self host (mostly corporate shitholes), rybbit can milk them with their managed SaaS product.
[1] https://github.com/PostHog/posthog?tab=readme-ov-file#self-h...
[2] https://github.com/PostHog/posthog?tab=readme-ov-file#open-s...
[3] https://github.com/plausible/analytics?tab=readme-ov-file#ca...
How would rybbit.io make money if they are only better at self hosting? Wouldn't the users they are targeting only self host anyways?
> "On "Ahrefs", is there even an open source version of their product? I couldn’t easily find it (on mobile)."
Not all of these companies are open source but they are still competitors because they have generous free tiers so the cost of self hosting an alternative wouldn't be justified.
I tried to self host Posthog for my other project as it far exceeded even the generous free tier. I have a Hetzner bare metal server with 64gb of ram https://www.hetzner.com/dedicated-rootserver/ax42/ and it was running all 16 cores at 100% and didn't end up working. So I think Posthog's stack is just way too heavy to self host effectively, and it's just not in the same category as Plausible, Umami, or Rybbit.
I'm trying to build the best OSS analytics out there, and even though it's super crowded, most non-trivial websites run one, so there is space for everyone to survive in.
Plausible - good for self-hosting, but their SaaS is very expensive and the FOSS and SaaS offerings differ.
Ahrefs - they will use your traffic to improve your competitor research, you really should use them cautiously.
Matomo - feature rich but can be overwhelming.
Posthog - its SaaS is US based so dismissed early by EU customers.
Clarity, like GA, has serious privacy issues.
Our product, Wide Angle Analytics, has its own gotchas compared to competitors - it's opinionated and there are folks who do not agree with our opinions, but the landscape of websites is so vast that you find your client nevertheless.
That said, we are still in business after 4 years, and we have seen a few competitors disappear or get acquired and extinguished.
So, all the best to the OP. Hope you find your niche :)
Posthog has had an EU server for years. I'm not sure what you mean by this.
I genuinely don't know how they would proceed, but it'd be interesting to watch.
They can physically tap global internet cables, just because they can.
Member states have spy agencies, but they also signed treaties to join the EU. Having your spy agency violate international treaties isn’t something most governments allow.
If the information is stored in a country, expect that the owner of the information can be compelled to hand it over by a court order (or it can be seized).
I'm not super familiar with all of these products, so some of these ratings will be based on vibes
1-----------------10
OSS <-> Proprietary
Small business <-> Enterprise
Simplicity <-> Complexity
Web analytics <-> Product analytics
Privacy <-> No privacy
# Rybbit (me) - just launched $0
OSS/Proprietary - 2
I use AGPL 3.0 which isn't as permissive as MIT
Small business/Enterprise - 5
I definitely want enterprises to use Rybbit, but it's hard to target them at this stage
Simplicity/Complexity - 6.5
I think Rybbit is going to end up as one of the more feature-rich OS analytics tools, but I hope it stays easy to use (famous last words)
Web analytics/Product analytics - 4
Want to target both eventually, but my product analytics is weaker relatively
Privacy/No privacy - 3
Can be as GDPR compliant as others, but can also be configured to be a bit more invasive
# Posthog - ~15M ARR
OSS/Proprietary - 4
Have a bunch of enterprise licensed parts of their repo and they tell people in their docs to not self-host it because it's too difficult.
Small business/Enterprise - 8
Seems like they hook startups in with generous free tiers and then milk the unicorns that come out
Simplicity/Complexity - 10
The scope of Posthog is awe inspiring. They are literally 10 startups in 1
Web analytics/Product analytics - 8
I believe product analytics was their first feature
Privacy/No privacy - 7
I think they use cookies?
# Google Analytics
OSS/Proprietary - 10
Small business/Enterprise - 9
Free for everyone but it's clear they don't care about regular users that want to track their small site
Simplicity/Complexity - 8
If there was a dimension for usability it would be 11/10 totally unusable
Web analytics/Product analytics - 6
Not too sure about this one
Privacy/No privacy - 9
i mean it's google
# Mixpanel - $200m ARR
I'm the least familiar with this one
OSS/Proprietary - 9
Small business/Enterprise - 8
Simplicity/Complexity - 8
Web analytics/Product analytics - 9
Privacy/No privacy - 7
# Umami - unknown ARR (maybe 500K?)
OSS/Proprietary - 1
MIT license, no enterprise only features from what I see
Small business/Enterprise - 5
Seem to have some big names on their site
Simplicity/Complexity - 4
Web analytics/Product analytics - 5
Privacy/No privacy - 5
They claim GDPR compliance but I've self hosted it and they clearly fingerprint users without any obvious opt out.
# Plausible - ~2m ARR
OSS/Proprietary - 4
AGPL v3 and some enterprise features the community version doesn't have. Also they use Elixir so I doubt anyone actually reads it /s
Small business/Enterprise - 6
Have to be selling to enterprises with that ARR
Simplicity/Complexity - 3
Tool is very simple at the surface, but there's a lot of config options under the hood
Web analytics/Product analytics - 3
Mostly just web analytics
Privacy/No privacy - 2
This is a big focus for them
# Simple Analytics - ~500k ARR
OSS/Proprietary - 8
Closed source, but they are an open startup that shares their financials
Small business/Enterprise - 3
They show some big names, but the creator is an indie hacker
Simplicity/Complexity - 2
Self explanatory
Web analytics/Product analytics - 2
Privacy/No privacy - 2
Very GDPR compliance focused
If this was a multi-dimensional vector, I'm trying to fill the space between something like Posthog and Plausible, where we are as open source as either of them and fill the missing space between extreme simplicity and extreme complexity.
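Taking the "vector" framing literally, the vibes-based ratings above can be compared with a plain Euclidean distance. The numbers are copied from the lists in this comment; the axis names and equal weighting of the axes are assumptions:

```python
import math

# Order: OSS, small business/enterprise, simplicity, web vs product, privacy.
RATINGS = {
    "Rybbit": (2, 5, 6.5, 4, 3),
    "Posthog": (4, 8, 10, 8, 7),
    "Plausible": (4, 6, 3, 3, 2),
}


def distance(a: str, b: str) -> float:
    """Euclidean distance between two products' rating vectors."""
    return math.dist(RATINGS[a], RATINGS[b])


to_posthog = distance("Rybbit", "Posthog")
to_plausible = distance("Rybbit", "Plausible")
simplicity_between = RATINGS["Plausible"][2] < RATINGS["Rybbit"][2] < RATINGS["Posthog"][2]
```

On these numbers Rybbit sits closer to Plausible than to Posthog overall, while landing between the two on the simplicity axis.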
Is it possible to use it server-side only, with no JavaScript required? I currently use Umami like that - it has an API, so I can send it page view events and custom events from server-side code. That means analytics can't be disabled by uBlock or the like, or by disabling JavaScript.
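For anyone unfamiliar with the pattern, a server-side event is just an HTTP POST made by your backend instead of the browser. The sketch below only builds the request; the endpoint URL and the payload field names are placeholders and do not describe the actual Umami or Rybbit API:

```python
import json
import urllib.request


def build_event_request(endpoint: str, site_id: str, path: str, name: str = "pageview"):
    """Build (but do not send) a JSON POST describing one analytics event."""
    # Field names here are hypothetical; check the tool's API docs for the real ones.
    payload = {"site_id": site_id, "path": path, "event": name}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_event_request("https://analytics.example.com/api/event", "demo-site", "/pricing")
# urllib.request.urlopen(req, timeout=5) would send it; omitted so the sketch runs offline.
```

Since the request originates from the server, ad blockers and disabled JavaScript in the visitor's browser have no effect on it.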
> more than 90% of companies use PostHog for free.
I started working on this 4 months ago and only publicly launched a few days ago.
As for monetization, I have no idea yet. I'm happy to collect stars for the time being. What do you think I should do?
It's (un)surprisingly common to end up with multiple website analytics products on the same site; marketing wants these two, another department wants another. When I had ghostery show the list of things it was blocking I often saw multiple, overlapping-feature-set analytics integrations being blocked on the same site.
I've also seen those trackers be added by someone who exits the organization a month later, thereby blessing the trackers with a protection spell that makes their removal unlikely for fear of breaking some metric pipeline somewhere.
I wonder if Rybbit is betting more on UX simplicity or niche use cases (e.g. very lightweight deployments, self-hosting ease, etc.). Has anyone here actually tried it? Curious how it stacks up in terms of setup time, dashboard clarity, and tracking depth.
Alternatively, add an identify() call and let others roll their own solution for this.
Then I would actually trust your retention numbers.
Great stuff though! Impressive launch.
As mentioned elsewhere in the thread, there is a lot of bot activity there, which using JS might clean up a bit.
If you are interested, I have a write up of my setup here, with the report generation down at the bottom: https://gamestrut.com/Blog/2022-06-08-static-site-hosting-on...
> Key Features
> - All key web analytics metrics including sessions, unique users, pageviews, bounce rate, session duration
> - No cookies or user tracking - GDPR & CCPA compliant
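For reference, the metrics in that list can all be derived from a flat stream of pageview events. This sketch assumes a bounce means a session with exactly one pageview and that session duration is last pageview minus first; tools differ on both definitions:

```python
from collections import defaultdict


def summarize(events):
    """events: iterable of (session_id, visitor_id, timestamp_seconds) pageviews."""
    by_session = defaultdict(list)
    visitors = set()
    for session_id, visitor_id, ts in events:
        by_session[session_id].append(ts)
        visitors.add(visitor_id)

    sessions = len(by_session)
    pageviews = sum(len(ts) for ts in by_session.values())
    bounces = sum(1 for ts in by_session.values() if len(ts) == 1)
    durations = [max(ts) - min(ts) for ts in by_session.values()]
    return {
        "sessions": sessions,
        "unique_users": len(visitors),
        "pageviews": pageviews,
        "bounce_rate": bounces / sessions if sessions else 0.0,
        "avg_session_duration": sum(durations) / sessions if sessions else 0.0,
    }


events = [
    ("s1", "v1", 0), ("s1", "v1", 30), ("s1", "v1", 90),
    ("s2", "v2", 10),
    ("s3", "v1", 500), ("s3", "v1", 560),
]
report = summarize(events)
```

Note that "unique users" depends entirely on some stable `visitor_id`, which is exactly where the hashing question comes in.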
- Hashing IPs with other data to form a non-reversible UUID
- Not tracking across multiple host domains
- Not setting cookies or storing any information in the browser
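A sketch of what the first two bullets could look like in code, assuming SHA-256 over a salt plus host, IP and user agent, truncated to 16 bytes. `visitor_uuid` is a hypothetical helper, and whether the result really counts as non-reversible is the point in dispute here:

```python
import hashlib
import uuid


def visitor_uuid(salt: bytes, host: str, ip: str, user_agent: str) -> uuid.UUID:
    """Derive a per-host visitor ID; including the host keeps IDs from matching across domains."""
    material = salt + b"|" + "|".join((host, ip, user_agent)).encode("utf-8")
    digest = hashlib.sha256(material).digest()
    return uuid.UUID(bytes=digest[:16])


salt = b"example-salt"
on_site_a = visitor_uuid(salt, "a.example.com", "192.0.2.57", "ExampleBrowser/1.0")
on_site_a_again = visitor_uuid(salt, "a.example.com", "192.0.2.57", "ExampleBrowser/1.0")
on_site_b = visitor_uuid(salt, "b.example.com", "192.0.2.57", "ExampleBrowser/1.0")
```

The same visitor gets a stable ID on one host and an unrelated ID on another, but on a single host the ID is still derived from the IP and can be recomputed by anyone holding the salt.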
Arguing otherwise is like claiming it’s legal to steal from a store as long as you return the goods the next day - it’s legal fantasy.
I don’t think the EU is eager to go after these “ethical” analytics companies or their users, since they have bigger fish to fry. But if you think you’re legally in the clear using these solutions without user consent, you’re fooling yourself.
I'm using Next.js but I'm using all client-side components. The tooling around SPA client side state is just really good so I don't see a huge reason to go full SSR, especially when SEO doesn't matter for the actual app.
https://github.com/rybbit-io/rybbit/blob/master/server/GeoLi...
I'm one of the co-founders of Pirsch, and a bit worried because the space is getting really crowded :D
And some features aren't available 1:1 with the CE version of Plausible either.