I just released https://versiondb.io a few hours ago. It lets you get a slice of what's running on the web without breaking the bank. The full version contains over 4M domains and over 3K detected technologies.
Have fun and I hope you guys find it useful.
innagadadavida•2h ago
Is there data on which site search engine a website uses? This is hard to get because it sits deep in the backend, but it would be super useful information. BuiltWith doesn't have this per site, but they do have lists mapping search engines (Bloomreach, Coveo, Algolia) to the websites that use them, probably based on private data dumps. Being able to look this up for a given website would be very useful.
_chse_•2h ago
This dataset does include Bloomreach Discovery, Coveo and Algolia. They were detected by inspecting the HTTP responses of publicly available web pages. For example, Coveo was detected by checking whether a script tag's src attribute contains "static.cloud.coveo.com".
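For anyone curious what that kind of check looks like, here's a minimal sketch in Python, assuming you already have the page's HTML as a string. Only the "static.cloud.coveo.com" signature comes from the comment above; the names SIGNATURES, ScriptSrcCollector and detect_technologies are hypothetical, and this is not the actual versiondb.io pipeline.

```python
from html.parser import HTMLParser

# Substring signatures matched against <script src="..."> values.
# Only the Coveo entry comes from the thread; any other entries you add
# would be hypothetical and need their own verified substrings.
SIGNATURES = {
    "Coveo": "static.cloud.coveo.com",
}


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # Self-closing <script/> tags also reach this method by default.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def detect_technologies(html: str) -> set:
    """Return the names of technologies whose signature appears in a script src."""
    parser = ScriptSrcCollector()
    parser.feed(html)
    parser.close()
    found = set()
    for src in parser.srcs:
        for tech, needle in SIGNATURES.items():
            if needle in src:
                found.add(tech)
    return found


if __name__ == "__main__":
    # Hypothetical page; the script path is made up for illustration.
    page = '<html><head><script src="https://static.cloud.coveo.com/example/search.js"></script></head></html>'
    print(detect_technologies(page))  # {'Coveo'}
```

Note that this only looks at src attributes, so a signature that appears inside inline script text won't match. That keeps false positives down but will miss sites that load the library some other way.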