I have been working on this project for a while, and I think it solves a problem that a lot of people here have: not being able to easily search Discord servers.
Currently, I only scrape servers that are marked as "discoverable" on Discord. However, if there's enough interest in the project, I'm open to adding specific servers by request. I'm primarily focused on informational servers, such as open source projects, Minecraft mods, and support communities for tools, services, or platforms (for example, hosting providers), rather than casual hangout spaces.
I have placed restrictions on searching directly by user ID to prevent doxing. I also made the opt-out process a single click for those who do not want to be archived.
This is my first large-scale project, so I'd love to hear your feedback!
hofrogs•2h ago
searchcord•2h ago
Thanks for your feedback.
For software, I use ScyllaDB and Elasticsearch. It's split across 6 physical nodes (8 including the CDN). Data collection is handled using standard user accounts, accessing only public, discoverable servers. I plan to write a blog post soon about the technical side of how this was done.
Admins of these servers weren't contacted, as the content indexed is already publicly accessible, comparable to a forum like this or a public subreddit. That said, I understand the sensitivity around data visibility, and I've made it very simple for any user to opt out of indexing at any time. Private or invite-only servers are, of course, completely excluded.
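To illustrate the opt-out idea, here is a minimal sketch of dropping opted-out authors before anything reaches the search index. This is not the actual implementation: `Message`, `filter_indexable`, and the in-memory `opted_out` set are hypothetical names chosen for the example, and a real system would presumably keep the opt-out list in a database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Message:
    """Hypothetical shape of a scraped message (assumption for this sketch)."""
    author_id: str
    content: str


def filter_indexable(messages: Iterable[Message], opted_out: Set[str]) -> List[Message]:
    """Return only the messages whose author has not opted out of indexing."""
    return [m for m in messages if m.author_id not in opted_out]


# Example usage with made-up data.
batch = [
    Message(author_id="111", content="How do I configure the mod?"),
    Message(author_id="222", content="Please do not archive me."),
    Message(author_id="111", content="Found it, thanks."),
]
indexable = filter_indexable(batch, opted_out={"222"})
```

Filtering before indexing, rather than hiding results at query time, means an opted-out user's messages never enter the index in the first place.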
hofrogs•2h ago
searchcord•2h ago
hofrogs•1h ago
uniqueuid•1h ago
searchcord•1h ago
Not exactly. Attachments are only fetched from Discord when a user requests them, so the vast majority of attachments are never stored on my servers. Right now, I only have about 280 TB of attachments stored locally on my own infrastructure. You can see more stats here: https://searchcord.io/about
Thanks for your question!
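The on-demand approach can be sketched roughly as below. This is an illustration under assumptions, not the real code: `LazyAttachmentCache` and `fake_fetch` are hypothetical names, the dictionary stands in for local disk storage, and the fetch function is a placeholder for an actual request to Discord.

```python
from typing import Callable, Dict


class LazyAttachmentCache:
    """Fetch an attachment from upstream only the first time it is requested."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # hypothetical upstream fetcher
        self._store: Dict[str, bytes] = {}   # stands in for local storage
        self.upstream_calls = 0              # counts real upstream fetches

    def get(self, attachment_id: str) -> bytes:
        # Only hit upstream when the attachment is not stored locally yet.
        if attachment_id not in self._store:
            self.upstream_calls += 1
            self._store[attachment_id] = self._fetch(attachment_id)
        return self._store[attachment_id]

    def stored_count(self) -> int:
        """Number of attachments currently held locally."""
        return len(self._store)


def fake_fetch(attachment_id: str) -> bytes:
    # Placeholder for a network request; returns deterministic dummy bytes.
    return f"bytes-of-{attachment_id}".encode()


cache = LazyAttachmentCache(fake_fetch)
first = cache.get("a1")   # triggers one upstream fetch
second = cache.get("a1")  # served from local storage
```

Attachments that nobody ever requests are never fetched, which is why local storage stays far smaller than the full set of indexed attachments.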
klntsky•1h ago
searchcord•1h ago
Thanks for your suggestions. However, this does not work for a few reasons:
1. Joining servers is protected by increasingly difficult captchas that have no commercially available solver. This is not a battle I want to fight.
2. There are a LOT of CSAM rings that spam invite links in public servers. This is also not something I want to go anywhere near.
Moreover, after the fallout of spy.pet, I think it is very important that users are able to opt out.