
Show HN: A free, privacy preserving, archive of public Discord servers

https://searchcord.io
43•searchcord•4h ago
Hey HN!

I have been working on this project for a while, and I think this solves a problem that a lot of people here have: not being able to easily search Discord servers.

Currently, I only scrape servers that are marked as "discoverable" on Discord. However, if there's enough interest in the project, I'm open to adding specific servers by request. I'm primarily focused on informational servers rather than casual hangout spaces, such as open source projects, Minecraft mods, and support communities for tools, services, or platforms (for example, hosting providers).

I have placed restrictions on searching directly by user ID to prevent doxing. I also made the opt out process one click, for those who do not want to be archived.

This is my first large scale project, so I'd love to hear your feedback!

Comments

hofrogs•2h ago
This is really cool and actually useful for peeking behind those annoying login walls. What software do you use to store, index, and search so much data? How did you get the data in the first place? Discord isn't exactly known for making its data easily available. Have the administrators of the guilds asked you for this? Have you contacted them and made them aware after the fact?
searchcord•2h ago
Hey,

Thanks for your feedback.

For software, I use ScyllaDB and Elasticsearch. It's split across 6 physical nodes (8 including the CDN). Data collection is handled using standard user accounts, accessing only public, discoverable servers. I plan to write a blog post about the technical aspect of how this was done soon.

Admins of these servers weren't contacted, as the content indexed is already publicly accessible, comparable to a forum like this or public subreddit. That said, I understand the sensitivity around data visibility, and I've made it very simple for any user to opt out of indexing at any time. Private or invite-only servers are, of course, completely excluded.

hofrogs•2h ago
That's a lot of compute, how much does it cost to keep it running? I don't see how that project would generate any income on its own
searchcord•2h ago
I already own the hardware, so I only pay for colocation and transit. It's probably a lot less than you think. I hope to find some way to monetize it, but it is cheap enough that I can keep it running for quite a long time without any income.
hofrogs•1h ago
Thanks for this. Well good luck with keeping it up, it's a really useful service.
uniqueuid•1h ago
Wow, that must be quite expensive! You said the files alone are a few PB. So at least 2PB / 8 servers ~= 250TB per server, which would probably put each server at > $20k (unless you’re putting it together with duct tape and scraps, but even then the disks will cost a ton).
searchcord•1h ago
Hey,

Not exactly. Attachments are only fetched from Discord as the user requests them. This means that the vast majority of attachments are never stored on my server. Right now, I only have about 280TB of attachments locally on my own infrastructure. You can see more stats here: https://searchcord.io/about

Thanks for your question!
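
The fetch-on-request behavior described above can be sketched as a read-through store. This is only an illustration: `LazyAttachmentStore` and the injected `fetch` callable are hypothetical names, and it assumes that a fetched attachment is kept locally afterwards, which the reply implies but does not spell out.

```python
from typing import Callable, Dict


class LazyAttachmentStore:
    """Keeps only the attachments that someone has actually requested."""

    def __init__(self, fetch: Callable[[str], bytes]):
        # `fetch` stands in for whatever retrieves the file upstream (assumption).
        self._fetch = fetch
        self._local: Dict[str, bytes] = {}

    def get(self, attachment_id: str) -> bytes:
        # Fetch upstream only on the first request, then serve the local copy.
        if attachment_id not in self._local:
            self._local[attachment_id] = self._fetch(attachment_id)
        return self._local[attachment_id]

    def stored_count(self) -> int:
        # Number of attachments held locally, i.e. requested at least once.
        return len(self._local)
```

With this shape, attachments that nobody ever opens never consume local storage.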

klntsky•1h ago
I suggest you remove the opt-out functionality and let it scrape private servers that it discovers via publicly posted invite links. You don't owe anyone posting on a public forum any privacy. Moreover, the most valuable data to search for is probably somewhat obscured.
searchcord•1h ago
Hey,

Thanks for your suggestions. However, this does not work for a few reasons:

1. Joining servers is protected by increasingly difficult to solve captchas that have no commercially available solver. This is not a battle I want to fight.

2. There are a LOT of CSAM rings that spam invite links in public servers. This is also not something I want to go anywhere near.

Moreover, after the fallout of spy.pet, I think it is very important that users are able to opt out.

legionof7•2h ago
I've been looking for something like this for so long, thanks for making!

There's so much stuff locked in Discord now that forums have fallen in popularity, think this sort of thing really helps unlock that knowledge again.

searchcord•2h ago
Thanks for your feedback! <3
orph•1h ago
Can I download all the messages & attachments?
searchcord•1h ago
Sure, but there are a few petabytes of attachments and over 63 billion messages. Feel free to use the API.
treyd•1h ago
Would you consider making regular dumps of the database available in sharded torrents like Anna's Archive does so that users can back up the data themselves for preservation purposes? This would complicate retroactively removing users' activity, but that data could already be scraped.

And related, I'd like to be able to run this locally for exports of guilds that I'm in myself. Is that even possible with the architecture you've built?

searchcord•1h ago
Hey,

This is absolutely something I want to do, but at the guild level. The database itself is over 13TB, which is much too large to create regular exports of. I will probably provide a SQLite export of each guild, regenerated each week/month. Anyone is free to download whatever they want in real time from the API.

Thanks for your question!
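
As a rough idea of what a per-guild SQLite export could look like, here is a minimal sketch using Python's standard `sqlite3` module. The `messages` table, its columns, and the `export_guild` function are assumptions made for illustration; nothing in the thread describes Searchcord's real schema.

```python
import sqlite3
from typing import Iterable, Tuple

# (message_id, channel_id, author, content, created_at): a hypothetical row shape.
Row = Tuple[int, int, str, str, str]


def export_guild(rows: Iterable[Row], path: str) -> int:
    """Write one guild's messages to a SQLite file and return the row count."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY, channel_id INTEGER, "
                "author TEXT, content TEXT, created_at TEXT)"
            )
            # INSERT OR REPLACE keeps a regenerated export free of duplicates.
            conn.executemany(
                "INSERT OR REPLACE INTO messages VALUES (?, ?, ?, ?, ?)", rows
            )
        (count,) = conn.execute("SELECT COUNT(*) FROM messages").fetchone()
        return count
    finally:
        conn.close()
```

Because each guild lands in its own file, a weekly or monthly regeneration only has to touch the guilds that changed.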

ivape•1h ago
Finding good Discord servers has been a great thing for me. I was getting super disconnected and isolated, so different Discord servers have made me feel human again.
searchcord•1h ago
I hope Searchcord helps you! <3
pabs3•1h ago
There are some Discord archives on archive.org too btw.

https://archive.org/search?query=subject%3A%22DiscordChatExp...
https://archive.org/search?query=subject%3A%22archiveteam_di...
https://wiki.archiveteam.org/index.php/Discord

searchcord•58m ago
Hey,

This is interesting, I somehow missed this. Unfortunately, those are not full text searchable. Maybe I will download them and import them into Searchcord, with proper credit of course.

Thanks for this!

johnQdeveloper•57m ago
> This is my first large scale project, so I'd love to hear your feedback!

> I have placed restrictions on searching directly by user ID to prevent doxing. I also made the opt out process one click, for those who do not want to be archived.

1) I'd suggest anonymizing the usernames / author ids to something more privacy friendly such as how some image sites were generating 3-4 random words as a human readable unique id. This removes a lot of the reason people would opt out (i.e. posts being tracked down years later)

2) You do not seem to have clear rate limit documentation. If you are asking people to pay for commercial use, I'd suggest making it clear what the rough original limits are as well as the rough price range of what you'd offer.

3) Tbh, the only real thing I want from this project is basically narrative / roleplay / writing content for LLM reasons as I'm trying to build a rules-oriented system that narrates via LLM. If you don't want people using this data for this purpose, I'd suggest making that clear.

searchcord•41m ago
Hey,

Thanks for your suggestions.

> 1) I'd suggest anonymizing the usernames / author ids to something more privacy friendly such as how some image sites were generating 3-4 random words as a human readable unique id. This removes a lot of the reason people would opt out (i.e. posts being tracked down years later)

In the original iteration of Searchcord, it used to work similarly to that. The username was `sha256(userid+guildid)`, truncated to the first 8 characters. Unfortunately, it was pretty hard to follow chats. I will try your idea and see how it works, though.
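
For illustration, here is a minimal sketch of both naming schemes discussed here. `hashed_name` follows the `sha256(userid+guildid)` description; treating the IDs as strings, concatenating them directly, and truncating the hex digest are assumptions. `word_name` is a hypothetical variant of the "3-4 words" idea that derives the words from the same hash instead of picking them at random, so a user keeps the same name within a guild. `WORDS` is a toy list.

```python
import hashlib

# Toy word list for the sketch; a real one would be far larger to avoid collisions.
WORDS = ["amber", "birch", "cedar", "delta", "ember", "fjord", "grove", "heron"]


def hashed_name(user_id: str, guild_id: str) -> str:
    """First 8 hex characters of sha256(userid + guildid)."""
    digest = hashlib.sha256((user_id + guild_id).encode("utf-8")).hexdigest()
    return digest[:8]


def word_name(user_id: str, guild_id: str, n_words: int = 3) -> str:
    """Readable pseudonym: one word per leading byte of the same hash."""
    digest = hashlib.sha256((user_id + guild_id).encode("utf-8")).digest()
    return "-".join(WORDS[b % len(WORDS)] for b in digest[:n_words])
```

Both names are stable per user and guild, which keeps a conversation followable, and they differ between guilds, so a pseudonym in one server does not reveal the same person in another.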

> 2) You do not seem to have clear rate limit documentation.

This is a good idea. The rate limit varies by endpoint, and I haven't gotten around to documenting each one.

> If you are asking people to pay for commercial use, I'd suggest making it clear what the rough original limits are as well as the rough price range of what you'd offer.

I have absolutely zero idea what industry would be interested in this, in what form, and if anyone would even pay.

> 3) Tbh, the only real thing I want from this project is basically narrative / roleplay / writing content for LLM reasons as I'm trying to build a rules-oriented system that narrates via LLM. If you don't want people using this data for this purpose, I'd suggest making that clear.

I really don't care what people do with the data, as long as they are not spamming requests or using the data for commercial purposes without permission.

cinntaile•50m ago
All Discord servers require an invitation link as far as I know. Do you consider a link you find online to be a public server?
searchcord•41m ago
Some very large servers are eligible for what Discord calls "discovery". This makes their data visible without joining the server. You can find a list of those on Discord's site here: https://discord.com/servers
nottorp•24m ago
Suggestion: a bot for smaller servers that do want to be archived like a public forum. Their admins could install the bot themselves and perhaps specify what channels they want archived.
searchcord•21m ago
This is something I have already completed but have not finished bug testing. The bot also includes functionality to recover any server in case it was nuked/wiped and Searchcord has a backup of it. It uses webhooks to resend the messages so you have an approximation of what the channels used to be.
pzmarzly•7m ago
Check out Linen https://www.linen.dev/
Stagnant•49m ago
Incredible work! Truly eye-opening to see how some rarer keywords in my native language return pages of relevant results. Meanwhile Google gives 0 results or just AI/ad spam.
IceWreck•42m ago
Do you plan to handle servers where you need to take some action (like sending a message) to join all channels?

I was scrolling through the home page and came across a few where the only channels you're allowed to access are the verify-yourself or welcome channels.

searchcord•38m ago
Probably not. Discord will aggressively captcha you and every server has a different implementation of verification. It might be possible with a captcha solver and then some LLM to figure out the next steps.
bstsb•21m ago
nice project. how are you going to handle the issues involved with breaking Discord's TOS?

> "scraping our services without our written consent"

additionally, are these pages indexable? i know of other projects (opt-in) that create pages made from user discussion.

searchcord•16m ago
> how are you going to handle the issues involved with breaking Discord's TOS?

Not sure. I will solve that problem if and when Discord takes issue with Searchcord.

> additionally, are these pages indexable?

Yes, I would actually like for search engines to index it as their search is much more contextually aware than mine.

EZ-Cheeze•14m ago
I'm in more than a hundred Discord servers. I've been wanting to scrape the members of each of them to discover the people with whom I share the most servers but we're not yet friends. Someone with 10+ would highly likely be a new friend since we'd have a lot of shared niche interests
searchcord•13m ago
This is something I have been trying to make as a way to learn about graph theory. If I can find a way to make it work efficiently, I will definitely add this.
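
A simple way to picture the shared-server idea is a counting pass over server membership, before reaching for any graph library. The sketch below is hypothetical throughout: `members_by_server` maps a server ID to a set of member IDs, `friends` is the set of existing friends, and `min_shared=10` mirrors the "10+" threshold mentioned above.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def shared_server_counts(my_id: str, members_by_server: Dict[str, Set[str]]) -> Counter:
    """Count, for every other user, how many servers they share with my_id."""
    counts: Counter = Counter()
    for members in members_by_server.values():
        if my_id in members:
            counts.update(m for m in members if m != my_id)
    return counts


def likely_friends(
    my_id: str,
    members_by_server: Dict[str, Set[str]],
    friends: Set[str],
    min_shared: int = 10,
) -> List[Tuple[str, int]]:
    """Non-friends sharing at least min_shared servers, most shared first."""
    counts = shared_server_counts(my_id, members_by_server)
    return [
        (user, n)
        for user, n in counts.most_common()
        if n >= min_shared and user not in friends
    ]
```

This is linear in the total membership of the servers one user is in; doing it for every pair of users at once is where the efficiency problem starts.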
3abiton•11m ago
This is an amazing project. I always wonder how much information is lost in those chat apps, not only Discord, but also Telegram. The latter has a huge dev community specifically around Android ROM development, which migrated from the forum-based XDA to more flexible chat/support platforms like Telegram. I wish that could also be searchable without having their client.
searchcord•8m ago
Telegram is already heavily monitored and scraped due to the large volume of illegal or extremely controversial activity that happens there. This is something I will look into though, my XDA threads rarely get any replies anymore. Thanks for the suggestion!
jonasdoesthings•2m ago
Maybe also exclude messages by bots (e.g. "username has joined the server") from the index to decrease the stalking potential of your site (99.9% of these bot messages have no informational value for the index anyway). Currently you can still search for a username and get a subset of the servers that the username is in (even if not active) by finding these bot messages.
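
One possible shape for such a filter, as a sketch only: the field names below (`type`, `author.bot`) follow Discord's public message object, where type 0 is a default message and 19 is a reply, but whether Searchcord stores messages in that form is an assumption, and `is_indexable` is a hypothetical helper.

```python
# Message types treated as ordinary user content: 0 (default) and 19 (reply).
# Every other type (joins, boosts, pins, ...) is assumed to be a system message.
INDEXABLE_TYPES = {0, 19}


def is_indexable(message: dict) -> bool:
    """Return True only for ordinary messages written by non-bot accounts."""
    if message.get("author", {}).get("bot", False):
        return False
    return message.get("type", 0) in INDEXABLE_TYPES
```

Running this check before indexing would drop both welcome-bot posts and built-in join notices, so a username search would no longer surface servers where the user never wrote anything.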