Show HN: Self-host Reddit – 2.38B posts, works offline, yours forever

https://github.com/19-84/redd-archiver
96•19-84•5h ago
Reddit's API is effectively dead for archival. Third-party apps are gone. Reddit has threatened to cut off access to the Pushshift dataset multiple times. But 3.28TB of Reddit history exists as a torrent right now, and I built a tool to turn it into something you can browse on your own hardware.

The key point: This doesn't touch Reddit's servers. Ever. Download the Pushshift dataset, run my tool locally, get a fully browsable archive. Works on an air-gapped machine. Works on a Raspberry Pi serving your LAN. Works on a USB drive you hand to someone.

What it does: Takes compressed data dumps from Reddit (.zst), Voat (SQL), and Ruqqus (.7z) and generates static HTML. No JavaScript, no external requests, no tracking. Open index.html and browse. Want search? Run the optional Docker stack with PostgreSQL – still entirely on your machine.
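
The dump-to-static-HTML step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the decompressed dump is newline-delimited JSON with `id`, `title`, `author`, and `selftext` fields, and it leaves out decompression (for example with the third-party `zstandard` package) by accepting any iterable of text lines. The function names `iter_posts`, `render_post`, and `render_archive` are hypothetical.

```python
import html
import json
from typing import Dict, Iterable, Iterator


def iter_posts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one post dict per NDJSON line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a damaged line instead of aborting the run
        if isinstance(record, dict):
            yield record


def render_post(post: dict) -> str:
    """Render a single post as a standalone page: no scripts, no external requests."""
    # Field names are assumptions about the dump format; escape everything.
    title = html.escape(str(post.get("title", "(untitled)")))
    author = html.escape(str(post.get("author", "[unknown]")))
    body = html.escape(str(post.get("selftext", "")))
    return (
        "<!doctype html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1><p>by {author}</p><pre>{body}</pre></body></html>\n"
    )


def render_archive(lines: Iterable[str]) -> Dict[str, str]:
    """Map each post id to its rendered HTML page."""
    return {
        str(post["id"]): render_post(post)
        for post in iter_posts(lines)
        if "id" in post
    }
```

Each returned page could then be written to `<id>.html` in an output folder and opened directly in a browser.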

API & AI Integration: Full REST API with 30+ endpoints – posts, comments, users, subreddits, full-text search, aggregations. Also ships with an MCP server (29 tools) so you can query your archive directly from AI tools.

Self-hosting options:
- USB drive / local folder (just open the HTML files)
- Home server on your LAN
- Tor hidden service (2 commands, no port forwarding needed)
- VPS with HTTPS
- GitHub Pages for small archives

Why this matters: Once you have the data, you own it. No API keys, no rate limits, no ToS changes can take it away.

Scale: Tens of millions of posts per instance. PostgreSQL backend keeps memory constant regardless of dataset size. For the full 2.38B post dataset, run multiple instances by topic.
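
One general way to keep memory flat while ingesting a large dump is to consume the record stream in fixed-size batches and hand each batch to the database before reading the next. The sketch below illustrates that idea only. It is not taken from the project, and `insert_batch` is a hypothetical callback standing in for whatever performs the actual PostgreSQL write.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` records, holding only one batch at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_batches(
    records: Iterable[T],
    size: int,
    insert_batch: Callable[[List[T]], None],
) -> int:
    """Feed records to `insert_batch` (hypothetical DB writer); return the total count."""
    total = 0
    for batch in batched(records, size):
        insert_batch(batch)  # e.g. one bulk INSERT per batch
        total += len(batch)
    return total
```

Because `records` can be a generator reading the dump line by line, peak memory depends on the batch size rather than on the size of the dataset.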

How I built it: Python, PostgreSQL, Jinja2 templates, Docker. Used Claude Code throughout as an experiment in AI-assisted development. Learned that the workflow is "trust but verify" – it accelerates the boring parts but you still own the architecture.

Live demo: https://online-archives.github.io/redd-archiver-example/

GitHub: https://github.com/19-84/redd-archiver (Public Domain)

Pushshift torrent: https://academictorrents.com/details/1614740ac8c94505e4ecb9d...

Comments

NickNaraghi•1h ago
Data is available via torrent in this section: https://github.com/19-84/redd-archiver?tab=readme-ov-file#-g...
19-84•1h ago
I have also published sub statistics and profiling for each platform. These can be used to help identify which subs to prioritize for archiving.

reddit: https://github.com/19-84/redd-archiver/blob/main/tools/subre...

voat: https://github.com/19-84/redd-archiver/blob/main/tools/subve...

ruqqus: https://github.com/19-84/redd-archiver/blob/main/tools/guild...

elSidCampeador•1h ago
I wonder if this can be hooked up with the now-dead Apollo app in some way, to get back a slice of time that is forever lost now?
19-84•1h ago
the API should allow for a lot of different integrations
Aurornis•1h ago
Cool way to self-host archives.

What I'd really like is a plugin that automatically pulls from archives somewhere and replaces deleted comments and those bot-overwritten comments with the original context.

Reddit is becoming maddening to use because half the old links I click have comments overwritten with garbage out of protest for something. Ironically the original content is available in these archives (which are used for AI training) but now missing for actual users like me just trying to figure out how someone fixed their printer driver 2 years ago.

kylehotchkiss•56m ago
_Hacker News collectively grabs the dataset to train their models on how to become effective reddit trolls_
19-84•52m ago
the API and MCP server are very powerful ;)
Jordan-117•51m ago
>Voat

Gross. Why would anyone want to have an archive of Reddit For Neonazis?

19-84•43m ago
Thank you for your comment. I will support any platform that has a complete dataset available, and I will take submissions for complete datasets through GitHub issues: https://github.com/19-84/redd-archiver/blob/main/.github/ISS...
diggyhole•10m ago
Wat?
dvngnt_•48m ago
I want to do the same thing for TikTok. I have 5k videos downloaded, starting from the pandemic. I want to find a way to use AI to tag and categorize the videos so I can scroll through them locally.
syngrog66•27m ago
Did you pay all the people who created its content?

Signal leaders warn agentic AI is an insecure, unreliable surveillance risk

https://coywolf.com/news/productivity/signal-president-and-vp-warn-agentic-ai-is-insecure-unrelia...
245•speckx•2h ago•73 comments

The Tulip Creative Computer

https://github.com/shorepine/tulipcc
110•apitman•3h ago•29 comments

AI Generated Music Barred from Bandcamp

https://old.reddit.com/r/BandCamp/comments/1qbw8ba/ai_generated_music_on_bandcamp/
258•cdrnsf•2h ago•178 comments

Confer – End to end encrypted AI chat

https://confer.to/
51•vednig•6h ago•38 comments

Instagram AI Influencers Are Defaming Celebrities with Sex Scandals

https://www.404media.co/instagram-ai-influencers-are-defaming-celebrities-with-sex-scandals/
47•cdrnsf•56m ago•25 comments

Show HN: Ayder – HTTP-native durable event log written in C (curl as client)

https://github.com/A1darbek/ayder
28•Aydarbek•2h ago•7 comments

Scott Adams has died

https://www.youtube.com/watch?v=Rs_JrOIo3SE
435•ekianjo•5h ago•807 comments

Influencers and OnlyFans models are dominating U.S. O-1 visa requests

https://www.theguardian.com/us-news/2026/jan/11/onlyfans-influencers-us-o-1-visa
238•bookofjoe•3h ago•170 comments

How to make a damn website (2024)

https://lmnt.me/blog/how-to-make-a-damn-website.html
32•birdculture•3h ago•12 comments

Inlining – The Ultimate Optimisation

https://xania.org/202512/17-inlining-the-ultimate-optimisation
12•PaulHoule•4d ago•4 comments

Apple Creator Studio

https://www.apple.com/newsroom/2026/01/introducing-apple-creator-studio-an-inspiring-collection-o...
407•lemonlime227•6h ago•342 comments

Text-based web browsers

https://cssence.com/2026/text-based-web-browsers/
263•pabs3•15h ago•97 comments

Legion Health (YC S21) Hiring Cracked Founding Eng for AI-Native Ops

https://jobs.ashbyhq.com/legionhealth/ffdd2b52-eb21-489e-b124-3c0804231424
1•ympatel•3h ago

Everything you never wanted to know about file locking (2010)

https://apenwarr.ca/log/20101213
39•SmartHypercube•5d ago•9 comments

What a year of solar and batteries saved us in 2025

https://scotthelme.co.uk/what-a-year-of-solar-and-batteries-really-saved-us-in-2025/
208•MattSayar•4h ago•257 comments

Show HN: An iOS budget app I've been maintaining since 2011

https://primoco.me/en/
118•Priotecs•9h ago•55 comments

Git Rebase for the Terrified

https://www.brethorsting.com/blog/2026/01/git-rebase-for-the-terrified/
195•aaronbrethorst•6d ago•211 comments

Show HN: Ever wanted to look at yourself in Braille?

https://github.com/NishantJoshi00/dith
15•cat-whisperer•4d ago•2 comments

A university got itself banned from the Linux kernel (2021)

https://www.theverge.com/2021/4/30/22410164/linux-kernel-university-of-minnesota-banned-open-source
28•italophil•1h ago•7 comments

Going for Gold: The Story of the Golden Lego RCX and NXT

https://bricknerd.com/home/going-for-gold-the-story-of-the-golden-lego-rcx-and-nxt-9-9-21
5•kotaKat•4d ago•0 comments

Show HN: FastScheduler – Decorator-first Python task scheduler, async support

https://github.com/MichielMe/fastscheduler
24•michielme•5h ago•6 comments

Local Journalism Is How Democracy Shows Up Close to Home

https://buckscountybeacon.com/2026/01/opinion-local-journalism-is-how-democracy-shows-up-close-to...
338•mooreds•6h ago•227 comments

Cowork: Claude Code for the rest of your work

https://claude.com/blog/cowork-research-preview
1219•adocomplete•1d ago•519 comments

The Case for Blogging in the Ruins

https://www.joanwestenberg.com/the-case-for-blogging-in-the-ruins/
48•herbertl•2h ago•6 comments

Anthropic invests $1.5M in the Python Software Foundation

https://discuss.python.org/t/anthropic-has-made-a-large-contribution-to-the-python-software-found...
311•ayhanfuat•5h ago•145 comments

Show HN: SnackBase – Open-source, GxP-compliant back end for Python teams

https://snackbase.dev
49•lalitgehani•8h ago•6 comments

Mozilla's open source AI strategy

https://blog.mozilla.org/en/mozilla/mozilla-open-source-ai-strategy/
163•nalinidash•8h ago•136 comments

Robotopia: A 3D, first-person, talking simulator

https://elbowgreasegames.substack.com/p/introducing-robotopia-a-3d-first
98•psawaya•5d ago•46 comments

The Cray-1 Computer System (1977) [pdf]

https://s3data.computerhistory.org/brochures/cray.cray1.1977.102638650.pdf
143•LordGrey•4d ago•89 comments