Building a personal archive of the web, the slow way

https://alexwlchan.net/2025/personal-archive-of-the-web/
8•ingve•8mo ago

Comments

gwern•8mo ago
OP's workflow might be much more efficient with use of https://github.com/gildas-lormeau/SingleFile/

It can handle most of what they describe: private/paywalled pages, media enclosures, completely self-contained archives that live locally, ease of use, editing before saving, and ensuring lazy-loaded images are there. You can view the snapshot immediately to check for breakage. It automatically works with adblock and NoScript, and with anything you delete from the DOM using the element picker, so OP could clean each page very efficiently (create a bunch of rules in your adblock by picking elements, like in uBlock, so you never have to do those again, then quickly mouse away any remainder). And it stores the final DOM, so you can interact with the page first to make sure the stuff you want is visible and gets archived.

So what I do ( https://gwern.net/archiving#preemptive-local-archiving ) is I have a script which calls SingleFile-CLI in a headless Chrome browser to automatically archive everything, and then opens up the original URL + snapshot in my normal Firefox, where I look at the snapshot and then the original. If the snapshot looks good, I simply close the 2 tabs after a few seconds and I'm done; if the snapshot looks bad, then I look at the original and make edits: use uBlock Origin to define any necessary rules (assuming the page isn't cleaned up by all the rules I previously defined), make any minor tweaks to the DOM, and then save it manually with the SingleFile browser extension.
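
For illustration, a minimal Python sketch of that archive-then-review loop. This is not the actual script; it assumes a `single-file` executable on PATH that accepts `single-file <url> <output>`, and a `firefox` binary that opens each argument as a tab. The function names and the SHA-1 file naming are hypothetical.

```python
from __future__ import annotations

import hashlib
import shutil
import subprocess
from pathlib import Path


def snapshot_path(url: str, archive_dir: Path) -> Path:
    """Derive a stable local filename for a URL (hypothetical naming scheme)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return archive_dir / f"{digest}.html"


def archive_command(url: str, out: Path) -> list[str]:
    """Build the CLI call; assumes the `single-file <url> <output>` form."""
    return ["single-file", url, str(out)]


def review_command(url: str, out: Path) -> list[str]:
    """Open the snapshot first, then the original, as two Firefox tabs."""
    return ["firefox", out.resolve().as_uri(), url]


def archive_and_review(url: str, archive_dir: Path) -> Path:
    """Snapshot one URL, then open snapshot + original for a quick visual check."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    out = snapshot_path(url, archive_dir)
    # Raises CalledProcessError if the snapshot step fails.
    subprocess.run(archive_command(url, out), check=True)
    # Only try to open the browser if it is actually installed.
    if shutil.which("firefox"):
        subprocess.Popen(review_command(url, out))
    return out
```

Calling `archive_and_review("https://example.com/", Path("archive"))` would write one snapshot and open the two tabs; fixing up a bad snapshot by hand stays a manual step in the browser.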

If you use enough adblock rules, then you get a similar effect to the 'templates' described, since it looks like OP is mostly just trying to remove as much as possible. But since you're archiving the final DOM, you can do anything you like. Something I've done a few times is opening up multiple pages and copy-pasting the key DOM node from each of them into the first one, to create a single consolidated master page, in a way which is a lot easier & more reliable than messing around with the serialized HTML in Emacs.

You can also post-process them. (Because we use these local archives for 'previews' on Gwern.net, and a fully static self-contained HTML page can easily be 100MB+ with all its fonts and images and stuff, we take the SingleFile snapshots and for the large ones, we 'split' them back up, so loading the .html file doesn't necessarily load everything else: https://github.com/gwern/gwern.net/blob/master/build/deconst... And then you can save a lot of space by running standard optimization tools on the split-out files, eg OptiPNG on the revealed PNGs will save gigabytes of space because so many people fail to do the standard image optimizations.)
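
A rough Python sketch of the 'split' idea, under stated assumptions: it is not the linked script, it only handles base64 `data:image/...` URIs, and the helper names are hypothetical. It writes each embedded image out as its own file, points the HTML at that file instead, and then lists the `optipng <file>` calls you could run on the extracted PNGs.

```python
from __future__ import annotations

import base64
import hashlib
import re
from pathlib import Path

# Base64 data: URIs for a few common image types (an assumed subset).
DATA_URI = re.compile(r"data:image/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=]+)")


def split_snapshot(html: str, asset_dir: Path) -> str:
    """Move embedded images into asset_dir and return the rewritten HTML.

    Assumes the HTML file is saved next to asset_dir, so links are relative.
    """
    asset_dir.mkdir(parents=True, exist_ok=True)

    def externalize(match: re.Match) -> str:
        ext = "jpg" if match.group(1) == "jpeg" else match.group(1)
        payload = base64.b64decode(match.group(2))
        # Content-addressed name: identical images are stored only once.
        name = f"{hashlib.sha1(payload).hexdigest()}.{ext}"
        (asset_dir / name).write_bytes(payload)
        return f"{asset_dir.name}/{name}"

    return DATA_URI.sub(externalize, html)


def optipng_commands(asset_dir: Path) -> list[list[str]]:
    """One `optipng <file>` call per extracted PNG (optimizes in place)."""
    return [["optipng", str(p)] for p in sorted(asset_dir.glob("*.png"))]
```

Each command from `optipng_commands` can be passed to `subprocess.run` once OptiPNG is installed. Keeping the commands as data makes it easy to check what would run before touching thousands of files.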

Compared to "it typically takes me a few minutes to save a page", I handle the majority of pages in a few seconds, and even the nastiest page where I have to delete a lot usually takes about a minute. And since I do about 10 URLs a day, this is quite manageable at scale. (I'm up to >15k snapshots, although an unknown fraction are from an initial bulk archiving so may not be of high quality.)