
Building a personal archive of the web, the slow way

https://alexwlchan.net/2025/personal-archive-of-the-web/
8•ingve•8mo ago

Comments

gwern•8mo ago
OP's workflow might be much more efficient with use of https://github.com/gildas-lormeau/SingleFile/

It can handle most of what they describe: private/paywalled pages, media enclosures, completely self-contained archives that live locally, ease of use, editing before saving, and making sure lazy-loaded images are there. You can view the snapshot immediately to check for breakage. It automatically works with adblock and NoScript, and it respects anything you delete in the DOM using the picker, so you can clean each page very efficiently (create a bunch of rules in your adblock by picking elements like in uBlock, so you never have to do those again, then quickly mouse away any remainder). And it stores the final DOM, so you can interact with stuff to make sure it is visible or archived.

So what I do ( https://gwern.net/archiving#preemptive-local-archiving ) is I have a script which calls SingleFile-CLI in a headless Chrome browser to automatically archive everything, and then opens up the original URL + snapshot in my normal Firefox, where I look at the snapshot and then the original. If the snapshot looks good, I simply close the 2 tabs after a few seconds and I'm done; if the snapshot looks bad, then I look at the original and make edits: use uBlock Origin to define any necessary rules (assuming the page isn't cleaned up by all the rules I previously defined), make any minor tweaks to the DOM, and then save it manually with the SingleFile browser extension.
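A rough sketch of that archive-then-review loop, in Python. This is not the actual script linked above; it assumes a `single-file` executable on PATH that takes a URL and an output filename, and a `firefox` binary that opens each URL argument in a tab. The `snapshot_path` naming scheme and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def snapshot_path(url: str, archive_dir: Path) -> Path:
    """Map a URL to a local .html filename (hypothetical naming scheme)."""
    parsed = urlparse(url)
    slug = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "index"
    return archive_dir / f"{slug}.html"


def archive_command(url: str, out: Path) -> list[str]:
    # Assumption: `single-file <url> <output>` is how the CLI is invoked.
    return ["single-file", url, str(out)]


def review_command(url: str, out: Path) -> list[str]:
    # Snapshot first, then the original, matching the review order described.
    return ["firefox", out.resolve().as_uri(), url]


def archive_and_review(url: str, archive_dir: Path) -> Path:
    """Archive one URL, then open snapshot and original for a manual check."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    out = snapshot_path(url, archive_dir)
    subprocess.run(archive_command(url, out), check=True)  # wait for the snapshot
    subprocess.Popen(review_command(url, out))  # do not block on the browser
    return out
```

Calling `archive_and_review("https://example.com/a/b", Path("archive"))` would write `archive/example.com_a_b.html` and open both tabs; the manual cleanup for bad snapshots still happens in the browser.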

If you use enough adblock rules, then you get a similar effect to the 'templates' described, since it looks like OP is mostly just trying to remove as much as possible. But since you're archiving the final DOM, you can do anything you like. Something I've done a few times is opening up multiple pages and copy-pasting the key DOM node from each of them into the first one, to create a single consolidated master page, in a way which is a lot easier & more reliable than messing around with the serialized HTML in Emacs.

You can also post-process them. (Because we use these local archives for 'previews' on Gwern.net, and a fully static self-contained HTML page can easily be 100MB+ with all its fonts and images and stuff, we take the SingleFile snapshots and 'split' the large ones back up, so loading the .html file doesn't necessarily load everything else: https://github.com/gwern/gwern.net/blob/master/build/deconst... You can then save a lot of space by running standard optimization tools on the split-out files, e.g. OptiPNG on the extracted PNGs will save gigabytes of space because so many people fail to do the standard image optimizations.)
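To make the 'split' idea concrete, here is a minimal sketch of one way to do it: pull base64 image `data:` URIs out of a self-contained HTML string and replace them with relative file references. This is an illustration only, not the Gwern.net script linked above; the function name, the `assets` directory, and the hash-based filenames are all assumptions, and it only handles PNG, JPEG, and GIF images.

```python
import base64
import hashlib
import re

# Matches inline base64 images such as src="data:image/png;base64,...."
DATA_URI = re.compile(r"data:image/(png|jpeg|gif);base64,([A-Za-z0-9+/=]+)")
EXTENSIONS = {"png": "png", "jpeg": "jpg", "gif": "gif"}


def split_data_uris(html: str, asset_dir: str = "assets") -> tuple[str, dict[str, bytes]]:
    """Return (rewritten_html, {relative_path: file_bytes}).

    Identical images hash to the same name, so they are stored only once.
    """
    assets: dict[str, bytes] = {}

    def replace(match: re.Match) -> str:
        payload = base64.b64decode(match.group(2))
        digest = hashlib.sha1(payload).hexdigest()[:16]
        name = f"{asset_dir}/{digest}.{EXTENSIONS[match.group(1)]}"
        assets[name] = payload
        return name  # the src attribute now points at the external file

    return DATA_URI.sub(replace, html), assets
```

The caller would write each entry of the returned dict to disk next to the rewritten .html file. Once the PNGs exist as separate files, an optimizer such as OptiPNG can be run over them, which is not possible while they are embedded as base64.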

Compared to "it typically takes me a few minutes to save a page", I handle the majority of pages in a few seconds, and even the nastiest page where I have to delete a lot is usually like a minute. And since I do like 10 URLs a day, this is quite manageable at scale. (I'm up to >15k snapshots, although an unknown fraction are from an initial bulk archiving so may not be of high quality.)