
Building a personal archive of the web, the slow way

https://alexwlchan.net/2025/personal-archive-of-the-web/
8•ingve•8mo ago

Comments

gwern•8mo ago
OP's workflow might be much more efficient with SingleFile: https://github.com/gildas-lormeau/SingleFile/

It can handle most of what they describe: private or paywalled pages, media enclosures, completely self-contained archives that live locally, ease of use, editing before saving, and making sure lazy-loaded images are present. You can view the snapshot immediately to check for breakage. It automatically works with adblock and NoScript, and with anything you delete from the DOM using the element picker, so OP could clean each page very efficiently: create a bunch of adblock rules by picking elements, as in uBlock, so you never have to remove those again, then quickly mouse over and delete any remainder. And because it stores the final DOM, you can interact with the page first to make sure everything you want is visible and gets archived.

So what I do ( https://gwern.net/archiving#preemptive-local-archiving ) is run a script that calls SingleFile-CLI in a headless Chrome browser to archive everything automatically, and then opens both the original URL and the snapshot in my normal Firefox; I look at the snapshot, then the original. If the snapshot looks good, I simply close the two tabs after a few seconds and I'm done. If it looks bad, I look at the original and make edits: use uBlock Origin to define any necessary rules (assuming the page isn't already cleaned up by the rules I defined previously), make any minor tweaks to the DOM, and then save it manually with the SingleFile browser extension.
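The capture-then-review loop described above could be scripted roughly as follows. This is a minimal Python sketch, not gwern's actual script. It assumes the SingleFile CLI is installed as `single-file` and accepts a URL followed by an output filename, that `firefox` opens each command-line argument in its own tab, and it uses a hypothetical SHA-1-of-URL naming scheme for snapshots; `snapshot_path`, `build_commands`, and `archive_and_review` are illustrative names.

```python
import hashlib
import subprocess
from pathlib import Path


def snapshot_path(url: str, archive_dir: Path) -> Path:
    """Hypothetical naming scheme: one .html file per URL, named by its SHA-1."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return archive_dir / f"{digest}.html"


def build_commands(url: str, archive_dir: Path) -> tuple[list[str], list[str]]:
    """Return (capture command, review command) for one URL.

    Assumption: `single-file <url> <output-file>` saves the page, and
    `firefox a b` opens a and b in separate tabs.
    """
    out = snapshot_path(url, archive_dir)
    capture_cmd = ["single-file", url, str(out)]
    # Snapshot first, original second: check the snapshot, then compare.
    review_cmd = ["firefox", out.resolve().as_uri(), url]
    return capture_cmd, review_cmd


def archive_and_review(url: str, archive_dir: Path) -> Path:
    """Capture the page, then open snapshot + original for a quick manual check."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    capture_cmd, review_cmd = build_commands(url, archive_dir)
    subprocess.run(capture_cmd, check=True)  # wait for the capture to finish
    subprocess.Popen(review_cmd)             # don't block on the browser
    return snapshot_path(url, archive_dir)
```

Keeping `build_commands` separate from the `subprocess` calls means the command lines can be inspected, or adapted to another browser, without launching anything.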

If you use enough adblock rules, you get an effect similar to the 'templates' OP describes, since it looks like OP is mostly just trying to remove as much as possible. But because you're archiving the final DOM, you can do anything you like. Something I've done a few times is open multiple pages and copy-paste the key DOM node from each of them into the first one, creating a single consolidated master page; this is a lot easier and more reliable than messing around with the serialized HTML in Emacs.
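uBlock Origin applies element-hiding rules (cosmetic filters such as `##.ad`) to the live page before SingleFile saves it. To illustrate the same "write the rule once, apply it to every page" idea offline instead, here is a Python sketch, a swapped-in technique rather than anything uBlock or SingleFile does, that re-serializes saved HTML while dropping every element carrying a blocked class. It assumes well-nested markup with explicit end tags (implicitly closed `<p>` or `<li>` elements would throw off its depth counting) and matches class names only, not full CSS selectors; `ElementStripper` and `strip_elements` are hypothetical names.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they never open a nesting level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementStripper(HTMLParser):
    """Re-emit HTML, skipping any element whose class list contains a
    blocked class, together with everything nested inside it."""

    def __init__(self, blocked_classes):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.blocked = set(blocked_classes)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a blocked element

    def _is_blocked(self, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return any(c in self.blocked for c in classes)

    def _emit(self, text):
        if self.skip_depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # track nesting inside the dropped subtree
        elif self._is_blocked(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1   # start dropping this subtree
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img/> open no nesting level.
        if self.skip_depth == 0 and not self._is_blocked(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return                    # ignore stray end tags like </br>
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_elements(html, blocked_classes):
    """Return `html` with every element carrying a blocked class removed."""
    parser = ElementStripper(blocked_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

In the browser workflow the filter runs before capture, so the snapshot never contains the junk; a post-hoc pass like this is only a fallback for pages that were already saved.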

You can also post-process them. (We use these local archives for 'previews' on Gwern.net, and a fully static, self-contained HTML page can easily be 100MB+ with all its fonts and images, so we take the SingleFile snapshots and 'split' the large ones back up, so that loading the .html file doesn't necessarily load everything else: https://github.com/gwern/gwern.net/blob/master/build/deconst... You can then save a lot of space by running standard optimization tools on the split-out files; e.g., OptiPNG on the extracted PNGs will save gigabytes, because so many people skip the standard image optimizations.)
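The 'splitting' step can be illustrated as well. SingleFile inlines resources such as images as base64 `data:` URIs, so one way to undo that for large files is to decode each big payload, write it to disk, and point the attribute at the new file. This sketch is not the (truncated) gwern.net build script linked above. It assumes double-quoted `src` attributes with plain base64 payloads, an arbitrary 50 kB threshold, and that the HTML file is saved in the parent of the asset directory; fonts embedded in CSS `url(...)` are not handled. `split_snapshot` and `optipng_command` are hypothetical helpers.

```python
import base64
import hashlib
import mimetypes
import re
from pathlib import Path

# Matches src="data:<mime>;base64,<payload>" (double-quoted src attributes only).
DATA_URI = re.compile(r'src="data:([\w.+-]+/[\w.+-]+);base64,([A-Za-z0-9+/=]+)"')


def split_snapshot(html: str, asset_dir: Path, min_bytes: int = 50_000) -> str:
    """Move large inline base64 resources out of a self-contained HTML page.

    Payloads of at least `min_bytes` decoded bytes are written to `asset_dir`
    under a content-hash name, and the src attribute is rewritten to a path
    relative to asset_dir's parent. Smaller payloads stay inline.
    """
    asset_dir.mkdir(parents=True, exist_ok=True)

    def replace(m: re.Match) -> str:
        mime, payload = m.group(1), m.group(2)
        data = base64.b64decode(payload)
        if len(data) < min_bytes:
            return m.group(0)  # not worth splitting out
        ext = mimetypes.guess_extension(mime) or ".bin"
        name = hashlib.sha256(data).hexdigest()[:16] + ext
        (asset_dir / name).write_bytes(data)  # identical payloads share a file
        return f'src="{asset_dir.name}/{name}"'

    return DATA_URI.sub(replace, html)


def optipng_command(png: Path) -> list[str]:
    """Lossless PNG recompression with OptiPNG at optimization level 2."""
    return ["optipng", "-o2", str(png)]
```

Naming files by content hash means the same image embedded in many snapshots is stored once, and each extracted `.png` can then be passed through `optipng_command` with `subprocess.run`.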

Compared to OP's "it typically takes me a few minutes to save a page", I handle the majority of pages in a few seconds, and even the nastiest page, where I have to delete a lot, usually takes about a minute. Since I do about 10 URLs a day, this is quite manageable at scale. (I'm up to >15k snapshots, although an unknown fraction come from an initial bulk archiving and so may not be of high quality.)