Ask HN: How does one scrape a website?

2•terry_hc•1mo ago

I wish to scrape and preserve the blog of a dead author, before their domain and writings expire. What would be the most accurate wget invocation for obtaining all their articles, all images, et cetera, that reside under the website's domain, such that the whole site (bar external content) can be browsed locally?

Comments

reliefcrew•1mo ago

Just ask an AI how to mirror the site with wget. Beware, though, that if the site relies on JavaScript, wget may not be enough; in that case you'll need to script some kind of headless browser. Hasn't the Internet Archive (archive.org) already taken care of this for you, though?
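
For what it's worth, here is a rough sketch of the kind of wget mirror invocation being discussed, wrapped in Python so the flags can be commented one by one. The flags themselves are standard GNU wget options, but the function names, the `mirror` output directory, the one-second wait and the `https://blog.example/` URL are placeholders I made up, not anything from the thread. Treat it as a starting point under those assumptions, not a guaranteed-complete recipe, and note that it won't help with content rendered by JavaScript.

```python
import shutil
import subprocess


def build_wget_mirror_command(start_url: str, dest_dir: str = "mirror") -> list[str]:
    """Return an argument list for a polite, offline-browsable wget mirror.

    start_url and dest_dir are illustrative parameters, not from the thread.
    """
    return [
        "wget",
        "--mirror",             # recursion with unlimited depth plus timestamping
        "--page-requisites",    # also fetch images, CSS and other files pages need
        "--convert-links",      # rewrite links so the copy can be browsed locally
        "--adjust-extension",   # save HTML/CSS with matching file extensions
        "--no-parent",          # never climb above the starting directory
        "--wait=1",             # pause between requests (assumed polite default)
        f"--directory-prefix={dest_dir}",
        start_url,              # wget stays on this host unless told to span hosts
    ]


def run_mirror(start_url: str, dest_dir: str = "mirror") -> int:
    """Run the mirror and return wget's exit code; requires wget on PATH."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    return subprocess.run(build_wget_mirror_command(start_url, dest_dir)).returncode
```

Calling `run_mirror("https://blog.example/")` with the real blog address in place of the placeholder would start the download; printing the list from `build_wget_mirror_command` first is a cheap way to check the command before hitting the site.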