[1]: The Article, Paragraph 2
I suspect having a few different teams competing (for funding) to provide mirrors would rapidly reduce the hardware cost too.
The density and power dissipation numbers quoted are extremely poor compared to enterprise storage. Hardware costs for enterprise systems are also well below AWS, even assuming a short 5-year depreciation cycle on the enterprise boxes. Neither this article nor the vendors publish enough pricing information for a thorough total cost of ownership analysis, but I imagine an organization the size of IA would not be paying normal margins to its vendors.
https://github.com/jjjake/internetarchive
https://archive.org/services/docs/api/internetarchive/cli.ht...
ia search 'format:"Archive BitTorrent"' --itemlist > itemlist.txt
Note that this query will return more than 50M items, so the command will take a very long time to complete (results are returned in 10k chunks). You'll probably also want to add something like `--timeout 300` so the command doesn't fail with a timeout halfway through.
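For a rough sense of scale: at 10k results per chunk, listing 50M items takes 5,000 paged requests. Below is a minimal sketch of what to do with the resulting itemlist.txt, assuming each item's torrent is published at `https://archive.org/download/<identifier>/<identifier>_archive.torrent`. That URL pattern is an assumption here, and `pages_needed` and `torrent_urls` are hypothetical helpers, not part of the `internetarchive` package.

```python
import math
from typing import Iterable, Iterator

# Assumed URL pattern for an item's "Archive BitTorrent" file (not confirmed above).
TORRENT_URL = "https://archive.org/download/{id}/{id}_archive.torrent"

# Per the comment above, search results come back in 10k chunks.
PAGE_SIZE = 10_000


def pages_needed(total_items: int, page_size: int = PAGE_SIZE) -> int:
    """Number of paged search requests needed to list total_items results."""
    return math.ceil(total_items / page_size)


def torrent_urls(lines: Iterable[str]) -> Iterator[str]:
    """Turn lines of an --itemlist file (one identifier per line) into torrent URLs."""
    for line in lines:
        identifier = line.strip()
        if identifier:  # skip blank lines
            yield TORRENT_URL.format(id=identifier)


if __name__ == "__main__":
    print(pages_needed(50_000_000))  # 5000
    # In practice you would iterate over open("itemlist.txt") instead of this sample.
    for url in torrent_urls(["example-item\n", "\n"]):
        print(url)
```

Streaming the file line by line keeps memory flat even with 50M identifiers, and the URLs can be fed to any BitTorrent client or downloader.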
u/stavros wrote a design doc for a system (Codename "Elephant") that would scale this up: https://news.ycombinator.com/item?id=45559219
My mental model on this is Anna's Archive's torrent page meets ArchiveTeam's Warrior. Have disk? VM starts up, picks least seeded items from public endpoint, replicates, starts serving, and coverage is constantly reported back by the archive swarm.
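The "picks least seeded items" step could look something like the following sketch. It assumes a hypothetical public endpoint that publishes, per item, an identifier, a size, and a current seeder count (the `Item` record below is invented for illustration), and it covers only selection, not replication or reporting coverage back to the swarm.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Item:
    # Hypothetical record a swarm endpoint might publish for each archive item.
    identifier: str
    size_bytes: int
    seeders: int


def pick_items(catalog: List[Item], free_bytes: int) -> List[Item]:
    """Greedily choose the least-seeded items that fit in free_bytes.

    Items are ranked by seeder count (fewest first), with ties broken by size
    (smallest first) so more items get covered. An item too large for the
    remaining space is skipped rather than ending the scan.
    """
    chosen: List[Item] = []
    remaining = free_bytes
    for item in sorted(catalog, key=lambda i: (i.seeders, i.size_bytes)):
        if item.size_bytes <= remaining:
            chosen.append(item)
            remaining -= item.size_bytes
    return chosen


if __name__ == "__main__":
    catalog = [
        Item("a", 400, 5),
        Item("b", 300, 0),
        Item("c", 500, 1),
        Item("d", 200, 0),
    ]
    print([i.identifier for i in pick_items(catalog, 1000)])  # ['d', 'b', 'c']
```

A greedy pass is simple and good enough when each volunteer holds a small slice of the collection; a real swarm would also need to refresh seeder counts so that many nodes starting at once don't all grab the same items.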
(no affiliation, I am just a rando; if you are a library, museum, or similar institution, ask IA to drop some racks at your colo for replication, and as always, don't forget to donate to IA when able to)
[1] It looks like this might exist at some level, e.g. https://github.com/hartator/wayback-machine-downloader, but I've been trying to use it for a couple of weeks and every time I try, I get a 5xx error or "connection refused".
I'd say the nonprofit has found itself a profitable reason for its existence.