HACKER Q&A
📣 csense

Does anyone have a plan to back up the Internet Archive?


It's been widely reported that the Internet Archive is being sued and that it will cease to exist if it loses the lawsuit.

Does anyone have any plans to back up the data? Maybe a bunch of community members could get together, ask IA staff to help take a snapshot of the archive before the lawsuit and appeals are over, and then host the snapshot on IPFS or BitTorrent or something?

It might not have a nice UI that the general public can use, but as long as the data itself isn't lost, someone can always rebuild that, provided they find the funding somewhere and the data hasn't been burned like the Library of Alexandria.
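A community mirror would have to split the snapshot among many volunteers. As a minimal sketch, assuming every archived item has a unique identifier and every volunteer has a name (both hypothetical here), rendezvous (highest-random-weight) hashing lets each item be placed on a fixed number of volunteers without any central coordinator, so an item survives if some hosts drop out:

```python
import hashlib


def score(item_id: str, volunteer: str) -> int:
    # Deterministic pseudo-random weight for an (item, volunteer) pair.
    digest = hashlib.sha256(f"{item_id}|{volunteer}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign(item_id: str, volunteers: list[str], copies: int = 3) -> list[str]:
    # Keep the `copies` volunteers with the highest weight for this item.
    # Anyone with the same volunteer list computes the same placement.
    ranked = sorted(volunteers, key=lambda v: score(item_id, v), reverse=True)
    return ranked[:copies]


if __name__ == "__main__":
    hosts = ["alice", "bob", "carol", "dave", "erin"]  # hypothetical volunteers
    for item in ["example-item-1", "example-item-2"]:  # hypothetical identifiers
        print(item, "->", assign(item, hosts))
```

A nice property of this scheme is that when a volunteer leaves, only the items that volunteer held get a new home; every other item keeps its placement.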


  👤 sp332 Accepted Answer ✓
It's not in danger of ceasing to exist. http://blog.archive.org/2023/03/25/the-fight-continues/

> This case does not challenge many of the services we provide with digitized books including interlibrary loan, citation linking, access for the print-disabled, text and data mining, purchasing ebooks, and ongoing donation and preservation of books.

That said, hard drives are cheap again, and you can just go download the collections from the Archive that are interesting or important to you. However, the raw data from their own Wayback scrapes are not publicly available.
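For anyone going that route, here is a minimal sketch. It assumes the Archive's public metadata endpoint (`https://archive.org/metadata/<identifier>`), which returns JSON with a `files` list, and its download path (`https://archive.org/download/<identifier>/<file>`). The function works on an already-parsed metadata dict so it can be checked offline. Fetching the JSON (for example with `urllib.request`) is left out, and the sample item below is made up.

```python
from urllib.parse import quote


def plan_download(metadata: dict, originals_only: bool = True) -> tuple[list[str], int]:
    """Return (download URLs, total bytes) for one item's metadata JSON."""
    identifier = metadata["metadata"]["identifier"]
    urls, total = [], 0
    for f in metadata.get("files", []):
        # Skip derived files (OCR text, thumbnails, ...) unless asked for them.
        if originals_only and f.get("source") != "original":
            continue
        # quote() escapes spaces etc. but keeps "/" for files in subfolders.
        urls.append(f"https://archive.org/download/{quote(identifier)}/{quote(f['name'])}")
        # Sizes may arrive as strings, so normalise with int().
        total += int(f.get("size", 0))
    return urls, total


if __name__ == "__main__":
    sample = {  # hypothetical item, shaped like the metadata endpoint's JSON
        "metadata": {"identifier": "example-item"},
        "files": [
            {"name": "book one.pdf", "source": "original", "size": "1048576"},
            {"name": "book one_djvu.txt", "source": "derivative", "size": "2048"},
            {"name": "example-item_meta.xml", "source": "original", "size": "512"},
        ],
    }
    urls, total = plan_download(sample)
    print(urls)
    print(total)  # 1049088
```

Summing sizes before downloading anything tells you how many of those cheap drives a collection will actually need.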


👤 db48x
Some attempts have been made, but it is decidedly non-trivial. Last I checked, it was over 50 petabytes of data and growing rapidly.
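To put that number in perspective, here is a back-of-envelope sketch. It takes the 50 PB figure at face value and assumes, purely for illustration, 20 TB drives, three copies of everything, and a single sustained 1 Gbit/s link. It ignores filesystem overhead, parity, and the archive's growth, so treat the results as a lower bound.

```python
import math


def drives_needed(total_pb: float, drive_tb: float, copies: int) -> int:
    # Decimal units, as drive vendors use: 1 PB = 1000 TB.
    per_copy = math.ceil(total_pb * 1000 / drive_tb)
    return per_copy * copies


def transfer_days(total_pb: float, link_gbps: float) -> float:
    # Time to move total_pb petabytes over one link running flat out.
    bits = total_pb * 1e15 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 86400


if __name__ == "__main__":
    print(drives_needed(50, 20, 3))      # 7500
    print(round(transfer_days(50, 1), 1))  # 4629.6, i.e. roughly 12.7 years
```

Even before buying thousands of drives, just copying the data over one fast home connection would take over a decade, which is why any realistic effort has to be split across many participants.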