What’s the best way to archive web sites?
What’s the preferred format for archiving static web pages composed mostly of text and graphics? I had been exporting some as PDFs from Safari, but now I’m wondering whether I should save them in a more complete archive format via Chrome or a different browser.
I personally use an instance of https://archivebox.io/ on my home NAS. It does all the saving and conversion for you (runs Chrome internally and saves to multiple formats) and it’s just a “docker run” away.
The only thing I wish it did much better is search. Right now it doesn’t do a good job of indexing full-text content, but I’m hopeful that will change.
A website can be archived in several ways. You can save a single webpage to your hard drive, rely on a CMS backup, or use free tools such as HTTrack (an offline site copier) or the Wayback Machine (the Internet Archive's web archive). However, the ideal approach is an automated archiving system that records every update to the site.
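For the simplest case mentioned above, saving a single page to disk, here is a minimal Python sketch using only the standard library. The function names (snapshot_name, wayback_lookup_url, save_page) and the "archive" folder are my own choices for illustration, not part of any tool named in this thread. It stores only the raw HTML, not images or stylesheets, so treat it as a starting point rather than a complete archiver. The lookup helper only builds a URL for the Wayback Machine availability endpoint; it does not contact it.

```python
import hashlib
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def snapshot_name(url: str, when: datetime) -> str:
    """Build a filename of the form <host>_<UTC timestamp>_<8 hex chars>.html."""
    host = urllib.parse.urlsplit(url).netloc or "unknown-host"
    host = host.replace(":", "_")  # a port separator is not valid in some filenames
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # A short hash of the full URL keeps different pages on one host apart.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"{host}_{stamp}_{digest}.html"


def wayback_lookup_url(url: str) -> str:
    """Return the availability-API URL that reports existing Wayback snapshots."""
    query = urllib.parse.urlencode({"url": url})
    return "https://archive.org/wayback/available?" + query


def save_page(url: str, dest_dir: str = "archive") -> Path:
    """Fetch one page and write its raw bytes to a timestamped file."""
    folder = Path(dest_dir)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / snapshot_name(url, datetime.now(timezone.utc))
    with urllib.request.urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target


if __name__ == "__main__":
    # Example run; needs network access.
    print(save_page("https://example.com/"))
    print(wayback_lookup_url("https://example.com/"))
```

Because every run writes a new timestamped file, running it from a scheduler gives a crude history of a page's changes. For anything with images, scripts, or multiple pages, a dedicated tool is the better fit.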
HTTrack was the gold standard for backing up sites a while ago, and I'm guessing that it's still a very good option.