HACKER Q&A
📣 hosa

Why aren’t the Wayback Machine archived pages indexed by search engines?


I have had this question for some time now... Not indexing archived pages contributes to a shittier, more hassle-filled web.


  👤 DamonHD Accepted Answer ✓
If the archive competed with the originals for clicks, that would (a) make a lot of site owners cross and (b) serve stale content to users whenever the original page is still up and being updated.

👤 elliottinvent
Search engines are designed to give you the best result on the web today.

The Wayback Machine / archive.org is a snapshot in time of the web.

If search engines combined the current web and the old web, it would be an interesting experiment, but possibly a diff nightmare.

Maybe it’s something that could be a point of differentiation for a new search engine compared to Google.

For anyone wanting to take this on, maybe start with Common Crawl [0]

0. https://commoncrawl.org/the-data/


👤 cyberlab
I once used a Firefox extension that let you view the archived version of any URL (provided the site allowed the WM crawler in its robots.txt). It worked both for live URLs and for URLs that 404'd or no longer existed (bitrotted pages).
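
For anyone who wants the same lookup without an extension, here is a rough sketch using the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available`). This is not the extension mentioned above, just the same idea. The function names are made up for illustration, and the sample response only mirrors the JSON shape that endpoint is documented to return.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(url, timestamp=None):
    """Build the query URL (hypothetical helper name).

    timestamp is optional, in YYYYMMDD[hhmmss] form; the service
    returns the snapshot closest to it.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response_body):
    """Return the closest snapshot URL from a JSON response, or None."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, timestamp=None):
    """Do the actual network request (needs internet access)."""
    query = build_availability_query(url, timestamp)
    with urllib.request.urlopen(query, timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling `lookup("example.com", "20060101")` should give back a `web.archive.org/web/<timestamp>/<url>` link if a snapshot exists, and `None` if the page was never archived or was excluded.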

👤 MrCoffee7
Perhaps this article can help you somewhat: https://www.netforlawyers.com/content/archive-wayback-machin...

👤 uberman
Are you asking technically why or philosophically why?