HACKER Q&A
📣 PaulHoule

Why doesn't archive.today get shut down?


So far as I can tell you can use archive.ph to bypass paywalls on most news sites.

Scientific journal publishers have cracked down on sites like Sci-Hub, music companies got Napster shut down, and there has been a continuous game of whac-a-mole against torrent sites like The Pirate Bay.

archive.ph never seems to be the focus of a controversy. I never hear about anybody trying to shut them down, and they don't even seem to be struggling with technical countermeasures against their paywall bypass.

Once in a while you see a crazy rant like

https://www.vice.com/en/article/ypw5mj/dear-gamergate-please-stop-stealing-our-shit

but there is no real movement against this site.

How do they get away with it?


  👤 aliqot Accepted Answer ✓
Sometimes I wonder why people don't have the foresight to reserve posts like this for their personal thoughts. Don't ruin a good thing with speculation of its demise.

👤 superkuh
Why would it get shut down? Companies like Google and Cloudflare do the same thing with AMP-based re-hosting of websites without consent. Some site signs up for AMP for its domain, then any links from that site to other 3rd party sites get sucked up by Google/Cloudflare and re-hosted as an AMP site on their servers without the 3rd party site getting hits from the actual people clicking links.

Re-hosting is not a crime. I mirror sites locally and put them up on a domain subdirectory all the time. I've done it since before the web was commercial.


👤 rlpb
Maybe it's difficult to justify a takedown if simultaneously they're responding to requests from search engines with full content for indexing, as part of their business model. Either the full content is available to be indexed, or it isn't - which is it?

👤 mikequinlan
My local newspaper will issue copyright takedowns on Reddit when they see archive.org links to their articles. They are owned by McClatchy so I assume this applies to all McClatchy newspapers.

👤 Izkata
There have been attempts to block it, that's why it has so many domain names. I think one of the earlier ones was archive.it, but it no longer goes there.

👤 beardyw
I suspect that many sites ignore it because it helps get attention. No one is seriously going to browse a website that way, so an odd article for free won't harm them and may help gain an audience.

👤 Am4TIfIsER0ppos
Journos tried to bad-mouth archives and especially archive.{is,ph,today} back in 2014 and have been ever since because it shows how much you stealthily edit articles to hide the mistakes you make (lies you tell).

👤 pointlessone
archive.today (and the Wayback Machine, and many others) doesn’t do anything supernatural. It can get a full article because publishers let it. And that’s the main reason no one’s shutting it down.

Try it with Elsevier and see that they won’t give an article for free under any circumstances. There are no free copies of any of their articles on archive.today.


👤 skibob1027
Go figure, archive.ph / archive.today is not working for me as of 12/8/22. Is anyone else having issues accessing the site?

👤 MrWiffles
To the best of my admittedly limited knowledge on the subject, they're relatively new so most news sites aren't terribly aware of them yet, and most people aren't technology-savvy enough to be aware of their existence and ability to make use of them outside circles like ours here on HN and related occupations/sites/etc.

That said, they're also not doing anything shady to get by the paywall either. The version they're presenting you is the exact same version the news sites are presenting search engines in order to grab search engine traffic. The news sites are literally participating in an intentional bait-and-switch scheme to bait people with relevant search results that are NOT paywalled, then when a human browser gets there, they throw up a paywalled version in your face via user agent detection, mandatory JavaScript, etc. (various means are used).

archive.ph simply mimics a search engine indexer to get an un-paywalled version, same as Google or any other search engine, in order to retrieve the cleaned up version of the article without a paywall there, and serves that content to the end user. It's not stealing content not already offered in other forms anyway, it's just removing an artificial dark pattern that's literally intended to bait and switch people in the first place. Kind of makes for a weak argument if they do bring it to court in the first place; glass houses, throwing stones and all that.
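
To make the user-agent detection concrete, here is a minimal sketch of how such a gate might look on the publisher's side. It is an illustration only: `CRAWLER_TOKENS`, `is_crawler`, and `render_article` are hypothetical names, the token list is an assumption, and no real site's code is being quoted.

```python
# Hypothetical sketch of a user-agent-based paywall gate.
# All names and the token list are assumptions for illustration.

CRAWLER_TOKENS = ("googlebot", "bingbot")  # assumed allow-list of indexers
PAYWALL_NOTICE = "... [Subscribe to keep reading]"


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def render_article(user_agent: str, full_text: str, teaser_chars: int = 80) -> str:
    """Serve the full text to crawlers and a truncated teaser to everyone else."""
    if is_crawler(user_agent):
        return full_text
    return full_text[:teaser_chars] + PAYWALL_NOTICE


if __name__ == "__main__":
    article = "Full article body. " * 20
    # A request that announces itself as a search indexer gets everything.
    print(len(render_article("Mozilla/5.0 (compatible; Googlebot/2.1)", article)))
    # An ordinary browser gets the teaser plus the paywall notice.
    print(render_article("Mozilla/5.0 (X11; Linux x86_64) Firefox/107.0", article))
```

A gate that trusts only the User-Agent header is trivial to satisfy, since any client can send any string. Sites that also verify the requester's IP address are harder to imitate this way.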

👤 theGnuMe
It's hosted in Russia I think.

👤 _zfxr
Bruh... delete this thread.

👤 coorski
what about https://12ft.io/ for some paywall bypass

👤 datalopers
If a news site wants customers to pay for the content, then they should put it behind a walled garden and not paywalls which only apply to certain user-agents and IPs.