HACKER Q&A
📣 1vuio0pswjnm7

Why does archive.is track who reads submitted content via DNS?


Does anyone know why archive.is tracks who has read an article? If we look at the HTML from archive.is, we see that it includes an image link to a domain name that contains the user's IP address. Popular browsers not only run arbitrary Javascript by default, they also access image links by default. Thus the archive.is DNS server has a record of every time the page is viewed in one of these browsers (and the domain name is not cached). This record includes the IP address of the person who retrieved the page.
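For anyone who wants to check a saved page themselves, here is a minimal sketch of the idea: collect the hostnames of all image links and flag any that appear to embed an IPv4 address. The exact hostname format archive.is uses is not shown in this post, so the dotted or dashed pattern below, the `tracker.example` domain, and the helper names are assumptions made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: the IP appears as four decimal groups separated by dots or dashes.
IPV4_IN_HOST = re.compile(
    r"(?<!\d)(\d{1,3})[.-](\d{1,3})[.-](\d{1,3})[.-](\d{1,3})(?!\d)"
)


class ImgHostCollector(HTMLParser):
    """Collects the hostname of every <img src=...> in a document."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            host = urlparse(src).hostname
            if host:
                self.hosts.append(host)


def embedded_ipv4(host):
    """Return the IPv4 address embedded in a hostname, or None."""
    match = IPV4_IN_HOST.search(host)
    if not match:
        return None
    octets = [int(group) for group in match.groups()]
    if all(octet <= 255 for octet in octets):
        return ".".join(str(octet) for octet in octets)
    return None


def find_ip_bearing_images(html):
    """List (hostname, ip) pairs for image hosts that embed an IPv4 address."""
    collector = ImgHostCollector()
    collector.feed(html)
    found = []
    for host in collector.hosts:
        ip = embedded_ipv4(host)
        if ip is not None:
            found.append((host, ip))
    return found


# Hypothetical page fragment; the domains are reserved example names.
sample = (
    '<img src="https://203.0.113.7.tracker.example/pixel.gif">'
    '<img src="https://static.example/logo.png">'
)
print(find_ip_bearing_images(sample))
```

Running this against the HTML you actually receive, and comparing any flagged address with your own public IP, is one way to confirm or refute the observation.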

    
It seems some HN commenters have a preference for sharing links to https://archive.is as a way of avoiding Javascript obfuscation of text (so-called "paywalls").

The text being obfuscated is public, available to anyone, not only to subscribers. It is not password-protected. The website developer using the "paywall" technique simply tries to annoy the user into subscribing by obfuscating the text using Javascript. This only works if the user runs Javascript from the website. Popular browsers, most of them funded by advertising, run arbitrary Javascript by default. However, Javascript can be disabled by anyone simply by changing the default settings. Most users do not change default settings. Most users use the same small number of popular browsers.

When we refrain from running Javascript and accessing image links automatically, the web becomes more readable and less annoying. We can choose a browser that is simpler than the popular ones and does fewer things automatically without user input, e.g., running Javascript and loading images.


  👤 r721 Accepted Answer ✓

👤 alphabet9000
yeah, nice find, you're right - it is there - (although uBlock Origin seems to have blocked it from successfully connecting). it's weird that that is there in the first place, as they would already have the IP info in their HTTP server logs if they wanted to keep track of IPs, right?

👤 viraptor
> Thus the archive.is DNS server has a record of every time the page is viewed in one of these browsers

They already have the same information in http logs. Whatever the reason is for this request, it's nothing sneaky or covert.


👤 1vuio0pswjnm7
Remember that in addition to HTTP requests for the image that might appear in httpd logs, DNS lookups for the domain containing the user's IP address as a subdomain would be exposed to anyone performing passive DNS monitoring. Also, shared DNS caches run by third parties are quite popular.
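To illustrate why a shared cache matters here, below is a toy simulation, not a real resolver. It assumes the per-visitor hostname is simply the client IP prepended to a made-up zone (`tracker.example`), and it ignores TTLs entirely. The point is only that a name unique to each visitor cannot be answered from another visitor's cache entry, so each new visitor causes a fresh lookup that the cache operator, and anyone watching its upstream traffic, can see.

```python
class SharedResolver:
    """Toy stand-in for a shared DNS cache (no TTLs, no real network)."""

    def __init__(self):
        self.cache = {}
        # Names that had to be fetched upstream; an observer would see these.
        self.upstream_queries = []

    def resolve(self, name):
        if name not in self.cache:
            self.upstream_queries.append(name)
            self.cache[name] = "192.0.2.1"  # placeholder answer
        return self.cache[name]


def tracking_name(client_ip, zone="tracker.example"):
    """Hypothetical hostname format: the client IP as a subdomain."""
    return f"{client_ip}.{zone}"


resolver = SharedResolver()
# Three page views from two distinct (documentation-range) addresses.
for ip in ["198.51.100.4", "198.51.100.9", "198.51.100.4"]:
    resolver.resolve("static.example")   # ordinary shared hostname
    resolver.resolve(tracking_name(ip))  # per-visitor hostname

for name in resolver.upstream_queries:
    print(name)
```

The ordinary hostname goes upstream once, while every distinct visitor address shows up in the upstream query stream as part of a domain name.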

👤 miyuru
my guess is that, since they do geodns load balancing, this is used to collect data for their DNS.

👤 huhtenberg
It also tries to pull code.js off mail.ru subdomain.

👤 the_biot
This is too creepy for even Google to do, although they tried it out for a while. I think it was more like a hash of sorts -- a cookie in the DNS name of a Google tracker.

👤 hulitu
Because they can. There is also the possibility that someone (3 letter) invested in them to implement this feature.

👤 slim
Maybe because they don't keep logs on the http server? Logging on dns is transparent and limited

👤 bawolff
So?

They hardly need to do that to track IPs; it's probably just easier.