HACKER Q&A
📣 nkmnz

Microsoft crawls private links – how can this be legal?


I've built a tiny private website to share wedding photos. To protect the privacy of my guests, I've implemented authentication via Supabase's magic link: guests enter their email address and get an access link with a short-lived token sent to their mailbox. Unfortunately, this didn't work for users of mailboxes hosted with Microsoft, because Microsoft clicks all links in emails for "security" purposes. I've run some experiments, and they seem to download all assets from the linked website, including images, and try to crawl all other pages that can be reached through the first link in the mail. This not only renders the magic link useless, but also means downloading massive amounts of password protected images - a clear violation of GDPR. Why hasn't this been stopped?


  👤 cf1241290841 Accepted Answer ✓
This might be Microsoft looking for pictures of naked children?

https://www.microsoft.com/en-us/research/publication/metadat...

edit: In which case it's part of the TOS and has government support. I would look for the press release on the change to the terms of service, but I don't want to search for "Child Sexual Abuse Material". It's apparently called PhotoDNA at Microsoft.

If I recall, Apple changed theirs a few months ago.

edit: If you think I am being overly paranoid: the last time I looked into the topic, Google put some pedophile self-help sites into my results.


👤 jiveturkey
> Microsoft clicks all links in emails for "security" purposes.

SOP. Many email providers do this. The magic link shouldn't take you directly to the thing, but rather to an interstitial, so that the security-vetting prefetch can be managed.
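
A minimal sketch of that interstitial idea, assuming an in-memory token store and hypothetical handler names (`issue_token`, `handle_get`, `handle_post`); none of this is Supabase's API. The GET that a scanner issues only renders a confirmation page, and the token is consumed only by the POST that a human triggers by pressing the button.

```python
import secrets
import time

# Hypothetical in-memory store: token -> expiry timestamp (seconds).
TOKENS: dict[str, float] = {}


def issue_token(ttl_seconds: float = 600.0) -> str:
    """Create a single-use token that expires after ttl_seconds."""
    token = secrets.token_urlsafe(32)
    TOKENS[token] = time.time() + ttl_seconds
    return token


def handle_get(token: str) -> tuple[int, str]:
    """Link target in the email: render a confirm page, consume nothing."""
    expiry = TOKENS.get(token)
    if expiry is None or expiry < time.time():
        return 410, "Link expired or already used."
    # Assumption: prefetchers issue GETs and do not submit forms.
    return 200, (
        '<form method="post">'
        f'<input type="hidden" name="token" value="{token}">'
        "<button>Continue to the photos</button></form>"
    )


def handle_post(token: str) -> tuple[int, str]:
    """Form submission: the only place where the token is consumed."""
    expiry = TOKENS.pop(token, None)
    if expiry is None or expiry < time.time():
        return 410, "Link expired or already used."
    return 200, "Signed in."
```

With this split, a scanner can fetch the link any number of times before the guest does, and the guest's later click still works once.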

Further crawling seems quite bad. Are the pages actually public, protected only by URL obscurity? Not justifying MSFT's behavior here, but you say "password protected", and the scheme you've described doesn't seem to be that. Can you set a session cookie after the magic link and, assuming the MSFT crawler doesn't save cookies, "defeat" it that way? Or identify it by UA and defeat it that way?
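
The cookie idea could look roughly like this. It is a sketch assuming hypothetical helpers (`start_session`, `serve_photo`) and an in-memory session set, not any particular framework. Photos are served only when the request carries the session cookie, so a client that follows links without keeping cookies gets a 401.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical in-memory set of valid session ids.
SESSIONS: set[str] = set()


def start_session() -> str:
    """Run once after the magic link is redeemed; returns a Set-Cookie value."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS.add(session_id)
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True
    cookie["session"]["secure"] = True
    cookie["session"]["samesite"] = "Lax"
    return cookie["session"].OutputString()


def serve_photo(cookie_header: Optional[str]) -> int:
    """Return an HTTP status for a photo request based on its Cookie header."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session")
    if morsel is None or morsel.value not in SESSIONS:
        return 401  # no valid cookie: the URL alone grants nothing
    return 200
```

This only helps if the crawler really does drop cookies between requests, which is the assumption stated above and not something confirmed here.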

I think you need to provide more info about how they are crawling if you want a solution.

I don't see how this is a GDPR violation. Crawling the data and evaluating it for security or malware issues doesn't fall under GDPR. Saving the content would, and that would probably also be a violation on your part, but you haven't indicated they are doing that.

Also not clear why the magic link doesn't work. Just because MSFT used it to crawl, why can't the user also use it later?


👤 k310
May I suggest a honeypot?

I'm pretty sure that a robots.txt is only a suggestion, not legally binding in any way.

If enough people put honeypot links in their emails .....


👤 apapapa
All major email cloud providers read your emails, unfortunately.

👤 mtmail
Microsoft scans to check whether the website contains malware. IMHO the security blunder is a self-implemented magic link.

> massive amounts of password protected images

Not password protected if the password is part of the URL.

> a clear violation of GDPR

Unclear to me which PII is being stored or used.