https://www.microsoft.com/en-us/research/publication/metadat...
edit: In which case it's part of the TOS and has government support. I would look for the press release about the change to the terms of service, but I don't want to search for "Child Sexual Abuse Material". It's apparently called PhotoDNA at Microsoft.
If I recall, Apple changed theirs a few months ago.
edit: If you think I am being overly paranoid, the last time I looked up the topic, Google put some pedophile self-help sites into my results.
SOP. Many email providers do this. The magic link shouldn't take you directly to the thing; it should land on an interstitial, so that prefetching by security scanners can be handled without using up the link.
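To make the interstitial idea concrete, here is a minimal sketch, assuming a one-time token scheme. The names `issue_token` and `handle` and the in-memory store are hypothetical, not anything from the original post. The point is only that a GET (which is what a prefetching scanner would typically send) renders a confirmation page and leaves the token alone, while the explicit POST from the user consumes it.

```python
import secrets

# Hypothetical in-memory token store; a real service would use a database
# with expiry times. Maps token -> user.
_tokens = {}


def issue_token(user):
    """Create a one-time token to embed in the emailed magic link."""
    token = secrets.token_urlsafe(16)
    _tokens[token] = user
    return token


def handle(method, token):
    """Return (status, body) for a request to the magic-link URL.

    GET only shows the interstitial and does not consume the token.
    POST (the user pressing the button) consumes the token.
    """
    if token not in _tokens:
        return 404, "Link expired or already used."
    if method == "GET":
        return 200, '<form method="post"><button>Continue</button></form>'
    if method == "POST":
        user = _tokens.pop(token)
        return 200, f"Signed in as {user}"
    return 405, "Method not allowed"
```

With this shape, a scanner can fetch the link any number of times and the user's later click still works; only the form submission burns the token.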
Further crawling seems quite bad. Are the pages actually public, protected only by URL obscurity? I'm not justifying MSFT's behavior here, but you say "password protected" and the scheme you've described doesn't seem to be that. Can you set a session cookie after the magic link and, assuming the MSFT crawler doesn't save cookies, "defeat" it that way? Or identify it by UA and defeat it that way?
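A rough sketch of the session-cookie suggestion, under the assumption stated above that the crawler does not keep cookies between requests. The function names and the `session` cookie name are made up for illustration, and the session store is just an in-memory set.

```python
import secrets

# Hypothetical session store; in practice this would live server-side
# with expiry (database, cache, signed cookies, etc.).
_sessions = set()


def redeem_magic_link(token_valid):
    """On a valid magic link, create a session ID to send back as a cookie."""
    if not token_valid:
        return None
    session_id = secrets.token_urlsafe(16)
    _sessions.add(session_id)
    return session_id


def serve_protected(cookies):
    """Return an HTTP status for a protected page, given the request cookies."""
    if cookies.get("session") in _sessions:
        return 200
    return 403  # no valid session cookie: a cookie-less crawler stops here
```

The pages behind the magic link then depend on the cookie rather than on the URL alone, so a client that follows discovered URLs without the cookie gets a 403.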
I think you need to provide more info about how they are crawling if you want a solution.
I don't see how this is a GDPR violation. Crawling the data and evaluating it for security or malware issues doesn't fall under GDPR. Saving the content would, and is probably also a violation on your part, but you haven't indicated that they are doing that.
Also, it's not clear why the magic link doesn't work. Just because MSFT used it to crawl, why can't the user also use it later?
I'm pretty sure that a robots.txt is only a suggestion, not legally binding in any way.
If enough people put honeypot links in their emails .....
> massive amounts of password protected images
Not password protected if the password is part of the URL.
> a clear violation of GDPR
Unclear to me which PII is being stored or used.