There is a very annoying law in Germany (the Impressum requirement) that says you basically have to put your name and address on your blog, no matter what the content is about.
You can't even publish it as a .jpg to prevent automated data crawling; courts have ruled against that because it interferes with the screen-reader software that visually impaired people use.
Fines of up to €50k are possible.
So now I'm wondering: how can I make it as unlikely as possible that my blog shows up when somebody searches for my name?
Apparently a noindex meta tag like this helps?
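Presumably the standard robots meta tag, placed in the page's head:

<meta name="robots" content="noindex">

Search engines that honor it will drop the page from their results, though they still have to crawl the page to see the tag.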
The reason I'm asking here is because the sites that discuss de-indexing pages are concerned with SEO, not privacy issues.
Does anybody have any other thoughts on how to achieve maximum privacy without breaking this law?
All input is appreciated.
One option - Make your own custom silly captcha, e.g. a picture of a motorcycle where the person has to type in "motorcycle", or ask "What is 2 + 2 x 2?", or "What is the average unladen speed of a swallow?" (though that last one may eliminate many guests).
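A minimal sketch of that idea, assuming a small Flask app; the route, the question, and the placeholder details are all illustrative:

# Silly-captcha gate for the contact details (sketch, not a hardened implementation)
from flask import Flask, request

app = Flask(__name__)

QUESTION = "What is 2 + 2 x 2?"
ANSWER = "6"  # 2 + (2 x 2) = 6

@app.route("/impressum", methods=["GET", "POST"])
def impressum():
    # Only reveal the details once the visitor answers the question correctly
    if request.method == "POST" and request.form.get("answer", "").strip() == ANSWER:
        return "Max Mustermann, Musterstrasse 1, 12345 Berlin"  # placeholder details
    # Otherwise show the question in a plain form
    return (
        "<form method='post'><label>" + QUESTION + " "
        "<input name='answer'></label><button>Show Impressum</button></form>"
    )

Anything along these lines keeps the details out of the raw HTML that crawlers see, while a human (including someone using a screen reader, since it's a plain text question) can still get to them.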
Another option - On a landing page, list the username and password for basic auth set on the subset of URLs that holds your semi-sensitive data.
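In nginx that could look roughly like this (the URL prefix and the password-file path are assumptions):

location /impressum/ {
    auth_basic "Login is listed on the front page";
    auth_basic_user_file /etc/nginx/htpasswd_impressum;
}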
Another option - Restrict the site to HTTP/2.0. Most bots can't talk 2.0 yet. Bing can. Google can't. Discord and Steam can't. Most of the blog-spam and malware bots can't. Real browsers can talk HTTP/2.0. Another thing that breaks some bots is enforcing strict-SNI.
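A rough nginx sketch of both ideas, assuming nginx 1.19.4 or newer for ssl_reject_handshake; the hostname is a placeholder and certificates/locations are omitted:

server {
    listen 443 ssl default_server;
    ssl_reject_handshake on;       # strict SNI: refuse handshakes that don't name a configured host
}

server {
    listen 443 ssl http2;
    server_name blog.example.org;  # placeholder hostname
    # ssl_certificate / ssl_certificate_key and locations go here
    if ($server_protocol != "HTTP/2.0") {
        return 444;                # drop clients that did not negotiate HTTP/2
    }
}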
Another option, though not super effective - Restrict the semi-sensitive content to requests with a valid referrer. That's easy to spoof, and some bots will spoof it. Along the same line, one can block a set of known bot user-agents. Again, that's super easy to spoof, and Google intentionally spoofs a percentage of requests to look like Android phones and to test whether you are giving different data to search engines.
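In nginx, the referrer check could look roughly like this (again just a sketch, and trivially spoofable, as noted):

location /impressum/ {
    valid_referers server_names;   # only referrers from this site's own pages count as valid
    if ($invalid_referer) {
        return 403;
    }
}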
Another option, though much less effective - Add robots.txt entries. Many search engines will still crawl and index the data but won't show it in public results. robots.txt shows intent but enforces nothing, and not all search engines obey it; it is good practice to have it anyway. Archive.org partially obeys robots.txt, in that they crawl but will hide results if you tell them to. Once your robots.txt is gone or your site has been offline for a while, they will unhide the data. For example:
User-agent: *
Disallow: /impressum/
Can you give us more references about it?
(Using robots.txt helps, but only with the main web search engines, which, at least directly, seem to respect it.)