There is a very annoying law in Germany (the Impressum requirement) that says you basically have to put your name and address on your blog, no matter what the content is about.
You can't even publish it as a .jpg to prevent automated data crawling; courts have ruled against that because it interferes with the screen-reader software that visually impaired people use.
Fines of up to €50k are possible.
So now I'm wondering: how can I make it as unlikely as possible that my blog shows up when somebody searches for my name?
Apparently a noindex meta tag like this helps?
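Presumably the standard robots meta tag, placed in the page's head:

<meta name="robots" content="noindex">

Search engines that honor it will drop the page from their results, though they still have to crawl the page to see the tag.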
The reason I'm asking here is because the sites that discuss de-indexing pages are concerned with SEO, not privacy issues.
Does anybody have any other thoughts on how to achieve maximum privacy without breaking this law?
All input is appreciated.
One option - Make your own custom silly captcha, e.g. a picture of a motorcycle where the person has to type in "motorcycle", or ask "What is 2 + 2 x 2?", or "What is the average unladen speed of a swallow?" (though that last one may eliminate many guests).
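A minimal sketch of that idea, assuming a small Flask app; the route, the question, and the placeholder details are all illustrative:

# Silly-captcha gate for the contact details (sketch, not a hardened implementation)
from flask import Flask, request

app = Flask(__name__)

QUESTION = "What is 2 + 2 x 2?"
ANSWER = "6"  # 2 + (2 x 2) = 6

@app.route("/impressum", methods=["GET", "POST"])
def impressum():
    # Only reveal the details once the visitor answers the question correctly
    if request.method == "POST" and request.form.get("answer", "").strip() == ANSWER:
        return "Max Mustermann, Musterstrasse 1, 12345 Berlin"  # placeholder details
    # Otherwise show the question in a plain form
    return (
        "<form method='post'><label>" + QUESTION + " "
        "<input name='answer'></label><button>Show Impressum</button></form>"
    )

Anything along these lines keeps the details out of the raw HTML that crawlers see, while a human (including someone using a screen reader, since it's a plain text question) can still get to them.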
Another option - On a landing page, list the username and password for basic auth set on the subset of URLs that holds your semi-sensitive data.
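In nginx that could look roughly like this (the URL prefix and the password-file path are assumptions):

location /impressum/ {
    auth_basic "Login is listed on the front page";
    auth_basic_user_file /etc/nginx/htpasswd_impressum;
}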
Another option - Restrict the site to HTTP/2.0. Most bots can't talk 2.0 yet. Bing can. Google can't. Discord and Steam can't. Most of the blog-spam and malware bots can't. Real browsers can talk HTTP/2.0. Another thing that breaks some bots is enforcing strict-SNI.
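A rough nginx sketch of both ideas, assuming nginx 1.19.4 or newer for ssl_reject_handshake; the hostname is a placeholder and certificates/locations are omitted:

server {
    listen 443 ssl default_server;
    ssl_reject_handshake on;       # strict SNI: refuse handshakes that don't name a configured host
}

server {
    listen 443 ssl http2;
    server_name blog.example.org;  # placeholder hostname
    # ssl_certificate / ssl_certificate_key and locations go here
    if ($server_protocol != "HTTP/2.0") {
        return 444;                # drop clients that did not negotiate HTTP/2
    }
}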
Another option, though not super effective - Restrict the semi-sensitive content to requests with a valid referrer. That's easy to spoof, and some bots will spoof it. Along the same line, one can block a set of known bot user-agents. Again, that's super easy to spoof, and Google intentionally spoofs a percentage of requests to look like Android phones and to test whether you are giving different data to search engines.
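In nginx, the referrer check could look roughly like this (again just a sketch, and trivially spoofable, as noted):

location /impressum/ {
    valid_referers server_names;   # only referrers from this site's own pages count as valid
    if ($invalid_referer) {
        return 403;
    }
}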
Another option, though much less effective - Add robots.txt entries. Many search engines will still crawl and index the data but won't show it in public results. robots.txt shows intent but enforces nothing, and not all search engines obey it; it is good practice to have it anyway. Archive.org partially obeys robots.txt, in that they crawl but will hide results if you tell them to. Once your robots.txt is gone or your site has been offline for a while, they will unhide the data. For example:
User-agent: *
Disallow: /impressum/
Can you give us more references about it?
(Using robots.txt helps, but only with the main web search engines, which, at least directly, seem to respect it.)