HACKER Q&A
📣 andrewstuart

How do I tell the world a site's content may not be scraped for AI?

  👤 mostlysimilar Accepted Answer ✓
If you think the companies scraping the data care at all what your opinion is, you're going to be disappointed. It's the "ask forgiveness, not permission" approach: break the rules, then negotiate the costs after you've already trained your model and are raking in the money.

👤 emedchill
If you don't want someone or something to see your content, don't put it on the internet. But if that isn't enough:

- add a disallow in your robots.txt (many people say the bots ignore this anyways)

- somehow get your pages ranked so far down in search results that bots deem them incorrect or irrelevant

- put your content behind a login; this too has its issues, since the bot operator can simply obtain login credentials to crawl anyway, or a user can copy the content elsewhere

- you could also try gaming the system by making your content so offensive that the current AI censorship fad blocks it

- you could try not pointing a domain name at the server's IP address, making the site harder to find

- sue any AI developer that you think crawled your content
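For the robots.txt option, here is a minimal sketch in Python, assuming the goal is to block a few known AI crawlers while leaving ordinary search engines alone. The user-agent tokens in `AI_CRAWLERS` (GPTBot, CCBot, Google-Extended, ClaudeBot) are an assumed, incomplete list; check each vendor's current documentation before relying on it. `build_robots_txt` and `is_allowed` are hypothetical helper names. The check uses the standard library's `urllib.robotparser` to show what a compliant crawler would conclude; nothing forces a crawler to comply.

```python
from urllib.robotparser import RobotFileParser

# Assumed, incomplete list of AI-crawler user-agent tokens; vendors add and
# rename crawlers, so verify against their current documentation.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "ClaudeBot"]


def build_robots_txt(crawlers: list[str]) -> str:
    """Return robots.txt text that disallows the whole site for each listed
    crawler while leaving every other user agent unrestricted."""
    lines = []
    for agent in crawlers:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # a blank line ends each group
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Report what a well-behaved crawler would decide from robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    text = build_robots_txt(AI_CRAWLERS)
    print(text)
    print(is_allowed(text, "GPTBot", "https://example.com/post/1"))
    print(is_allowed(text, "Mozilla/5.0", "https://example.com/post/1"))
```

The generated text would be served as `/robots.txt` at the site root.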
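For the login option, here is a minimal sketch of HTTP Basic authentication using only Python's standard library. `USERNAME`, `PASSWORD`, `is_authorized`, `GatedHandler`, and `serve` are hypothetical names, and a single shared credential is purely for illustration; a real deployment would want per-user accounts, hashed passwords, and HTTPS. This only keeps out anonymous crawlers: anyone holding valid credentials, bot or human, can still copy the content.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical shared credential, for illustration only.
USERNAME = "reader"
PASSWORD = "change-me"


def is_authorized(header: Optional[str], username: str, password: str) -> bool:
    """Validate the value of an HTTP Basic `Authorization` header."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, pwd = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences;
    # both comparisons run before combining the results.
    user_ok = hmac.compare_digest(user.encode(), username.encode())
    pass_ok = hmac.compare_digest(pwd.encode(), password.encode())
    return user_ok and pass_ok


class GatedHandler(BaseHTTPRequestHandler):
    """Serves content only to requests carrying valid credentials."""

    def do_GET(self) -> None:
        if not is_authorized(self.headers.get("Authorization"), USERNAME, PASSWORD):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="members"')
            self.end_headers()
            return
        body = b"Members-only content\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the gated server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), GatedHandler).serve_forever()
```

Calling `serve()` starts the server; an unauthenticated request gets a 401 with a `WWW-Authenticate` challenge instead of the content.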