Do we need a process for excluding bodies of work from LLM training data?
Do creators need a defense against LLMs scraping the internet? Should there be a robot-ai.txt file, or language in licenses, instructing AI crawlers not to include certain content in large language model training data? This seems particularly relevant for open-source software, webpages, and individuals who brand their work.
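For what it's worth, something close to this already exists: several AI crawlers publish user-agent strings that can be disallowed in an ordinary robots.txt. A minimal sketch (the user-agents below are the ones publicly documented by OpenAI, Common Crawl, and Google; compliance is entirely voluntary, and a crawler that ignores robots.txt is not bound by any of this):

```
# robots.txt — ask known AI training crawlers to skip this site.
# Honoring these rules is voluntary on the crawler's part.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everyone else (search engines, etc.) may still crawl normally.
User-agent: *
Allow: /
```

Whether this counts as a real "defense" is exactly the open question: it only works against crawlers that choose to respect it.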
Much as """Open"""AI does not release its source code, I'm confused about why open-source repos should be kept out of AI training data.
It works much the same way as excluding the rest of the people in the world from a document... don't publish it on the open internet...