Do we need a process for excluding bodies of work from LLM training data?
Do creators need a defense against LLMs scraping the internet? Should there be a robot-ai.txt file, or language in licenses, instructing AI crawlers not to include certain content in large language model training data? This seems particularly relevant for open-source software, webpages, and individuals who brand their work.
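For what it's worth, something close to this already exists: several AI crawlers publish user-agent strings that can be disallowed in an ordinary robots.txt. A minimal sketch (the user-agents below are the ones publicly documented by OpenAI, Common Crawl, and Google; compliance is entirely voluntary, and a crawler that ignores robots.txt is not bound by any of this):

```
# robots.txt — ask known AI training crawlers to skip this site.
# Honoring these rules is voluntary on the crawler's part.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everyone else (search engines, etc.) may still crawl normally.
User-agent: *
Allow: /
```

Whether this counts as a real "defense" is exactly the open question: it only works against crawlers that choose to respect it.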
Much as """Open"""AI does not release its source code, I'm confused about why open-source repos should be kept out of AI training data.
It works much the same way as excluding the rest of the people in the world from a document... don't publish it on the open internet...