How is the web affected by an AI-generated feedback loop?
LLMs get their input largely from the web. As the web is increasingly generated by AI, how will this feedback loop affect their output?
This type of data is actually better than independent human text (specifically for training the LLM that originally produced the output)
GPT-4 is trained with PPO + RLHF. Web text produced by the LLM and then fed back in will be closer to the model's original token distribution.
In other words, by selectively publishing LLM output you're effectively performing the same action as clicking the thumbs up/thumbs down button in the ChatGPT web UI.
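To make that concrete, here's a toy sketch of the idea (my own illustration, not OpenAI's actual pipeline): the "model" is just a unigram token distribution, and the "quality filter" is a hypothetical stand-in for humans choosing which outputs to publish. Refitting the model on the published subset shifts it in the same direction an explicit thumbs-up signal would.

```python
# Toy illustration: selectively republishing LLM output acts like an
# implicit preference label, similar in spirit to an RLHF thumbs-up.
# Everything here is made up for the sketch (toy vocabulary, toy filter).

import random
from collections import Counter

random.seed(0)

# Hypothetical "model": a unigram distribution over a tiny vocabulary.
model = {"good": 0.25, "bad": 0.25, "ok": 0.25, "great": 0.25}

def sample_text(dist, length=20):
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=length)

def human_publishes(text):
    # Stand-in for selective publishing: people tend to keep outputs
    # they judge as higher quality (here, more "great"/"good" tokens).
    score = sum(t in ("great", "good") for t in text) / len(text)
    return score > 0.5

# 1. The model generates lots of candidate texts.
candidates = [sample_text(model) for _ in range(5000)]

# 2. Only the "liked" ones get published to the web.
published = [t for t in candidates if human_publishes(t)]

# 3. Retraining on scraped web text = refitting the model to the
#    published subset (a crude stand-in for a fine-tuning step).
counts = Counter(tok for text in published for tok in text)
total = sum(counts.values())
retrained = {tok: counts[tok] / total for tok in model}

print("before:", model)
print("after :", {k: round(v, 3) for k, v in retrained.items()})
# The distribution shifts toward the tokens humans implicitly preferred,
# i.e. the same direction an explicit thumbs-up signal would push it.
```

Obviously real RLHF trains a reward model and optimizes against it with PPO rather than just refitting on kept samples, but the direction of the pressure is the same.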
I agree with OpenAI that this will not be a problem at all, since you would need a process to gauge the quality of the data anyway, even for human text.
Well, a possibility is that LLMs won't become much more "intelligent" than they are currently.
I've seen this referred to as "AI drift" before, and I see it being a long-term problem for training data sets. As always, the quality of the data set determines the quality of the results. Models and data sets will be the arms race in the LLM world for a while to come, I think.
Not at all, IMHO. I think the next step is layering human choice in weights on top of the already established models.
Bovine spongiform encephalopathy
Many will say that no AI feedback loop can become better than the original web, because of "blurry JPEG of the web" arguments, but I disagree.
LLMs still generate content that contains misinformation, which will end up on the web; LLMs can then learn from and propagate that misinformation in their future responses. That's one way this feedback loop can affect their output.