I tend to agree with the premise. However, what if the generative process were overlaid with an "inner debate", as a substitute for having the model play against itself, à la AlphaGo Zero?
The sequence of prompts would go:
1. Please explain X
2. Criticize your explanation of X, using reason and logic.
3. Based on your own critique, improve your explanation of X.
I have manually toyed with this approach (the actual prompts are longer, but you get the gist; see the sketch below), and it gives very interesting results. This could let GPT re-create, on its own, a higher-quality corpus to learn from.
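
A minimal sketch of the loop, assuming a hypothetical query_model(prompt, history) helper that wraps whatever chat API you use (the helper name, the round count, and the exact prompt wording are my own, not a specific product's API):

    def query_model(prompt: str, history: list[str]) -> str:
        """Placeholder: send the conversation so far plus a new prompt
        to your chat model of choice and return its reply."""
        raise NotImplementedError("wire this up to your LLM API")

    def inner_debate(topic: str, rounds: int = 2) -> str:
        """Explain -> criticize -> improve, repeated for a few rounds."""
        history: list[str] = []
        explanation = query_model(f"Please explain {topic}.", history)
        history.append(explanation)
        for _ in range(rounds):
            critique = query_model(
                f"Criticize your explanation of {topic}, "
                "using reason and logic.",
                history,
            )
            history.append(critique)
            explanation = query_model(
                f"Based on your own critique, improve your "
                f"explanation of {topic}.",
                history,
            )
            history.append(explanation)
        return explanation

The final explanations (or the whole transcripts) could then be collected as candidate training data.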
Is anybody pursuing this approach for LLMs?
For an LLM to use this technique on the kind of reasoning you're talking about, you need a human in the loop to explain why it's wrong or right; otherwise it just hallucinates random stuff.
That's basically what RLHF[0] is, which was used to great success in training ChatGPT.