Imagine the training data is like: A->B->C->D (humans typically lie on answer B). The NN picks it up, and lies too.
Just a thought I just had :)
Maybe not this generation, but in two?
Plausible text could be "true". Or it may just be the text that fits the right shape.
You can see some examples here: https://www.atomic14.com/2023/01/08/prioritising-plausabilit...
The only issue is that if people know this, they could simply flood the internet etc. with fake information, and if that gets used, it could skew the results.
reCAPTCHA suffered from this in the early days, when people found that the first word was the control (known) word and the second was the word they wanted to convert to text.
People then started putting the same offensive word in for the second word, which was accepted and is likely to have affected the results.
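To make the attack concrete, here is a rough sketch of how that kind of scheme can be gamed. This is not reCAPTCHA's actual implementation. The function names (`record_answer`, `consensus`), the simple majority vote, and the example words are all assumptions made up for illustration.

```python
from collections import Counter


def record_answer(votes, control_truth, control_answer, unknown_answer):
    """Count the unknown-word answer only if the control word was typed correctly.

    Hypothetical helper: assumes the service trusts anyone who passes the control.
    """
    if control_answer.strip().lower() == control_truth.lower():
        votes[unknown_answer.strip().lower()] += 1
        return True
    return False


def consensus(votes):
    """Return the most common answer for the unknown word, or None if no votes."""
    if not votes:
        return None
    return votes.most_common(1)[0][0]


votes = Counter()

# Three honest users pass the control and transcribe the unknown word correctly.
for _ in range(3):
    record_answer(votes, "morning", "morning", "harbour")

# Five coordinated users also pass the control, but all type the same wrong word.
for _ in range(5):
    record_answer(votes, "morning", "morning", "junk")

# The majority vote now lands on the wrong word.
result = consensus(votes)
```

The control word only proves the user is human. It says nothing about whether they are honest on the second word, so a coordinated group can outvote everyone else.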
People tend to treat ChatGPT as either an oracle or a bullshitter, because they just haven't encountered anything like it yet and try to find a familiar model. But it's just a new form of information aggregator.
What is the difference between "lying" and "repeating or reporting untruth"?