HACKER Q&A
📣 hellotomyrars

Feelings on the effect of ChatGPT on circular training?


Currently it is safe to say that most of the training data fed into the model is original writing, but if a large amount of writing ends up being generated by ChatGPT, what are the ramifications of that as they continue to train the model?

Is there any research on this topic that anyone is familiar with and can share?

Personally, I find the proliferation of ChatGPT-derived writing concerning already, and no doubt it is going to continue/increase. Are there (known) safeguards in the training pipeline to try to filter AI-generated content out of the dataset and prevent this?


  👤 brucethemoose2 Accepted Answer ✓
If you do this with image upscaling models, previously invisible artifacts like ringing and noise patterns get "amplified," and that's after I culled the input dataset by hand.

In other words, artifacts the model tends to generate, or that humans tend to miss, will become more severe if fed back in as input.
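This feedback loop is easy to demonstrate with a toy model (my own sketch, not anything from the upscaling case above): treat "the model" as a simple Gaussian fit, and at each generation retrain it only on samples drawn from the previous generation. The sample counts and generation count are arbitrary choices to make the drift visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: the "human" data distribution.
mu, sigma = 0.0, 1.0
samples_per_gen = 20      # small training sets make the drift visible faster
generations = 2000

for _ in range(generations):
    # "Generate" synthetic data from the current model...
    data = rng.normal(mu, sigma, samples_per_gen)
    # ...then "retrain" exclusively on that synthetic data.
    mu, sigma = data.mean(), data.std(ddof=1)

print(f"after {generations} generations: mu={mu:.3f}, sigma={sigma:.6f}")
# sigma tends to drift toward 0: the distribution collapses and diversity
# is lost, loosely analogous to artifacts compounding generation over
# generation when a model's output feeds its own training set.
```

Nothing here says real LLM training behaves exactly like repeated Gaussian refitting, but it shows why estimation error can compound instead of averaging out once outputs re-enter the input.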

I think that is a significant concern, especially if the "GPT jank" patterns are similar across models.


👤 PaulHoule
There are a lot of papers on arXiv where people use an LLM to generate training data for another LLM.