Every time I submit a question to ChatGPT, I imagine some fancy math being done behind the scenes and some information being fed back to me out of this massive model "file".
Is the model just a compressed representation of all the data it was trained on, so that sending back a response is just 'decompressing' the part of the model relevant to my question and returning the requested information? (I imagine it hallucinates the parts that vary from the training data to give a best-guess response to my query.)
I wonder if there are any experts on here who could confirm or deny my intuitive guess.
https://en.wikipedia.org/wiki/Hutter_Prize
A major part of training LLMs is teaching them to predict the next token, or, similarly, masking out 15% of the tokens at random and asking the model to predict those. These tasks are closely related to this compression algorithm:
https://en.wikipedia.org/wiki/Prediction_by_partial_matching
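To make that connection concrete, here's a toy sketch (my own illustration, not from either article): a model that predicts the next character with probability p lets an ideal arithmetic coder spend -log2(p) bits on it, so the total compressed size is just the model's cumulative prediction loss. The adaptive order-2 character model below is a stand-in for the far richer predictors LLMs learn.

    from collections import Counter, defaultdict
    from math import log2

    def bits_to_encode(text, order=2):
        """Bits an ideal arithmetic coder would need, driven by a simple
        adaptive order-k character model (PPM-flavoured)."""
        counts = defaultdict(Counter)  # context -> counts of the next char
        total_bits = 0.0
        for i, ch in enumerate(text):
            ctx = text[max(0, i - order):i]
            seen = counts[ctx]
            # Laplace smoothing so unseen characters still get p > 0.
            p = (seen[ch] + 1) / (sum(seen.values()) + 256)
            total_bits += -log2(p)  # the coder spends -log2(p) bits here
            seen[ch] += 1           # adaptive update: model learns as it reads
        return total_bits

    text = "the cat sat on the mat. " * 40
    print(bits_to_encode(text) / (8 * len(text)))  # well under 1.0: predictable text compresses

The better the predictor, the fewer the bits; swap the toy model for an LLM's next-token distribution and you get very strong text compression, which is the premise behind the Hutter Prize.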
There is more to it than that though. If I wanted to make a classifier that answers a question like "Is this an article about a sports game?" I might start with an LLM trained on the above tasks and then give it a small amount of training on my task... Thanks to the patterns it has already learned, it has a head start at recognizing the patterns that matter for my task.
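To illustrate the "head start" idea (a hedged sketch, not anyone's actual pipeline): freeze the pretrained model and train only a small head on the new task. The encoder here is a randomly initialised stand-in; in practice you would load real pretrained weights.

    import torch
    import torch.nn as nn

    vocab_size, d_model = 10_000, 256

    # Stand-in for an LM encoder pretrained on the prediction tasks above.
    encoder = nn.Sequential(
        nn.Embedding(vocab_size, d_model),
        nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2,
        ),
    )
    for p in encoder.parameters():
        p.requires_grad = False  # keep the pretrained patterns intact

    head = nn.Linear(d_model, 2)  # "sports article" vs "not"
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)

    tokens = torch.randint(0, vocab_size, (8, 64))  # fake batch of token ids
    labels = torch.randint(0, 2, (8,))              # fake labels

    features = encoder(tokens).mean(dim=1)          # pool per-token features
    loss = nn.functional.cross_entropy(head(features), labels)
    loss.backward()
    opt.step()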
ChatGPT uses another form of training: reinforcement learning from human feedback (RLHF). This involves asking it questions and then having people judge the answers it generates. That is how it learns to go past "predict the next token" and actually do things that please people... But the training on compression-related tasks makes that second stage of reinforcement learning go better, just as it makes my classification task go better.
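One core piece of that, sketched below under simplifying assumptions (the real systems score full transformer states and then run a policy-optimization step against the learned reward, which I've omitted): given pairs of responses where a human picked a winner, train a scorer to rank the preferred one higher.

    import torch
    import torch.nn as nn

    d_model = 256
    reward_model = nn.Linear(d_model, 1)  # toy scorer over response embeddings
    opt = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

    # Hypothetical batch: embeddings of human-preferred vs rejected responses.
    chosen = torch.randn(16, d_model)
    rejected = torch.randn(16, d_model)

    r_chosen = reward_model(chosen).squeeze(-1)
    r_rejected = reward_model(rejected).squeeze(-1)

    # Bradley-Terry pairwise loss: push preferred scores above rejected ones.
    loss = -nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    loss.backward()
    opt.step()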
https://www.eecs.tufts.edu/~dsculley/papers/compressionAndVe...
https://www.academia.edu/6214482/The_Relationship_of_Machine...
And, ... well, Google around, you'll find stuff. Unfortunately it can be hard to separate out the stuff that just talks about the connections between compression and ML in the abstract sense, from papers talking about specific applications of ML/DL for various aspects of compression. Not that the latter aren't also interesting, of course.
If you're also familiar with generative image models, you know there are models where you can type a sentence like "an astronaut on a horse" and they will draw you a picture. You can see how you could (imperfectly) compress a bunch of images by writing a caption for each, and decompress them by running the captions through the generative model.
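In code, that scheme would look something like this (purely illustrative; caption_model and generative_model are hypothetical stand-ins for any captioning and text-to-image models):

    def compress(image, caption_model):
        # Lossy "encode": keep only a short description of the image.
        return caption_model(image)       # e.g. "an astronaut on a horse"

    def decompress(caption, generative_model):
        # "Decode": regenerate a plausible, but not identical, image.
        return generative_model(caption)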
The analogy holds for language models. You can compress a story about the assassination of Abraham Lincoln into the prompt "write me an account of the assassination of Abe Lincoln" and the model will "uncompress" it into a common-knowledge account of that event.
Behind the scenes, the actual mechanism is different, as others have mentioned, but the effect can be framed as I describe. It's basically filling in the gaps with common sense learned from the text it was trained on.
That's a weaker analogy for what happens in the middle. I am no expert here, but I think of it as water running through a giant network of specifically patterned canyons.