I've watched quite a few videos on LLMs, such as:
1. https://www.youtube.com/watch?v=lnA9DMvHtfI
2. https://www.youtube.com/watch?v=YDiSFS-yHwk
I get the essential concepts: how a corpus of text becomes the model, parameter counts, fine-tuning, all of the popularly discussed basic jargon.
What still has not been explained to me is... well, how does the LLM actually become interactive, in the sense that you can then prompt it and it spits back an answer. In other words, how does the LLM actually "know" it's supposed to spit back an answer in the first place?
I just can't grok what's happening here. If we were programming a database, the idea of structuring a database, writing to it, and reading from it just makes sense, because it's really just a series of imperative procedural steps that you're controlling.
But when you have an LLM model that's just sitting there with all of its black-boxed data in a neural network... how does one then "tap into it" to force it to produce a response? What is that process called, and what are the core concepts behind it? For whatever reason, this question is somehow impossible to Google.
Instruction tuning, as far as I can tell. The original "Alpaca" model took the LLaMA base model and fine-tuned it on question/answer-style content. With that, the generation went from prose-leaning to Q&A-style responses.
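To make that concrete, here's a rough sketch of what an Alpaca-style instruction-tuning record looks like and how it gets flattened into an ordinary training document. The exact template strings are from memory and may differ from the real Alpaca repo; treat this as illustrative only:

```python
# One Alpaca-style training record (the dataset is JSON with
# instruction/input/output fields; values here are invented).
record = {
    "instruction": "Explain what a large language model is.",
    "input": "",
    "output": "A large language model is a neural network trained to ...",
}

# Fine-tuning just flattens each record into one plain-text document
# and trains the model to continue it, exactly like pre-training.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{record['instruction']}\n\n"
    f"### Response:\n{record['output']}"
)
```

After enough of these, any text ending in "### Response:" strongly predicts answer-shaped text, which is all "answering" is at this level.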
The model is (simplified) a function that takes a state and produces a new state and an output.
It's called from a program with a loop: on the first iteration it feeds the model an initial state, then captures/displays/transmits the output, then repeats using the new state returned, until the output is a designated stop marker, a specified number of iterations is reached, or some other stop condition is met.
IOW, it “knows” to respond because the non-AI part of the computer program is structured that way; it's completely unrelated to AI.
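To make that concrete, here's a minimal sketch of that driver loop in Python. The `model` callable is hypothetical (it returns one score per vocabulary token); nothing here is a real library's API:

```python
def generate(model, prompt_tokens, eos_token, max_steps=256):
    """Drive the model one token at a time, as plain imperative code."""
    state = list(prompt_tokens)              # the "state" is just the tokens so far
    for _ in range(max_steps):
        logits = model(state)                # list of scores, one per vocab token
        next_token = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        if next_token == eos_token:          # designated stop marker
            break
        state.append(next_token)             # new state = old state + the output
        yield next_token                     # capture/display/transmit the output
```

The "interactivity" lives entirely in this ordinary loop; the neural network itself is just the `model(state)` call.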
Please keep in mind that these models, in the common design, generate token by token; we'll come back to that.
The key area to understand is the training process:

1. I start off with a model that simply generates the next word with a probability based on the prior words in its history.
I have an attention mechanism that is also learning which words are important when deciding on what to generate next.
This is different from a higher-order Markov chain, which always uses the last N words to pick the next word. A Markov chain doesn't learn what matters; it's forced to "pay attention" to that fixed window no matter what (there's a toy sketch of this after the transcript below).
2. Ok, I now know how to optimise which words to pay attention to while at the same time generating and selecting the highest-probability words (ignoring top-k/top-p sampling, temperature, etc.; see the second sketch below). That's fine, I can generate human-sounding content and, with enough training, sensible content, but what's the utility of a glorified autocorrect?
3. Ok, let's put my ability to generate sensible outputs to use: I will be further trained to fill in missing gaps in real-world documents.
4. Great, I'm getting really capable now. I can generate human-sounding content and even fill in the blanks, but what's next? How about we further train me to complete documents such as question/answer pairs and replies, or the beginning/end of maths equations.
5. Now we're on fire. I'm a model that not only generates human-sounding text but can also fill in the blanks; I'm able to answer questions and even backfill the questions themselves. And while I'm training, I'm starting to pick up nuances of language and learning ideas and concepts that are inherent but not explicit in my training data. But oh no, that's not good if I start to form my own opinions and become biased or potentially harmful. What would the use be of a program that writes malicious code or creates new chemical weapons? We need to further align my outputs by means of reinforcement learning: humans give me feedback on what they want me to say, and I'm rewarded and my model is altered so that I'm more likely to generate that type of content.
6. I'm nearing the graduation stage. I'm now a very capable model because of all the layers of training, and now you, the user, want to speak to me. Well, I'm already trained on question/answer pairs, so let's have a chat transcript given to me in that format, as it's something I'm familiar with (the last sketch below shows how such a transcript gets assembled):

User> You are an AI.
AI> That's correct, I've been trained by Solvency to answer questions about AI tech such as myself.
User> How does an LLM like yourself "know" to respond in the first place?
AI> As a language model, I am an autocompletion engine at my core, and I generate token by token based on the context we've established above, together with the weights of my neural network itself. I'm able to respond to you based on a solid foundation that enables me to learn and grow with every update, interaction, and recalibration of my network. I'm still learning more and more as time goes by, but before you know it, you won't even need a human to answer your questions; it will be completely outsourced to an AI much like myself.
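To ground the contrast from step 1, here's a toy fixed-window Markov chain next-word generator. The corpus and window size are made up for illustration; the point is that the window is hard-coded rather than learned, which is exactly what attention replaces:

```python
import random
from collections import defaultdict

# Toy order-2 Markov chain: the next word depends on exactly the
# last 2 words, no more, no less. Nothing is learned about which
# words matter; the window is fixed by construction.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

table = defaultdict(list)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    table[(a, b)].append(c)          # record every observed continuation

state = ("the", "cat")
words = list(state)
for _ in range(8):
    choices = table.get(state)
    if not choices:
        break
    nxt = random.choice(choices)     # sample proportionally to observed counts
    words.append(nxt)
    state = (state[1], nxt)          # slide the fixed window forward

print(" ".join(words))
```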
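And here's a sketch of the "selecting words" part that step 2 waves away: given the model's scores for every candidate next token, temperature and top-k decide which one is actually emitted. The numbers are invented; only the mechanics are the point:

```python
import math, random

def sample_next(logits, temperature=0.8, top_k=3):
    """Pick a next-token index from raw scores: a toy version of
    temperature + top-k sampling (top-p works along similar lines)."""
    # Keep only the k highest-scoring candidates.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature rescales scores: low T is near-greedy, high T is more random.
    scaled = [logits[i] / temperature for i in ranked]
    m = max(scaled)
    probs = [math.exp(s - m) for s in scaled]   # softmax (numerically stable form)
    total = sum(probs)
    r = random.uniform(0, total)
    for idx, p in zip(ranked, probs):           # roulette-wheel selection
        r -= p
        if r <= 0:
            return idx
    return ranked[-1]

# e.g. scores for a tiny vocabulary ["cat", "dog", "mat", "ran", "the"]
print(sample_next([2.1, 0.3, 1.7, -0.5, 0.9]))
```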
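Finally, for step 6: the "chat" is literally just a transcript like the one above serialised into one document and handed to the same next-token machinery. A minimal sketch with an invented template; real models (ChatML, LLaMA chat, etc.) each define their own turn markers:

```python
# Assumed/simplified chat template, not any real model's format.
system = "You are an AI."
history = [
    ("User", 'How does an LLM like yourself "know" to respond in the first place?'),
]

prompt = system + "\n"
for speaker, text in history:
    prompt += f"{speaker}> {text}\n"
prompt += "AI> "   # ends mid-turn: the most likely continuation is an answer

# generate(model, tokenize(prompt), eos_token)  # same driver loop as the
# sketch in the earlier comment; it stops at the end-of-turn marker
```

Because training saw countless documents shaped like this, the completion that follows "AI> " is overwhelmingly likely to be answer-shaped text.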