I'm struggling to understand what these models are actually learning. They can be applied to all sorts of problems, but what fundamental things are beng encoded in the model?
👤 jstx1 Accepted Answer ✓
They take a sequence of tokens and they do their best to continue the sequence in a plausible way. That's the basic idea - given the text so far figure out what's likely to follow.