https://en.wikipedia.org/wiki/Attention_Is_All_You_Need
They adapted a technique developed for machine translation ("attention"), which had already been advancing rapidly over the previous decade or so.
"Attention" requires really big matrices, and they threw truly vast amounts of data at it. People had been developing techniques for managing that sheer amount of computation, including dedicated hardware and GPUs.
It's still remarkable that it got so good. It's as if some emergent phenomenon appeared only once enough data was approached the right way. So it's not at all clear whether further big improvements will require another fundamental discovery, or whether it's just a matter of evolution from here.
This is also what limits them in other ways.