And in 10-20 years it’ll be capable of some crazy stuff
I might be ignorant of the field but why do we assume this?
How do we know it won’t just plateau in performance at some point?
Or that, say, the compute requirements become impractically high?
No one has hit a model/dataset size where the scaling curves break down, and they're fairly smooth. Simple models that accurately predict performance usually work pretty well near existing scales, so I expect trillion- or 10-trillion-parameter models to be on the same curve.
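To make that concrete, here's a minimal sketch of the kind of simple predictive model I mean: a power-law fit of loss against parameter count in log-log space, extrapolated out to 1T/10T parameters. The numbers are made up purely for illustration, not real benchmark data.

```python
# Illustrative power-law scaling fit: loss ~ a * N**(-b), fit on smaller
# models and extrapolated to larger ones. All numbers are made up.
import numpy as np

params = np.array([1e8, 1e9, 1e10, 1e11, 1e12])   # model sizes (made up)
loss   = np.array([3.9, 3.2, 2.7, 2.3, 2.0])      # eval loss (made up)

# Fit log(loss) = intercept + slope * log(N), a straight line in log-log space.
slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)

def predicted_loss(n_params: float) -> float:
    return float(np.exp(intercept) * n_params ** slope)

for n in (1e12, 1e13):   # extrapolate to 1T and 10T parameters
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")
```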
What we haven't seen yet (that I'm aware of) is whether the specializations to existing models (LoRA, RLHF, different attention methods, etc.) follow similar scaling laws, since most of the effort has been focused on achieving similar performance with smaller/sparser models rather than investing large amounts of money into huge experiments. It will be interesting to see what DeepMind's Gemini reveals.
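For reference, the LoRA trick itself is small enough to sketch in a few lines: freeze the pretrained weight and learn a low-rank update on top of it. The shapes below are toy values, purely illustrative.

```python
# Minimal numpy sketch of the LoRA idea: keep W frozen and train only the
# low-rank factors A and B, so the update B @ A has rank r << d.
import numpy as np

d_out, d_in, r = 512, 512, 8                  # toy layer sizes, rank r << d
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # "pretrained" weight, kept frozen
A = rng.standard_normal((r, d_in)) * 0.01     # trainable low-rank factor
B = np.zeros((d_out, r))                      # trainable; zero init => no change at start

def lora_forward(x: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """x has shape (d_in,); adds the low-rank correction B @ A on top of W."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
print(lora_forward(x).shape)
# Trainable parameters: r * (d_in + d_out) = 8,192 vs. d_in * d_out = 262,144.
```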
Data
Compute
Algorithms
All three are just scratching the surface of what is possible.
Data: What has been scraped off the internet is just <0.001% of human knowledge, since most platforms can't be scraped easily and much of it is in non-text formats like video and audio, or sits on plain old undigitized paper. Finally, there are probably techniques to increase data through synthetic means, which is purportedly OpenAI's secret sauce behind GPT-4's quality.
Compute: While 3nm processes are approaching an atomic limit (0.21nm for Si), there is still room to explore more densely packed transistors or other materials like gallium nitride or optical computing. Not only that, but there is also a lot of room in hardware architecture for more parallelism and 3-D stacked transistors.
Algorithms: The Transformer and other attention mechanisms have several sub-optimal components, like the fairly arbitrary design decisions baked into the architecture and the quadratic time complexity of attention. There also seems to be a large space of LLM augmentations, like RLHF for instruction following, improvements in factuality, and other mechanisms.
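To illustrate the quadratic-attention point, here's a bare-bones sketch of scaled dot-product attention (single head, no masking, toy sizes assumed); the (n, n) score matrix is where the n^2 time and memory come from.

```python
# Vanilla attention is quadratic in sequence length n because the score
# matrix has shape (n, n). Toy sizes, single head, no masking.
import numpy as np

def naive_attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V each have shape (n, d)."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) -- the quadratic part
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n, d)

n, d = 2048, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape, "score matrix entries:", n * n)       # ~4.2M entries at n=2048
```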
And these ideas are just from my own limited experience. So I think it's fair to say that LLMs have plenty of room to improve.
That doesn't mean there isn't a plateau somewhere, but it's likely way off in the distance.
But that's just my opinion and no one knows the future. If you read papers on arxiv.org, progress is being made. Papers are being written, low-hanging fruit consumed. So we're going to try because PhDs are there for the taking on the academic side, and generational wealth is there for the taking on the business side.
E. F. Codd invented the relational database and won the Turing Award. Larry Ellison founded Oracle to sell relational databases and that worked out well for him, too.
There's plenty of motivation to go around.
Digital computer architecture evolved the way it did because there was no other practical way to get the job done besides enforcing a strict separation of powers between the ALU, memory, mass storage, and I/O. We are no longer held to those constraints, technically, but they still constitute a big comfort zone. Maybe someone tinkering with a bunch of FPGAs duct-taped together in their basement will be the first to break out of it in a meaningful way.
Good LLMs like ChatGPT are a relatively new technology so I think it's hard to say either way. There might be big unrealized gains by just adding more compute, or adding/improving training data. There might be other gains in implementation, like some kind of self-improvement training, a better training algorithm, a different kind of neural net, etc. I think it's not unreasonable to believe there are unrealized improvements given the newness of the technology.
On the other hand, there might be limitations to the approach. We might never be able to solve for frequent hallucinations, and we might not find much more good training data as things get polluted by LLM output. Data could even end up being further restricted by new laws meaning this is about the best version we will have and future versions will have worse input data. LLMs might not have as many "emergent" behaviors as we thought and may be more reliant on past training data than previously understood, meaning they struggle to synthesize new ideas (but do well at existing problems they've trained on). I think it's also not unreasonable to believe LLMs can't just improve infinitely to AGI without more significant developments.
Speculation is always just speculation, not a guarantee. We can sometimes extrapolate from what we've seen, but sometimes we haven't seen enough to know the long term trend.
I think I have a corollary-type idea: why are LLMs not perhaps like "Linux," something that never really needs to be REWRITTEN from scratch, merely added to or improved on? In other words, isn't it fair to think that LoRAs are the really important thing to pay attention to?
(And perhaps, like Google Fuchsia or whatever, new LLMs might just be mostly a waste of time from an innovator's POV?)
It's not infeasible that in the future you'll have a box at home that you can ask a fairly complicated question, like "how do I build a flying car", and it will have the ability to
- give you step-by-step instructions for what you need to order
- write and run code to simulate certain things
- analyze your work from video streams and provide feedback
- possibly even have a robotic arm with attachments that can do some work.
From a software perspective, I've wondered for a while if, as LLM usage matures, there will be an effort to optimize hotspots, like what happened with VMs, or auto-indexing in relational DBs. I'm sure there are common data paths that get more usage, which could somehow be prioritized, either through pre-processing or dynamically, helping speed up inference.
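As a toy illustration of that idea (not how any production system does it; `run_model` is a hypothetical stand-in for the actual inference call), a hot-path cache could look something like this:

```python
# Memoize completions for prompts that recur often, the way a VM JIT-compiles
# hot methods or a database auto-indexes hot queries. Purely illustrative.
from collections import Counter
from functools import lru_cache

def run_model(prompt: str) -> str:
    # Hypothetical stand-in for the expensive inference call.
    return f"completion for: {prompt}"

prompt_counts = Counter()              # profile which prompts are "hot"

@lru_cache(maxsize=10_000)             # reuse results for repeated prompts
def cached_completion(prompt: str) -> str:
    return run_model(prompt)

def serve(prompt: str) -> str:
    prompt_counts[prompt] += 1         # the counts could drive smarter policies
    return cached_completion(prompt)

print(serve("What is a B-tree?"))
print(serve("What is a B-tree?"))      # second call is served from the cache
```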
Also, GPT-4 seems to include multiple LLMs working in concert. There's bound to be way more fruit to be picked along that route as well. In short, there are tons of areas where improvements, large and small, can be made.
As always in computer science, the maxim, "Make it work, make it work well, then make it work fast," applies here as well. We're collectively still at step one.
Great video to talk about this: https://www.youtube.com/watch?v=ARf0WyFau0A
In threads on LLMs, this point doesn't get brought up as much as I'd expect, so I'm curious whether I'm missing talks on this or whether it's just wrong. But I see this as the way forward: models generating tons of answers, other models picking out the correct ones, the combination reaching beyond human ability, after which humans can do their own verification.
Edit:
Think of it this way. Trying to create something isn't easy. If I was to write a short story, it'd be very difficult, even if I spent years reading what others have written to learn their patterns. If I then tried to write and publish a single one myself, no chance it'd be any good.
But _judging_ short stories is much easier to do. So if I said screw it, read a couple of stories to get the initial framework, then wrote 100 stories in the same amount of time I'd have spent reading and learning more about short stories, I could then go through the 100, pick out the one I think is best, and publish that.
That's where I see LLMs going and what the video and papers mentioned in the video say.
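A minimal best-of-N sketch of that "generate many, judge the results, keep the best" loop looks like this; `generate_story` and `score_story` are hypothetical stand-ins for a generator model and a separate judge model, stubbed out here so the example runs.

```python
# Generate N candidates cheaply, score them with a judge, keep the best one.
import random

def generate_story(prompt: str) -> str:
    return f"{prompt} -- draft #{random.randint(0, 10_000)}"    # stand-in generator

def score_story(prompt: str, story: str) -> float:
    return random.random()                                      # stand-in judge

def best_of_n(prompt: str, n: int = 100) -> str:
    candidates = [generate_story(prompt) for _ in range(n)]     # cheap to produce
    scored = [(score_story(prompt, c), c) for c in candidates]  # easier to judge
    return max(scored, key=lambda pair: pair[0])[1]             # keep the best one

print(best_of_n("A story about a lighthouse keeper"))
# A human can still do their own verification on the single surviving candidate.
```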
I'm not an expert here either, but I wonder if there will be the same "leap" we saw from GPT-3 to GPT-4, or if there's a diminishing curve to performance, i.e., adding another trillion parameters has less of a noticeable effect than the first few hundred billion.
[0] https://fortune.com/2023/09/09/ai-chatgpt-usage-fuels-spike-... -- I am fairly certain they paid for that water, but it was not a commensurate price given the circumstances, and if they'd had to ask first, the answer from any reasonable environmental stewardship organization would have been no.
I, of course, already know how to do all this for a mere $80B.
Anything that has seen continual growth is assumed to keep growing at a similar rate.
Or, how I mentally model it even if it's a bit incorrect: People see sigmoidal growth as exponential.
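A quick numerical way to see that (toy parameters, purely illustrative): early on, a logistic curve and an exponential are nearly indistinguishable, and they only separate once the logistic approaches its ceiling.

```python
# Compare an exponential with a logistic (sigmoid) curve that has a ceiling
# of 1000. The two track each other closely before the inflection point.
import numpy as np

t = np.linspace(0, 10, 11)
exponential = np.exp(0.8 * t)
logistic = 1000 / (1 + 999 * np.exp(-0.8 * t))   # carrying capacity 1000

for ti, e, s in zip(t, exponential, logistic):
    print(f"t={ti:4.1f}  exp={e:10.1f}  logistic={s:10.1f}")
# The columns stay close until the logistic curve nears its ceiling,
# then it flattens while the exponential keeps climbing.
```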
I suspect that we've already seen the shape of the curve: a 1B parameter model can index a book; a 4B model can converse; a 14B model can be a little more eloquent. Beyond that, no real gains will be seen.
The "technology advancement" phase has already happened mostly, but the greater understanding of theory, that would discourage foolish investments hasn't propagated yet. So there's probably at least another full year of hype cycle before the next buzzword is brought out to start hoovering up excess investment funds.
So if we have that much compute power already, why can't we just configure it in the right way to match a human brain?
I'm not sure I totally buy that logic though, since I would think the architecture/efficiency of a brain is way different from a computer's.
But even if you’re looking just at the LLM it seems like there’s a lot of ways it can be improved still.
We don't.
But that's also the sort of thing you can't say when seeking huge amounts of funding for your LLM company.