For the moment, though, I'd take a "smarter" but slower model over one that's x times faster. The current models are plenty fast already: they pump out 300-500 LoC files in seconds or minutes. That's plenty of speed.
You could instead use Cerebras and get ~3,000 tps, but judging by usage numbers, people mostly don't. That's roughly 100x the speed, yet it's not like everyone is rushing to their service.
The speed of LLMs since the advent of MoE has hit a good spot. What we need now is smarter, not faster.
What more perf does buy is more attempts in parallel, with some sort of arbiter model deciding which one to pick. This can happen at the token, prompt, or agent level, or all three at once.
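At the prompt level, that parallel-attempts-plus-arbiter idea is basically best-of-n sampling. A minimal sketch, where `generate` and `judge` are hypothetical stand-ins for real model calls (the judge would normally be a separate scoring model, not a local heuristic):

```python
import concurrent.futures

def generate(prompt: str, seed: int) -> str:
    # Stand-in for a real model call; each seed is one parallel attempt.
    return f"candidate-{seed} for: {prompt}"

def judge(prompt: str, candidate: str) -> float:
    # Stand-in arbiter: a real setup would ask a judge model to score
    # each candidate against the prompt. Here, a dummy scoring rule.
    return float(len(candidate))

def best_of_n(prompt: str, n: int = 4) -> str:
    # Fire off n attempts in parallel, then let the arbiter pick one.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: generate(prompt, s), range(n)))
    return max(candidates, key=lambda c: judge(prompt, c))
```

The same shape recurs at the other levels: token-level it's speculative or tree sampling with a verifier, agent-level it's multiple full runs with a reviewer picking the winner.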