For the moment, though, I'd take a "smarter" but slower model over one that's x times faster. The current models are plenty fast already: they pump out 300-500 LoC files in seconds or minutes. That's plenty of speed.
You could instead use Cerebras and get ~3,000 tps, but judging by usage numbers, people mostly don't. That's roughly 100x the speed, yet it's not like everyone is rushing to their service.
The speed of LLMs since the advent of MoE has hit a good spot. What we need now is smarter, not faster.
What more perf does buy is more attempts in parallel, with some sort of arbiter model deciding which one to pick. This can happen at the token, prompt, or agent level, or all three at once.
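At the prompt level, that parallel-attempts-plus-arbiter idea is basically best-of-n sampling. A minimal sketch, where `generate` and `judge` are hypothetical stand-ins for real model calls (the judge would normally be a separate scoring model, not a local heuristic):

```python
import concurrent.futures

def generate(prompt: str, seed: int) -> str:
    # Stand-in for a real model call; each seed is one parallel attempt.
    return f"candidate-{seed} for: {prompt}"

def judge(prompt: str, candidate: str) -> float:
    # Stand-in arbiter: a real setup would ask a judge model to score
    # each candidate against the prompt. Here, a dummy scoring rule.
    return float(len(candidate))

def best_of_n(prompt: str, n: int = 4) -> str:
    # Fire off n attempts in parallel, then let the arbiter pick one.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: generate(prompt, s), range(n)))
    return max(candidates, key=lambda c: judge(prompt, c))
```

The same shape recurs at the other levels: token-level it's speculative or tree sampling with a verifier, agent-level it's multiple full runs with a reviewer picking the winner.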