So, going from GPUs to a more specialized chip architecture could reduce power usage somewhat, but it's more likely that total power usage will go up, not down, as these companies try to address every problem with bigger scale. Not saying it will work, but it is the most likely thing to be tried, and it will take several years before they give up on that route.
Assuming you are comfortable with the current model performance rather than always trying to have the most performant model...
1. Lots of researchers are looking at ways to reduce model size, inference time, etc. We already see smaller models matching or beating the benchmark results of older, larger models. Look into model distillation to see how this is done for specific benchmarks, or newer approaches like Mixture of Experts (MoE) that reduce the compute burden while getting results similar to older models (see the distillation sketch after this list).
2. If your concern is the cost of running a model, GPU tech is also getting faster and more efficient, so even if compute requirements stay the same, the user gets an answer sooner and the model is cheaper to run on the new hardware.
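To make the distillation point concrete, here is a minimal sketch of a knowledge-distillation training loss, assuming PyTorch; the function name, temperature, and mixing weight are illustrative choices, not taken from any particular paper or library:

```python
# Hypothetical knowledge-distillation loss: a small "student" model is trained
# to match a large "teacher" model's softened output distribution, plus the
# ordinary cross-entropy against the true labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures
    # Hard-target term: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

The student learns from the teacher's full output distribution rather than only the hard labels, which is roughly how much smaller models can recover most of a larger model's benchmark performance at a fraction of the inference cost.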
I hope that helps!
For end-user applications: as determined by that application's constraints, like TDP or battery life on a smartphone. For subscription-based online services: there's a price tag attached, and someone picks up the bill.
So, long-term it'll probably depend on how many useful applications are developed. If it ends up changing society completely, then sure, expect a continuing stream of $$ to be thrown at the required compute. If it's mostly hype, that $$ will quickly dry up.
Here are some recent slides by LeCun on that topic: https://drive.google.com/file/d/1Ymx_LCVzy7vZXalrVHPXjX9qbpd...
So yes, we still need new ideas like transformers, but the real enabler of more and more powerful AI is more and more powerful computers.
I suspect training algorithms will get better over time; however, the ambition of the models people want to train will likely keep compute constrained for the next decade or so.