HACKER Q&A
📣 cal85

How is gpt-3.5-turbo so much cheaper?


I understand gpt-3.5-turbo is 10x cheaper per token than other gpt-3.5 models, and that OpenAI has been cagey about technical details, so we can't know anything for sure. But I'm interested to hear views from people with better knowledge of LLMs: is it likely that they've found a way to make it ~10x cheaper to run (in electricity costs etc.), or is it more likely a strategic loss to make it harder for competitors to get a foothold in the market?


  👤 maxutility Accepted Answer ✓
Not an LLM expert, but here are three theories, in descending order:

1. Quantization (e.g. fewer bits per weight)
2. Optimization for dedicated hardware
3. (Speculative) pruning of parameters to get comparable performance from a smaller model
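To make theory 1 concrete, here is a minimal sketch of symmetric int8 weight quantization in NumPy. This is purely illustrative (the function names and per-tensor scaling scheme are my own, not anything OpenAI has disclosed); it shows how storing weights in 8 bits instead of 32 cuts memory, and hence bandwidth and serving cost, by roughly 4x at a small accuracy cost:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus a per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0   # largest magnitude maps to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes // q.nbytes)   # 4 — int8 storage is 4x smaller than float32
# Rounding error per weight is at most half a quantization step (scale / 2)
print(float(np.abs(w - w_hat).max()))
```

Real deployments typically quantize per-channel or per-group rather than per-tensor, and may go below 8 bits, but the memory-for-accuracy trade-off is the same idea.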