Is anyone properly/scientifically testing whether they get the same level of quality over time? Model distillation seems to work well on benchmarks, so they could "easily" improve their gross margins by swapping in a smaller/cheaper model.
E.g. we've seen isolated performance drops from gpt-4o-2024-05-13 to the September version, which also came with big price cuts.
WDYT?
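One way to test the "same quality over time" question without trusting vibes: run a fixed, private eval set against each dated snapshot and do a paired significance test on the per-prompt results. A minimal stdlib-only sketch of a two-sided sign test (the scores below are made-up illustration data, not real eval results):

```python
from math import comb

def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided sign test p-value: how likely a win/loss split this
    lopsided would be if both model versions were truly equal (p=0.5).
    Ties are excluded, as is standard for the sign test."""
    n = wins + losses
    k = max(wins, losses)
    # Upper tail P(X >= k) under Binomial(n, 0.5), doubled for two sides
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical per-prompt pass/fail scores from two dated snapshots
# (in practice: same prompts, same grader, run against each version).
may_scores  = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
sept_scores = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

wins   = sum(a > b for a, b in zip(may_scores, sept_scores))
losses = sum(a < b for a, b in zip(may_scores, sept_scores))
print(f"wins={wins} losses={losses} p={sign_test_p(wins, losses):.3f}")
```

With only a handful of prompts the p-value stays large, which is the point: detecting a quiet distillation swap needs a decently sized eval set run repeatedly over time.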
Another possibility is that the price cuts come from weight quantization, which can also degrade quality.
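For intuition on why quantization can degrade quality: here's a toy NumPy sketch of symmetric int8 weight quantization (random data, not any vendor's actual scheme), showing the round-trip error it introduces:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)  # stand-in fp32 "weights"

# Symmetric int8 quantization: map [-max|w|, max|w|] onto [-127, 127]
scale = float(np.abs(w).max()) / 127
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale  # dequantized weights

err = float(np.abs(w - w_hat).max())
print(f"max round-trip error: {err:.5f} (quantization step = {scale:.5f})")
```

Each weight moves by up to half a quantization step; individually tiny, but across billions of weights those perturbations can add up to measurable accuracy loss, and the bigger draw is that int8 halves (or quarters) memory and bandwidth versus fp16/fp32, which is exactly the kind of serving-cost win that could fund a price cut.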