HACKER Q&A
📣 ngiyabonga

Did OpenAI just retrain GPT from scratch?


The announcement[1] from a couple of days ago notes an increase in context length from 4k to 16k for GPT-3.5-turbo and to 32k (from 8k?) for GPT-4. From my limited understanding of how LLMs work and are trained, the context window is a key part of training.

Does quadrupling the context mean they retrained the whole thing(s) from scratch, or is there some other wizardry that can be applied to existing models to increase their context length without retraining from scratch?

[1] https://openai.com/blog/function-calling-and-other-api-updates
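
(Aside: there are published ways to stretch an existing model's context with only a light fine-tune rather than a from-scratch retrain, e.g. interpolating rotary position embeddings so longer sequences map back into the position range the model was trained on. Below is a rough NumPy sketch of that idea; the dimensions and function names are purely illustrative, and nothing here is a claim about what OpenAI actually did.)

    import numpy as np

    def rope_angles(positions, dim, base=10000.0, scale=1.0):
        # Standard rotary-embedding frequencies; scale < 1 linearly interpolates
        # positions so a longer sequence falls back into the position range the
        # model was originally trained on (e.g. scale = 4096 / 16384).
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        return np.outer(np.asarray(positions, dtype=np.float64) * scale, inv_freq)

    def apply_rope(x, angles):
        # Rotate consecutive pairs of feature dimensions of x by the given angles.
        x1, x2 = x[..., 0::2], x[..., 1::2]
        cos, sin = np.cos(angles), np.sin(angles)
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    # Toy example: 16k positions squeezed into a model trained with 4k positions.
    dim, orig_ctx, new_ctx = 64, 4096, 16384
    q = np.random.randn(new_ctx, dim)
    q_rot = apply_rope(q, rope_angles(np.arange(new_ctx), dim, scale=orig_ctx / new_ctx))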


  👤 ftxbro Accepted Answer ✓
Probably they have trained 'GPT-5' from scratch for the government and military, and these new API models are distillations and quantizations of the larger models, trading some capability for faster and cheaper inference.
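
To make that speed-vs-capability trade-off concrete, here's a toy post-training int8 quantization in NumPy. It only illustrates the general idea of trading a little precision for smaller, cheaper weights; it says nothing about how OpenAI actually serves these models.

    import numpy as np

    def quantize_int8(w):
        # Symmetric per-tensor int8 quantization: keep one fp scale per tensor,
        # store the weights themselves as int8 (4x smaller than fp32).
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(1024, 1024).astype(np.float32)
    q, s = quantize_int8(w)
    err = np.abs(w - dequantize(q, s)).max()
    print(f"int8 storage, max abs rounding error: {err:.4f}")  # small but nonzero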