There's a rumor about GPT-4's parameter count (1 trillion?), but I'm not sure it's accurate. To me, it's simply an amazing piece of technology that works remarkably well, and I don't fully understand how it achieves this performance.
Some of the bigger details:
* Huge compute scale for the pre-training
* Huge data scale
* Well-processed and cleaned data (GDB has alluded to this)
* Fine-tuning on large amounts of code snippets and examples
* Massive amounts of RLHF
* Continual, ongoing RLHF (a minimal sketch of the RLHF idea follows this list)
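To make the RLHF items a bit more concrete, here is a minimal, purely illustrative sketch of the reward-modeling step that RLHF pipelines typically rest on: a small model is trained to assign a higher scalar reward to the completion that human labelers preferred, via a pairwise (Bradley-Terry style) loss. This is not OpenAI's actual implementation; the `TinyRewardModel` class, the `preference_loss` function, and the random token batches are placeholders I made up for illustration.

```python
# Minimal sketch of RLHF's reward-modeling step (illustrative, not OpenAI's code).
import torch
import torch.nn as nn


class TinyRewardModel(nn.Module):
    """Toy stand-in for a transformer with a scalar reward head."""

    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, token_ids):
        # Mean-pool token embeddings, then map to a single scalar reward.
        pooled = self.embed(token_ids).mean(dim=1)
        return self.head(pooled).squeeze(-1)


def preference_loss(reward_chosen, reward_rejected):
    # Pairwise loss: push the chosen completion's reward above the rejected one's.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()


# Hypothetical batch: token ids for completions labelers preferred vs. rejected.
chosen = torch.randint(0, 1000, (8, 32))
rejected = torch.randint(0, 1000, (8, 32))

model = TinyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

opt.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
opt.step()
```

In a full RLHF pipeline, a reward model along these lines is then used to fine-tune the base model with a policy-gradient method such as PPO, which is where the "massive amounts" of human feedback come into play.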