HACKER Q&A
📣 hn92726819

Where to learn the specifics of how GPT4-quality models are created?


I have never followed AI/ML work. In my experience, all the AI/ML workers I've encountered seem to throw stuff at the wall and see what sticks. Clearly, that is not how the GPT-3/GPT-4 models were built.

Is there a resource that explicitly explains, at a technical level, the thought process behind a full implementation of a model like this? For example, why they chose to add an X-type layer here, or which layers they may have tweaked, and why, to improve quality from GPT-3.5 to GPT-4?

I understand OpenAI is closed source. But I'm sure there isn't some secret sauce that only employees there know about that makes these models happen, so where does one learn this?


  👤 ActorNightly Accepted Answer ✓

👤 ailef
I think the only way to really know what's going on, in depth, is to read the original research papers.

If you're satisfied with a cursory understanding, you can read "derivative" material. Otherwise, you need to start with the basics of how modern deep learning models work and go deeper until you're able to read and understand more advanced material like transformers (and thus GPT).