I don’t know anything about it, and my goal would be to start a learning process that would lead me to a fairly deep understanding of how something like ChatGPT works.
To make an analogy: I am not a kernel developer, but I am experienced enough in C and in operating systems and hardware theory that I could jump into the code of any Linux kernel subsystem, for example virtual memory management, and understand its inner workings within a few days of studying it.
I would like to get to this same level of comfort with AI.
What would be the recommended study material to get there? Is there something that starts from first principles and covers the math behind it? I have a master's degree in computer engineering, so I have taken my fair share of courses in linear algebra, calculus, and statistics, but a refresher tailored to AI would be good as well.
Thanks
First, I would find the paper "Attention Is All You Need", published by Google researchers. Then search Google, Twitter, or YouTube directly for "build ChatGPT from scratch". There are quite a lot of videos, in particular a fairly long one by Andrej Karpathy (formerly Tesla's AI guy). It shows how this works and explains tokenization, all from scratch.
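To give a taste of what that paper builds on, here is a rough sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. The function names and the toy input are my own, not from the paper or the video. Real implementations use batched matrix operations (e.g. in PyTorch) and learned projections to produce Q, K, and V, so treat this as an illustration only.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention over lists of row vectors.

    Q, K, V are lists of equal-length float lists. With causal=True,
    position i may only attend to positions j <= i, as in GPT-style decoders.
    Returns (outputs, attention_weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Mask out future positions so they receive zero weight.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy example: three "token" vectors attending to each other (self-attention).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(X, X, X, causal=True)
for row in w:
    print([round(p, 3) for p in row])
```

Each printed row is one token's attention distribution over the tokens it is allowed to see. With the causal mask, the first token can only attend to itself.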
That is your start. Then I would go to Hugging Face and check what's happening there. Also, an HN post about a "search engine for AI related things" went up within the last two weeks. I can't find it anymore, but it collects AI news from several sources, along with papers, tweets, and so on.
Upd: found it