I am about to finish my freshman year in college and over the past 2 months I have been doing a lot of research on LLMs. I have a good high-level grasp of how these systems work, but I was wondering how I could deepen my knowledge. Should I read papers? What kinds of projects do you think are best to learn how to build these systems?
Start with any book on traditional neural networks, so you get a decent understanding of what a neural network is, what an activation function is, and how backpropagation works.
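If you want to see those pieces concretely, here's a minimal sketch of a one-hidden-layer network trained with backpropagation in plain NumPy. The network size, seed, learning rate, and toy task are all made up for illustration, not taken from any book:

```python
import numpy as np

# Toy data: learn XOR with a tiny 2-4-1 network (sizes, seed, and learning rate are arbitrary)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer weights and biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))             # the activation function

lr = 0.5
for step in range(10_000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)                    # hidden activations
    out = sigmoid(h @ W2 + b2)                  # network output

    # Backward pass: push the squared-error gradient through each layer (backpropagation)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent update
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))   # should end up close to [[0], [1], [1], [0]]
```

Everything in the books above is a refinement of that loop: better architectures, better activations, better optimizers.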
From then on, my path was:
- the NumPy user manual, if you don't know it already
- a book about PyTorch
- a book about transformers
- a book from Wolfram about LLMs

(I forget the names of the books, but you can find plenty on Amazon.)
That's as far as reading / theoretical understanding goes.
As for understanding in practice (once you've read the theory), I'd say: implement a basic picoGPT/Llama/Mistral model from scratch in Python and NumPy.
I don't know how strong your college is in math, but matrix-vector multiplications, dot products, and linear algebra in general are a must.
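To give a flavor of what "from scratch in NumPy" means, and why the linear algebra matters, here is a rough sketch of a single causal self-attention head. The dimensions and variable names are made up for illustration and don't come from any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """One attention head over a sequence x of shape (seq_len, d_model)."""
    seq_len, _ = x.shape
    Q, K, V = x @ Wq, x @ Wk, x @ Wv            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # scaled dot-product similarity
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)    # causal mask: no attending to future tokens
    weights = softmax(scores, axis=-1)          # attention weights, each row sums to 1
    return weights @ V                          # weighted sum of the value vectors

# Toy usage with arbitrary sizes
rng = np.random.default_rng(0)
d_model, d_head, seq_len = 16, 8, 5
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)   # (5, 8)
```

A full GPT-style model stacks blocks like this with MLPs, layer norm, and an embedding table, but nearly every operation in it is exactly the matrix multiplications and dot products mentioned above.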
The 2022 edition isn't so much about LLMs as about image generation with Stable Diffusion, but the underlying techniques are still foundational enough to be generally useful. YMMV, but for me building things from scratch, even if the results don't reach SOTA, is the single most effective way to learn what's really going on.
The book "NLP with transformers", or fast.ai is good for 3). For 1), assuming you do know how they work, I recommend you start reading papers.
I find the discussions around "prompt engineering" rather pointless, and they become obsolete quickly anyway (newer, more powerful LLMs make that more and more obvious).
Make building them a normal activity, not an imaginary exercise.
You don't have to build a good one to learn. Which is good because you probably won't. But no one will care because nobody had to give you permission and nobody is going to give you a grade.
Good luck.