HACKER Q&A
📣 bdamirch

How do I learn more about LLMs and ML?


Hi everyone,

I am about to finish my freshman year in college and over the past 2 months I have been doing a lot of research on LLMs. I have a good high-level grasp of how these systems work, but I was wondering how I could deepen my knowledge. Should I read papers? What kinds of projects do you think are best to learn how to build these systems?


  👤 kolinko Accepted Answer ✓
Aside from Karpathy's videos, I'd say:

Start with any book on traditional neural networks, so you get a decent understanding of what neural networks are, what an activation function is, and what backpropagation is.
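To make "activation function" and "backpropagation" concrete, here is a minimal sketch of a single neuron trained by gradient descent. It is not taken from any particular book: the sigmoid activation, the squared-error loss, the toy data, and the names `sigmoid` and `train_neuron` are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(xs, ys, lr=0.5, steps=2000):
    """Fit y ~ sigmoid(w*x + b) by gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            z = w * x + b              # forward pass: pre-activation
            a = sigmoid(z)             # forward pass: activation
            # Backward pass (chain rule): dL/dz = dL/da * da/dz
            dL_da = 2.0 * (a - y)      # derivative of (a - y)^2
            da_dz = a * (1.0 - a)      # derivative of the sigmoid
            dL_dz = dL_da * da_dz
            grad_w += dL_dz * x        # dz/dw = x
            grad_b += dL_dz            # dz/db = 1
        # Step against the average gradient.
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return w, b

# Toy data (assumed): negative inputs map to 0, positive inputs to 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 1.0]
w, b = train_neuron(xs, ys)
```

A full network repeats the same chain-rule step layer by layer, which is what frameworks like PyTorch automate.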

From then on, my path was:

- numpy user manual, if you don't know it already

- a book about pytorch

- a book about transformers

- a book from Wolfram about LLMs

(forgot the names of books, but you can find plenty on Amazon)

That's it as far as reading / theoretical understanding goes.

As for understanding in practice (once you've read the theory), I'd say: implement a basic picoGPT/Llama/Mistral model from scratch in Python and numpy.
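As a starting point for that kind of from-scratch build, here is a rough numpy sketch of single-head causal self-attention, the core operation in GPT-style models. It is not the picoGPT, Llama, or Mistral code: the function names, shapes, and random weights are assumptions, and a real model adds multiple heads, embeddings, normalization, and MLP layers on top.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum first for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product scores between every pair of positions.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask future positions so token i only attends to tokens 0..i.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Random inputs and weights, purely for illustration.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and is zero above the diagonal, which is a handy sanity check while building the rest of the model.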

I don't know how good your college is at math, but matrix-vector multiplications, dot products and linear algebra in general are a must.
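For a sense of the linear algebra involved, here is a small numpy sketch of a dot product and a matrix-vector multiplication. The numbers are arbitrary and chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Dot product: multiply element-wise, then sum.
dot = a @ b                      # 1*4 + 2*5 + 3*6 = 32

# Matrix-vector product: one dot product per row of W.
W = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0, 0.0]])
y = W @ a                        # [1 - 3, 2 + 2] = [-2, 4]

# The same result spelled out row by row.
y_manual = np.array([row @ a for row in W])
```

A neural network layer is essentially this matrix-vector product followed by an activation function.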


👤 Weidenwalker
Can't recommend the fast.ai course by Jeremy Howard highly enough; it walks you through building your own deep learning stack from scratch. What I really appreciate about it is that it demystifies a lot of jargon into what really are quite simple ideas at their core (e.g. "rectified linear unit" sounding scary even though it's literally only a line with a floor).

The 2022 edition isn't so much about LLMs as about image generation with stable diffusion, but the underlying techniques are still foundational enough to be generally useful. YMMV, but for me building things from scratch, even if results don't reach SOTA, is the single most effective way to learn what's really going on.
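The "line with a floor" description of the rectified linear unit fits in one line of numpy. This is a minimal sketch, not code from the fast.ai course; the name `relu` and the sample inputs are assumptions.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: pass positive values through, floor the rest at 0."""
    return np.maximum(0.0, x)

xs = np.array([-2.0, -0.5, 0.0, 1.5])
ys = relu(xs)   # negatives become 0.0, 1.5 stays 1.5
```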


👤 discordance
Karpathy’s video “Let's build GPT: from scratch, in code, spelled out” helped me a lot:

https://youtu.be/kCc8FmEb1nY


👤 FezzikTheGiant
I'm in the exact same situation as the poster rn. Freshman year of college and really interested in LLMs. I'm planning to take ML/deep learning courses at my school later on, but in the meantime, what would be the best way to learn how to drive the car without exactly knowing how the engine works? What I mean is I want to learn how to fine-tune and build useful apps on top of LLMs.

👤 cdavid
What is your goal? 1) Know more about how they work on the academic side? 2) Be able to work at a company that works on LLMs? 3) Be able to work with LLMs? 4) Agents? Each of those goals may require a different learning "stream".

The book "NLP with transformers" or fast.ai is good for 3). For 1), assuming you do know how they work, I recommend you start reading papers.

I find the discussions around "prompt engineering" to be rather pointless, and they quickly become obsolete anyway (newer, more powerful LLMs make it more and more obvious).


👤 etamponi
Karpathy's "Neural Networks: Zero-to-hero" series has been very useful for me: https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...

👤 ofrzeta
Related question: Does it make sense to buy an RTX 4090 or 3090 Ti for this, or for fine-tuning existing open models?

👤 brudgers
Build one. Then another. Then another.

Make building them normal, not an imaginary exercise.

You don't have to build a good one to learn. Which is good because you probably won't. But no one will care because nobody had to give you permission and nobody is going to give you a grade.

Good luck.