I am about to finish my freshman year in college and over the past 2 months I have been doing a lot of research on LLMs. I have a good high-level grasp of how these systems work, but I was wondering how I could deepen my knowledge. Should I read papers? What kinds of projects do you think are best to learn how to build these systems?
Start with any book on traditional neural networks, so you get a decent understanding of what a neural network is, what an activation function is, and how backpropagation works.
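If you want to see those pieces concretely, here's a minimal sketch of a one-hidden-layer network trained with backpropagation in plain NumPy. The network size, seed, learning rate, and toy task are all made up for illustration, not taken from any book:

```python
import numpy as np

# Toy data: learn XOR with a tiny 2-4-1 network (sizes, seed, and learning rate are arbitrary)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer weights and biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))             # the activation function

lr = 0.5
for step in range(10_000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)                    # hidden activations
    out = sigmoid(h @ W2 + b2)                  # network output

    # Backward pass: push the squared-error gradient through each layer (backpropagation)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent update
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))   # should end up close to [[0], [1], [1], [0]]
```

Everything in the books above is a refinement of that loop: better architectures, better activations, better optimizers.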
From then on, my path was:
- the NumPy user manual, if you don't know it already
- a book about PyTorch
- a book about transformers
- a book from Wolfram about LLMs

(I forget the names of the books, but you can find plenty on Amazon.)
That's as far as reading / theoretical understanding goes.
As for understanding in practice (once you've read the theory), I'd say: implement a basic picoGPT/Llama/Mistral model from scratch in Python and NumPy.
I don't know how strong your college is in math, but matrix-vector multiplications, dot products, and linear algebra in general are a must.
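To give a flavor of what "from scratch in NumPy" means, and why the linear algebra matters, here is a rough sketch of a single causal self-attention head. The dimensions and variable names are made up for illustration and don't come from any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """One attention head over a sequence x of shape (seq_len, d_model)."""
    seq_len, _ = x.shape
    Q, K, V = x @ Wq, x @ Wk, x @ Wv            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # scaled dot-product similarity
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)    # causal mask: no attending to future tokens
    weights = softmax(scores, axis=-1)          # attention weights, each row sums to 1
    return weights @ V                          # weighted sum of the value vectors

# Toy usage with arbitrary sizes
rng = np.random.default_rng(0)
d_model, d_head, seq_len = 16, 8, 5
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)   # (5, 8)
```

A full GPT-style model stacks blocks like this with MLPs, layer norm, and an embedding table, but nearly every operation in it is exactly the matrix multiplications and dot products mentioned above.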
The 2022 edition isn't so much about LLMs as about image generation with Stable Diffusion, but the underlying techniques are still foundational enough to be generally useful. YMMV, but for me building things from scratch, even if the results don't reach SOTA, is the single most effective way to learn what's really going on.
The book "NLP with transformers", or fast.ai is good for 3). For 1), assuming you do know how they work, I recommend you start reading papers.
I find the discussions around "prompt engineering" rather pointless, and they become obsolete quickly anyway (newer, more powerful LLMs make that more and more obvious).
Make building them a normal activity, not an imaginary exercise.
You don't have to build a good one to learn. Which is good because you probably won't. But no one will care because nobody had to give you permission and nobody is going to give you a grade.
Good luck.