However, since the launch of ChatGPT I've been thinking about reviving my blog, partly to discuss AI risk, but also to teach those new to neural networks how they work. To this day I still find the vast majority of the content on neural networks online to be extremely poor. You tend to find content that is either highly academic and therefore inaccessible, or too high level, e.g. "build a CNN in PyTorch".
The former I dislike because it's not accessible or useful for those new to the field. The latter I dislike because it rarely touches on why things are done – sure, anyone can copy-paste some PyTorch code, but how does the Adam optimiser work? Why do we use ReLU activations rather than sigmoid? How should weights be initialised? These are the things that those inaccessible academic articles are actually useful for.
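To be concrete about the kind of detail I mean: the whole of Adam fits in a few lines. Here's a minimal NumPy sketch of the update rule (the function name and the state-passing style are my own; the hyperparameter defaults are just the commonly quoted ones), exactly the sort of thing a copy-paste tutorial never shows:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters w given gradient grad.

    m and v are running moment estimates carried between calls,
    t is the step count starting at 1. Returns updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v
```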
In my experience it's only when you try to build your own ANNs (without libraries) that you realise how much hand-holding the libraries do and how little you actually understand. I had this issue when I was learning about AI at university too. It was mostly focused on the theory, and it was only when trying to translate that theory into something practical that I realised how little I understood.
I recently watched Andrej Karpathy's introduction to neural networks and I was really impressed, since it is both practical and informative. There are no magic libraries, and everything discussed is demonstrated with practical examples that help develop intuition – exactly the kind of content I find difficult to find online.
Anyway, I guess this is a really long-winded way of asking if anyone learning about ANNs at the moment shares this frustration? Or if there's anything else you've found difficult?
While I have a sense of achieving near-total understanding of each component, can I confidently say I've mastered the whole topic? For instance, given unlimited time, could I replicate the functionality of 'makemore' without resorting to any online searches?
The answer, quite likely, is no.
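To be fair about what "replicating makemore" would even involve: the count-based bigram model it starts from is only a handful of lines, something like the sketch below (the toy word list and variable names are mine). It's the neural versions built on top of it that I doubt I could reproduce from memory.

```python
import numpy as np

words = ["emma", "olivia", "ava"]  # placeholder data; makemore uses names.txt
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0                      # '.' marks both start and end of a word
itos = {i: c for c, i in stoi.items()}

# Count bigram transitions, then normalise each row into probabilities.
N = np.zeros((len(stoi), len(stoi)))
for w in words:
    seq = ["."] + list(w) + ["."]
    for a, b in zip(seq, seq[1:]):
        N[stoi[a], stoi[b]] += 1
P = (N + 1) / (N + 1).sum(axis=1, keepdims=True)  # +1 smoothing avoids zero rows

# Sample a new "word" one character at a time.
rng = np.random.default_rng(0)
ix, out = 0, []
while True:
    ix = rng.choice(len(P), p=P[ix])
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```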
Lately, I've been attempting to integrate Anki into my learning process. It becomes particularly helpful during a second review of a subject, where I can identify crucial information for recall. Additionally, using tools like ChatGPT, I can clear up any misunderstandings before making each flashcard.
- metrics code
- batching code
- data parsing/formatting/quality
- data sanitation (avoiding leakage)
- devising a proper dummy baseline early in the design process
- sufficient patience/grit, at a personal level
- data sufficiency
- checking proper gradient flows (see the sketch after this list)
- knowledge of the hardware
- knowledge of each framework’s quirks
- knowledge of probability theory
- knowledge of numerical stability
- ability to set up a complicated stack
- ability to debug both math problems and extremely complicated program stacks
- data normalization
- knowledge of nn architectures
- knowledge of a constantly accelerating milieu of the state of the art
- knowledge of which versions of which libraries/dependencies are buggy
- ability to compare research papers and research code very carefully and critically
- ability to make results reproducible
- ability to share and communicate results
- ability to prioritize beforehand which few of the many possible options you could try, are actually worth your limited time at the moment
- ability to find prior art that has been overlooked by the current popular methods
- ability to optimize code to run fast
- ability to scale up the process to a larger distributed system when required
- ability to unit test stochastic code
- ability to modularize extremely bespoke spaghetti research code once it's gotten ugly
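On the "checking proper gradient flows" item, here is roughly what I mean, as a minimal PyTorch sketch (the helper name and the toy model are my own): print the per-parameter gradient norms after backward() and look for zeros, NaNs, or wild magnitudes.

```python
import torch
import torch.nn as nn

def report_grad_flow(model: nn.Module) -> None:
    """Print the gradient norm of each parameter after loss.backward().

    Near-zero norms in early layers hint at vanishing gradients; huge or
    NaN norms hint at exploding gradients or a broken loss.
    """
    for name, p in model.named_parameters():
        if p.grad is None:
            print(f"{name:30s} NO GRADIENT (detached or unused?)")
        else:
            print(f"{name:30s} grad norm = {p.grad.norm().item():.3e}")

# Tiny usage example on a toy model.
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()
report_grad_flow(model)
```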
But the most difficult part is:
- the intuition behind them, and why they work at all
- what a good parameter set is
- how to check if training goes well
- what to do when things go wrong
- how to train large models. I mean, GPT-4 was trained for a long time; surely they did some things before and during that long training period.
And so on. I mean, because the models themselves are rather "simple", I think the best engineers get paid so well because it is not so much about the models as about everything around them.
I realised that I had many gaps in understanding how to take partial derivatives with vectors, and in making sure their dimensions aligned with the 'with respect to' variable.
Most of my exposure had been to scalar 'variables/letters', not vectors, so thinking in dimensions while differentiating caught me off guard.
Edit: It took me 3 days on and off to figure it out...
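For anyone else hitting the same wall, the shape bookkeeping is concrete: for a linear layer Y = XW with X of shape (N, D) and W of shape (D, M), the gradient of a scalar loss with respect to W must have the same shape as W, which forces dL/dW = X^T (dL/dY). A small NumPy check (the toy sizes are mine):

```python
import numpy as np

N, D, M = 5, 3, 2
X = np.random.randn(N, D)   # inputs
W = np.random.randn(D, M)   # weights
Y = X @ W                   # forward pass, shape (N, M)

dL_dY = np.ones((N, M))     # upstream gradient, e.g. for loss = Y.sum()
dL_dW = X.T @ dL_dY         # (D, N) @ (N, M) -> (D, M), same shape as W
dL_dX = dL_dY @ W.T         # (N, M) @ (M, D) -> (N, D), same shape as X

assert dL_dW.shape == W.shape
assert dL_dX.shape == X.shape
```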
I'm currently interested in delving into neural networks myself and find a lot of the information inaccessible at the moment. I decided to see if I could find this resource you're talking about.
I'm guessing you're referring to this site? https://karpathy.github.io/
It is indeed useful to learn things that way, but it definitely isn't a missing category in any way.
- Data Science from Scratch, by Joel Grus
- The first 5 deeplearning.ai courses (whose exercises relied only on numpy IIRC)