HACKER Q&A
📣 kypro

Those learning about neural networks, what do you find most difficult?


About a decade ago I ran a blog focused on AI where I would try to post interesting practical tutorials for people to learn from. But around 2012, with the launch of professional e-learning sites like Coursera, I figured I was probably doing readers a disservice by making content when they would be better served learning from well-regarded experts in the field.

However, since the launch of ChatGPT I've been thinking about reviving my blog, partly to discuss AI risk, but also to teach those new to neural networks how they work. To this day I still find the vast majority of the content on neural networks online to be extremely poor. Content tends to be either highly academic and therefore inaccessible, or too high-level, e.g. "build a CNN in PyTorch".

The former I dislike because it's not accessible or useful for those new to the field. The latter I dislike because such tutorials rarely touch on why things are done. Sure, anyone can copy-paste some PyTorch code, but how does the Adam optimiser work? Why do we use ReLU activations instead of sigmoid? How should weights be initialised? These are exactly the questions those inaccessible academic articles are actually useful for.
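To give a flavour of the "why" behind one of those questions: the core of the Adam update fits in a few lines of numpy. This is a sketch of the textbook update rule for a single parameter array with the usual default hyperparameters, not any particular library's implementation:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t counts steps starting from 1."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad**2     # running mean of its square
    m_hat = m / (1 - beta1**t)                # bias correction: m and v start at 0,
    v_hat = v / (1 - beta2**t)                # so early estimates are too small
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step size
    return w, m, v
```

Dividing by sqrt(v_hat) gives every parameter its own effective learning rate, which is most of the answer to "why Adam rather than plain SGD", and exactly the kind of detail the copy-paste tutorials skip.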

In my experience it's only when you try to build your own ANNs (without libraries) that you realise how much hand-holding the libraries do and how little you actually understand. I had this issue when I was learning about AI at university too: the course was mostly focused on theory, and it was only when trying to translate that theory into something practical that I realised how little I understood.
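To make that concrete, here is roughly the smallest "no libraries" network I have in mind: one hidden layer, hand-written backprop, trained on XOR. The layer sizes, seed, and learning rate are arbitrary choices for the sketch:

```python
import numpy as np

# a complete 2-4-1 network on XOR: forward pass, manual backprop, plain SGD
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)
lr = 0.5

for _ in range(5000):
    # forward
    h = np.tanh(X @ W1 + b1)                   # hidden layer
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))     # sigmoid output

    # backward: with binary cross-entropy, d(loss)/d(pre-sigmoid) = out - y
    dz2 = (out - y) / len(X)
    dW2 = h.T @ dz2; db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h**2)            # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)

    # SGD update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(out.round(2).ravel())  # should be close to [0, 1, 1, 0]
```

Every line of the backward pass is something a framework would otherwise do for you silently.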

I recently watched Andrej Karpathy's introduction to neural networks and was really impressed, since it was both practical and informative. There are no magic libraries, and everything discussed is demonstrated with practical examples that help develop intuition: exactly the kind of content I find difficult to find online.

Anyway, I guess this is a really long-winded way of asking: does anyone learning about ANNs at the moment share this frustration? Is there anything else you've found difficult?


  👤 rahimnathwani Accepted Answer ✓
One of the biggest challenges I face is effective note-taking and recalling information that I believe I've 'grasped'. Take, for instance, the second video in the 'NN Zero to Hero' series, where I may spend around six hours to fully comprehend the content. This involves pausing, researching additional materials, and rewatching certain sections to enhance my understanding.

While I have a sense of achieving near-total understanding of each component, can I confidently say I've mastered the whole topic? For instance, given unlimited time, could I replicate the functionality of 'makemore' without resorting to any online searches?

The answer, quite likely, is no.

Lately, I've been attempting to integrate Anki into my learning process. It becomes particularly helpful during a second review of a subject, where I can identify crucial information for recall. Additionally, using tools like ChatGPT, I can clear up any misunderstandings before making each flashcard.


👤 Cieric
I haven't gotten super deep into it yet, but https://nnfs.io/ has been good in my opinion. The book slowly replaces hand-written, explained code with numpy equivalents to keep the examples fast, and the accompanying animations are useful too. I'd be curious what others think of it.

👤 pizza
You have to be really careful with every single level of the stack.

- metrics code

- batching code

- data parsing/formatting/quality

- data sanitation (avoiding leakage)

- devising a proper dummy baseline early in the design process

- sufficient patience/grit, at a personal level

- data sufficiency

- checking proper gradient flows (see the sketch after this list)

- knowledge of the hardware

- knowledge of each framework’s quirks

- knowledge of the probability theory

- knowledge of numerical stability

- ability to set up a complicated stack

- ability to debug both math problems and extremely complicated program stacks

- data normalization

- knowledge of nn architectures

- knowledge of a constantly accelerating milieu of the state of the art

- knowledge of which versions of which libraries/dependencies are buggy

- ability to compare research papers and research code very carefully and critically

- ability to make results reproducible

- ability to share and communicate results

- ability to prioritize beforehand which few of the many possible options you could try, are actually worth your limited time at the moment

- ability to find prior art that has been overlooked by the current popular methods

- ability to optimize code to run fast

- ability to scale up the process to a larger distributed system when required

- ability to unit test stochastic code

- ability to modularize extremely bespoke spaghetti research code once it's gotten ugly
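To make the gradient-flow item concrete, here's a minimal check in PyTorch; the toy model here just stands in for whatever you're actually training:

```python
import torch

# toy model in place of the real one
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()

# every trainable parameter should now carry a sensible gradient norm:
# None or ~0 usually means a detach()/no_grad wiring bug, huge values a scaling bug
for name, p in model.named_parameters():
    norm = None if p.grad is None else p.grad.norm().item()
    print(f"{name}: grad norm = {norm}")
```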


👤 MichaelRazum
I think the models themselves (especially transformers) are kind of straightforward.

But the most difficult part is:

- the intuition behind them, and why they work at all

- what a good parameter set looks like

- how to check whether training is going well (a minimal sanity check is sketched below)

- what to do when things go wrong

- how to train large models: GPT-4 was trained for a long time, and they surely had to validate things both before and during that long training run

And so on. Because the models are rather "simple", I think that's why the best engineers are paid so well: it's not so much about the models as about everything around them.
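A minimal version of the "is training going well" check mentioned above is to confirm the model can overfit one tiny batch before committing to a long run. A rough PyTorch sketch, where the model, shapes, and learning rate are placeholders:

```python
import torch

# stand-in model and data; a healthy model/loss/optimiser combination should
# be able to drive the loss to ~0 on a single small, fixed batch
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))

for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(loss.item())  # should end up near zero; if not, something upstream is broken
```

If the loss won't go to zero even here, no amount of extra data or training time will save the full run.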


👤 vanilla-latte
For me recently, it was learning about backpropagation with batch normalisation.

I realised that I had many gaps in understanding what to differentiate and how to take partial derivatives with vectors, and in making sure their dimensions aligned with the 'with respect to' variable.

Most of my exposure had been to scalar 'variables/letters', not vectors, so thinking in dimensions during the derivation caught me off guard.

Edit: It took me 3 days, on and off, to figure it out...
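For anyone stuck on the same derivation, the compact result in numpy ends up looking like this (biased variance, statistics over the batch axis; the variable names are my own):

```python
import numpy as np

def batchnorm_backward(dout, x, gamma, eps=1e-5):
    """Gradients of y = gamma * xhat + beta, where xhat = (x - mu) / sqrt(var + eps)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)                      # biased variance, matching the forward pass
    xhat = (x - mu) / np.sqrt(var + eps)

    dbeta = dout.sum(axis=0)
    dgamma = (dout * xhat).sum(axis=0)

    # the part that caught me out: mu and var depend on every row of x,
    # so dx mixes contributions from the whole batch
    dxhat = dout * gamma
    dx = (dxhat - dxhat.mean(axis=0)
          - xhat * (dxhat * xhat).mean(axis=0)) / np.sqrt(var + eps)
    return dx, dgamma, dbeta
```

The sanity check that got me through it: every gradient has the same shape as the thing it's the gradient of, so each line can be verified just by checking dimensions.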


👤 cultofmetatron
> I recently watched Andrej Karpathy's introduction to neural networks

I'm currently interested in delving into neural networks myself and find a lot of the information inaccessible at the moment, so I decided to see if I could find this resource you're talking about.

I'm guessing you're referring to this site? https://karpathy.github.io/


👤 jstx1
I don't agree with the sentiment that the resources are limited to academic literature and high-level use of existing tools. There are many, many examples of "write your own from scratch"; it has been beaten to death in tutorials, videos, and books.

It is indeed useful to learn things that way, but it definitely isn't a missing category.


👤 Smith42
I wrote a literature review on applying neural networks to astronomical problems, and I found that working through applications really helped to iron out what is going on in the networks! Here's the link to the review: https://arxiv.org/abs/2211.03796

👤 bjourne
At my uni, the DNN course had us implementing MLPs, CNNs, and RNNs by hand in Octave/Matlab. It was considered one of the most difficult courses in the department, but you certainly learned a lot from it.

👤 tikkun
Another place you could check for insight into this question would be Reddit's r/learnmachinelearning, r/MachineLearning, and r/deeplearning subreddits. Good luck with making the resources; they sound useful!

👤 rahimnathwani
For your specific problem (avoiding the use of libraries that do too much of the work), two sources I found useful:

- Data Science from Scratch, by Joel Grus

- The first 5 deeplearning.ai courses (whose exercises relied only on numpy IIRC)