ideally, a list of papers that could take 2-5 hours each and a few hundred lines of code?
I seem to be in a similar situation: an experienced software engineer who has jumped into the deep end of ML. It seems most resources abstract away either too much detail or too little. For example, building a toy example that just calls gensim.word2vec doesn't help me transfer that knowledge to other use cases. Yet at the other extreme, most research papers are impenetrable walls of math that obscure the forest for the trees.
Thus far, I would also recommend Andrej Karpathy's Zero to Hero course (https://karpathy.ai/zero-to-hero.html). He assumes a high level of programming knowledge but demystifies the ML side.
--
P.S. If anyone is, by chance, interested in helping chip away at the literacy crisis (e.g., 40% of US 4th graders can't read even at a basic level), I would love to find a collaborator for evaluating the practical application of results from the ML fields of cognitive modeling and machine teaching. These seemingly simple ML models offer powerful insight into the neural basis for learning but are explained in the most opaque ways.
Some papers that are runnable on a laptop CPU (so long as you stick to small image sizes/tasks):
1) Generative Adversarial Networks (https://arxiv.org/abs/1406.2661). Good practice for writing a custom training loop that juggles two networks, each with its own optimiser.
2) Neural Style Transfer (https://arxiv.org/abs/1508.06576). Nice to be able to manipulate pretrained networks and intercept intermediate layers.
3) Deep Image Prior (https://arxiv.org/abs/1711.10925). Nice low-data exercise in building out an autoencoder.
4) Physics Informed Neural Networks (https://arxiv.org/abs/1711.10561). If you're interested in scientific applications, this might be fun. It's a good exercise in calculating higher-order derivatives of neural networks and using them in loss functions.
5) Vanilla Policy Gradient (https://arxiv.org/abs/1604.06778) is the easiest reinforcement learning algorithm to implement and can be used as a black-box optimiser in a lot of settings.
6) Deep Q Learning (https://arxiv.org/abs/1312.5602) is also not too hard to implement. It is a foundational deep reinforcement learning paper, and it was the first time I had heard of DeepMind.
OpenAI Gym (https://github.com/openai/gym) would help you get started with the latter two.
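To give a feel for how little code item 5 needs, here is a minimal sketch of vanilla policy gradient (REINFORCE) used as a black-box optimiser. It is not taken from the linked paper: the 3-armed bandit, its reward means, the learning rate and the running-mean baseline are all assumptions chosen so the example runs with only the standard library instead of a Gym environment.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(reward_means, steps=3000, lr=0.1, seed=0):
    """REINFORCE on a toy bandit (hypothetical setup, not from the paper).

    The policy is a softmax over one logit per arm. Each step samples an
    arm, observes a noisy reward, and nudges the logits along
    advantage * grad log pi(action).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(reward_means)
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = rng.gauss(reward_means[action], 0.5)
        advantage = reward - baseline
        # For a softmax policy: d log pi(action) / d logit_i = 1[i == action] - probs[i]
        for i in range(len(logits)):
            grad_logp = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_logp
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


# Arm 2 has the highest mean reward, so the policy should come to prefer it.
probs = train_bandit([0.0, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

The optimiser only ever sees sampled rewards, never a gradient of the reward itself, which is why the same loop can be pointed at other black-box objectives.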
When you do decide on a paper, take a look at Phil Wang's implementation style: https://github.com/lucidrains?tab=repositories. He has hundreds of papers implemented.
If you don't already have a GPU machine, you can rent a 40GB A100 instance for $1.1/hr or a 24GB A10 for $0.6/hr: https://lambdalabs.com/service/gpu-cloud.
I think you are severely underestimating the time required, unless you are quite experienced, know exactly what to look for, or the paper is just a slight variation on previous work that you are already familiar with.
Even seasoned researchers can easily spend 30+ hours on trying to reproduce a paper, because papers almost never contain all the details that went into the experiments. You are left with a lot of fiddling and iteration. Of course, if you only care about roughly reproducing what the authors did, and don't care about getting the same results, the time can be much shorter. If the code is available that's even better, but looking at it is cheating since wrestling with issues yourself is a big part of the learning process.
A few people here mentioned Andrej's lectures, and I also think they are amazing, but they are not a replacement for getting stuck and solving problems yourself. You can easily watch these lectures and think "I get it!" because everything is so well explained, but you'll probably still be stuck when you run into your own problems trying to reproduce papers from scratch. There's no replacement for the experience you gain by struggling :)
It's like watching a math lecture and thinking you get it, but then getting stuck at the exercise problems. The real learning happens when you force yourself to struggle through the exercises.
- devdocs.io for pytorch
- conda for packaging with CUDA
- einops
- tensorboard
- huggingface datasets
Interesting models/structures:
- resnet
- unet
- transformers
- convnext
- vision transformers
- ddpm
- novel optimizers
- generative flow nets
- “your classifier is secretly an energy-based model and you should treat it like one” paper
- self-supervision
- distance metric learning
Places where you can read implementations:
- lucidrains’ github
- timm computer vision models library
- fastai
- labml (annotated quite nicely)
Biggest foreseeable headaches:
- not suuper easy to do test-driven development
- data normalization (floating point error, not using e.g. batchnorm)
- sensitivity of model performance to (hyper)params (layer sizes, learning rates, optimizer, etc)
- impatience
- lack of data
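On the data normalization headache, here is one way floating point error can bite, as a small sketch. This is my reading of that bullet, not necessarily the case the list had in mind: the function names and the example data are made up for illustration. Computing variance as E[x^2] - E[x]^2 loses almost all precision when values are large relative to their spread, while a two-pass computation stays accurate.

```python
def naive_variance(xs):
    """E[x^2] - E[x]^2: algebraically fine, numerically fragile."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean


def two_pass_variance(xs):
    """Subtract the mean first, then average the squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def standardize(xs, eps=1e-8):
    """Zero-mean, unit-variance scaling; eps guards against constant features."""
    n = len(xs)
    mean = sum(xs) / n
    std = two_pass_variance(xs) ** 0.5
    return [(x - mean) / (std + eps) for x in xs]


# Large offset, tiny spread: the true (population) variance is 2/3.
data = [1e9, 1e9 + 1.0, 1e9 + 2.0]
naive = naive_variance(data)
stable = two_pass_variance(data)
z = standardize(data)
print("naive:", naive, "two-pass:", stable)
```

The same effect is much worse in float32, which is what most training code runs in, so it is worth checking the statistics of your inputs before blaming the model.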
I’d also recommend watching Mark Saroufim live code in PyTorch on YouTube. My 2 cents: you can only get both fast and good at this with a lot of experience. A lot of rules of thumb have to come together just right for the whole system to work.
An initial implementation might be doable in 5 hours for someone competent and familiar with RLlib's APIs, but could take much longer to really polish.
Knowing how to step through backpropagation in a neural network gets you pretty far in conceptual understanding of a lot of architectures. Imo there’s no substitute for writing out the gradients by hand to make sure you get what’s going on, if only in a toy example.
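As a toy version of that exercise, here is a sketch that writes out the gradients of a tiny MLP by hand and checks them against finite differences. The network shape, the tanh activation, the squared-error loss and all the numbers are assumptions picked for illustration.

```python
import math


def forward(params, x, y):
    """h = tanh(W1 x + b1), pred = w2 . h + b2, loss = 0.5 * (pred - y)^2."""
    W1, b1, w2, b2 = params
    z = [sum(W1[i][j] * x[j] for j in range(len(x))) + b1[i]
         for i in range(len(b1))]
    h = [math.tanh(zi) for zi in z]
    pred = sum(w2[i] * h[i] for i in range(len(h))) + b2
    loss = 0.5 * (pred - y) ** 2
    return loss, (h, pred)


def backward(params, x, y):
    """Hand-derived gradients via the chain rule, output to input."""
    W1, b1, w2, b2 = params
    _, (h, pred) = forward(params, x, y)
    d_pred = pred - y                          # dL/dpred
    d_w2 = [d_pred * hi for hi in h]           # dL/dw2_i = dL/dpred * h_i
    d_b2 = d_pred
    # tanh'(z) = 1 - tanh(z)^2 = 1 - h^2
    d_z = [d_pred * w2[i] * (1.0 - h[i] ** 2) for i in range(len(h))]
    d_W1 = [[d_z[i] * x[j] for j in range(len(x))] for i in range(len(h))]
    d_b1 = d_z[:]
    return d_W1, d_b1, d_w2, d_b2


def numeric_grad_W1(params, x, y, i, j, eps=1e-6):
    """Central finite difference for a single entry of W1."""
    W1, b1, w2, b2 = params

    def loss_with(delta):
        shifted = [row[:] for row in W1]
        shifted[i][j] += delta
        return forward((shifted, b1, w2, b2), x, y)[0]

    return (loss_with(eps) - loss_with(-eps)) / (2 * eps)


params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.7], 0.2)
x, y = [1.0, 2.0], 1.0
d_W1, d_b1, d_w2, d_b2 = backward(params, x, y)
max_err = max(abs(d_W1[i][j] - numeric_grad_W1(params, x, y, i, j))
              for i in range(2) for j in range(2))
print("max |analytic - numeric| for W1:", max_err)
```

If the two disagree, the bug is almost always in the hand-derived backward pass, which is exactly the kind of mistake this exercise is meant to surface.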
Starting out, I would recommend implementing the fundamental building blocks within whatever 'subculture' of ML you are interested in, whether that be DL, kernel methods, probabilistic models, etc.
Let's say you are interested in deep learning methods (as that's something I could at least speak more confidently about). In that case build yourself an MLP layer, then an RNN layer, then a GNN layer, then a CNN layer, and an attention layer, along with some full models that use those layers on case studies covering different data modalities (images, graphs, signals). This should give you a feel for the assumptions driving the inductive biases in each layer and what motivates their existence (vs. an MLP). It also gives you all the building blocks you can then extend to build every other DL layer+model out there. Another reason is that these fundamental building blocks have been implemented many times, so you have a reference to look to when you get stuck.
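To show the scale of one such building block, here is a minimal sketch of single-head scaled dot-product attention on plain Python lists. It leaves out the learned query/key/value projections, masking and batching that a real attention layer would have, and the function names are my own.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Returns (outputs, weights); weights[i][t] is how much query i
    attends to position t.
    """
    d_k = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights.append(softmax(scores))
    outputs = [[sum(w[t] * values[t][c] for t in range(len(values)))
                for c in range(len(values[0]))]
               for w in weights]
    return outputs, weights


# Self-attention over three 2-d tokens: queries, keys and values are the
# tokens themselves (no learned projections in this sketch).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
for row in attn:
    print([round(a, 3) for a in row])
```

Unlike an MLP, nothing here depends on sequence length or position: every token is mixed with every other token using weights computed from the data, which is the inductive bias worth noticing.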
On that note: here are some fun GNN papers to implement in order of increasing difficulty (try building using vanilla PyTorch/Jax instead of PyG).
- SGC (from https://arxiv.org/abs/1902.07153)
- GCN (from https://arxiv.org/abs/1609.02907)
- GAT (from https://arxiv.org/abs/1710.10903)
After building the basic building blocks these should each take about 2-5 hours (reading paper + implementation). Probably quicker at the end with all this practice. Good luck and remember to have fun!
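For a sense of scale, here is a rough sketch of the propagation step shared by the SGC and GCN papers above, using dense lists and only the standard library. The 3-node path graph, the feature values and the weight matrix are made up, and a real implementation would use sparse tensors and trained weights.

```python
import math


def matmul(a, b):
    """Dense matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_hat, features, weight):
    """One GCN layer: ReLU(A_hat @ H @ W)."""
    pre = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in pre]


def sgc_features(a_hat, features, k):
    """SGC preprocessing: A_hat^k @ X, with no nonlinearity in between."""
    out = features
    for _ in range(k):
        out = matmul(a_hat, out)
    return out


# Hypothetical path graph 0 - 1 - 2 with 2-d node features.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[0.5, -1.0], [0.25, 1.0]]  # made-up values, not trained

a_hat = normalized_adjacency(adj)
hidden = gcn_layer(a_hat, feats, weight)
smoothed = sgc_features(a_hat, feats, k=2)
print(hidden)
```

SGC then fits a plain linear classifier on the smoothed features, which is why it sits first in the difficulty ordering.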
https://www.coursera.org/specializations/deep-learning
The lectures take you through each major paper, then you implement the paper in the homework. Much faster than reading the paper yourself.