Pattern Recognition and Machine Learning - Bishop
Deep Learning - Goodfellow, Bengio, Courville
Neural Smithing - Reed, Marks
Neural Networks - Haykin
Artificial Intelligence - Haugeland
(there's also "The Elements of Statistical Learning", which is more advanced)
AI: A Modern Approach - https://aima.cs.berkeley.edu/
You are approaching this like an established natural-sciences field, where old classics = good. That is not true for ML; the field is developing and evolving quickly.
I suggest taking a look at Kevin Murphy's series for the foundational knowledge, and Sutton and Barto for reinforcement learning. MacKay's Information Theory, Inference, and Learning Algorithms is also excellent.
Kochenderfer's ML series is also excellent if you like control theory and cybernetics
https://algorithmsbook.com/
https://mitpress.mit.edu/9780262039420/algorithms-for-optimi...
https://mitpress.mit.edu/9780262029254/decision-making-under...
For applied deep learning texts beyond the basics, I recommend picking up some books/review papers on LLMs, Transformers, GANs. For classic NLP, Jurafsky is the go-to.
Seminal deep learning papers: https://github.com/anubhavshrimal/Machine-Learning-Research-...
Data engineering/science: https://github.com/eugeneyan/applied-ml
For speculation: https://en.m.wikipedia.org/wiki/Possible_Minds
While having a strong mathematical foundation is useful, I think developing intuition is even more important. For this, I recommend Andrew Ng's Coursera courses first, before you dive too deep.
The first chapter walks through a neural network that recognizes handwritten digits implemented in a little over 70 lines of Python and leaves you with a very satisfying basic understanding of how neural networks operate and how they are trained.
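If you want a taste of what that looks like before picking up the book, here is a minimal sketch of the same moving parts (my own toy example in plain NumPy, not the book's code): a forward pass, backpropagation, and gradient descent, trained on XOR instead of digits.

```python
# Not the book's code -- a one-hidden-layer network trained by backprop
# on XOR, in plain NumPy, to show the same moving parts.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of squared error through both layers.
    dp = (p - y) * p * (1 - p)
    dW2 = h.T @ dp; db2 = dp.sum(axis=0)
    dh = dp @ W2.T * h * (1 - h)
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)
    # Gradient descent step (learning rate 1).
    W1 -= dW1; b1 -= db1
    W2 -= dW2; b2 -= db2

print(np.round(p, 2))   # approaches [[0], [1], [1], [0]]
```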
Here is how I used that book, starting with a solid foundation in linear algebra and calculus.
Learn statistics before moving on to more complex models (neural networks).
Start by learning OLS and logistic regression, cold. Cold means you can implement these models from scratch using only NumPy ("I do not understand what I cannot build"). Then try to understand regularization (lasso, ridge, elastic net), where you will learn about the bias/variance tradeoff, cross-validation, and feature selection. These topics are explained well in ESL.
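As an illustration of what "cold" can look like, here is a minimal sketch of both models in plain NumPy (my own, not from ESL): OLS via the normal equations and logistic regression via batch gradient descent.

```python
import numpy as np

def ols_fit(X, y):
    """OLS via the normal equations: beta = (X^T X)^{-1} X^T y."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y   # pinv: robust to collinear columns

def logistic_fit(X, y, lr=0.1, steps=10_000):
    """Logistic regression by batch gradient descent on the mean log-loss."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # predicted probabilities
        beta -= lr * X.T @ (p - y) / len(y)      # gradient of the mean log-loss
    return beta

# Sanity check against known coefficients.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])  # intercept + 2 features
beta_true = np.array([0.5, -1.0, 2.0])
y_lin = X @ beta_true + rng.normal(scale=0.1, size=500)
y_bin = (rng.random(500) < 1 / (1 + np.exp(-(X @ beta_true)))).astype(float)

print(ols_fit(X, y_lin))       # close to [0.5, -1.0, 2.0]
print(logistic_fit(X, y_bin))  # roughly [0.5, -1.0, 2.0]
```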
For OLS and logistic regression I found it helpful to strike a 50-50 balance between theory (derivations and problems) and practice (coding). For later topics (regularization etc.) I found it helpful to tilt towards practice (20/80).
If some part of ESL is unclear, consult the statsmodels source code and docs (top preference) or scikit-learn (second preference; I believe it has rather more boilerplate... "mixin" classes etc.). Approach the code with curiosity. Ask questions like "why do they use np.linalg.pinv instead of np.linalg.inv?"
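To make that particular question concrete, here is a quick experiment you can run (my illustration, not from either codebase): inv blows up on a collinear design while pinv does not.

```python
import numpy as np

# A design matrix with an exactly collinear column, so X^T X is singular.
X = np.array([[1.0, 2.0,  4.0],
              [1.0, 3.0,  6.0],
              [1.0, 5.0, 10.0]])   # third column = 2 * second column
A = X.T @ X

print(np.linalg.pinv(A))           # Moore-Penrose pseudo-inverse: always defined
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as e:
    print("inv fails:", e)         # "Singular matrix"
```

pinv still solves the least-squares problem in the rank-deficient case (returning the minimum-norm solution), which is why library code tends to prefer it.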
Spend a day or five really understanding covariance matrices and the singular value decomposition (and therefore PCA, which will give you a good foundation for other, more complicated dimension-reduction techniques).
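For instance, once you see that PCA is just the SVD of the centered data matrix (equivalently, the eigendecomposition of its covariance matrix), the topic clicks. A minimal sketch of my own:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated features

Xc = X - X.mean(axis=0)                    # 1. center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_var = S**2 / (len(Xc) - 1)       # variance along each component
scores = Xc @ Vt[:2].T                     # 2. project onto the top 2 components

# Same answer via the covariance matrix: cov = Xc^T Xc / (n - 1)
cov = np.cov(Xc, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)          # ascending eigenvalues (symmetric matrix)
assert np.allclose(eigvals[::-1], explained_var)
```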
With that foundation, the best way to learn about neural architectures is to code them from scratch. Start with simpler models and work from there. People much smarter than me have illustrated how that can go: https://gist.github.com/karpathy/d4dee566867f8291f086 https://nlp.seas.harvard.edu/2018/04/03/attention.html
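In the same from-scratch spirit, the core operation behind the second link (scaled dot-product attention) fits in a few lines of NumPy. A toy sketch of mine, not code from either post:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (tokens, tokens) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # mix the value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                 # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                            # (4, 8): one updated vector per token
```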
While not an AI expert, I feel this path has left me reasonably prepared to understand new developments in AI and to separate hype from reality (which was my principal objective). In certain cases I am even able to identify new developments that are useful in practical applications I actually encounter (mostly using better text embeddings).
Good luck. This is a really fun field to explore!
One of the problems with AI is exactly what you noted above - there are a lot of subcategories, and my gut tells me these will grow. For the real neophyte, I'd say start with something that interests you or that you need for work - you likely aren't going to digest all of this in a month, and probably no single book will meet all your needs.
I adore PRML, but the scope and depth are overwhelming. LfD (Learning from Data) encapsulates a number of really core principles in a simple text. The companion course is outstanding and available on edX.
The tradeoff is that LfD doesn't cover a lot of breadth in terms of looking at specific algorithms, but your other texts will do a better job there.
My second recommendation is to read the documentation for scikit-learn. It's amazingly instructive and a practical guide to doing ML in the real world.
In the opening chapter of Probability Theory: The Logic of Science, Jaynes describes a hypothetical system he calls “The Robot”. He then lays out the mathematics of “The Robot’s” thinking in detail: essentially Bayesian probability theory. This is the best summary of an ideal ML/AI system I’ve come across. It’s also very philosophically enlightening.
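As a toy illustration of the kind of reasoning Jaynes formalizes (my example, not his): the Robot holds a prior degree of belief and updates it with Bayes' rule when evidence arrives.

```python
# A toy version of Jaynes's Robot: update a degree of belief with Bayes' rule.
# (My numbers, not from the book.)  A test for a condition with 1% prevalence,
# 99% sensitivity, and 95% specificity:
prior = 0.01
p_pos_given_cond = 0.99        # sensitivity
p_pos_given_healthy = 0.05     # 1 - specificity

# P(cond | positive) = P(positive | cond) * P(cond) / P(positive)
evidence = p_pos_given_cond * prior + p_pos_given_healthy * (1 - prior)
posterior = p_pos_given_cond * prior / evidence
print(round(posterior, 3))     # ~0.167: one positive test is far from certainty
```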
Probabilistic Machine Learning: An Introduction
https://probml.github.io/pml-book/book1.html
Probabilistic Machine Learning: Advanced Topics
https://probml.github.io/pml-book/book2.html
Let me ask a slightly different way - can someone like me get into jobs like these without needing more college?
My day job is wrapping up OS templates for people with ML software and I always wonder what they get to go do with them once they turn into a compute instance.
→ Harrison Kinsley, Daniel Kukiela, Neural Networks from Scratch, https://nnfs.io, https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0Qu...
Somewhat foundational, if not in actuality then in its intention to actually build a theory (in the sense of a theory of gravitation), although not necessarily an introductory text:
→ Daniel A. Roberts, Sho Yaida, The Principles of Deep Learning Theory, https://arxiv.org/abs/2106.10165
- For deep learning specifically, a more applied text that is beautifully written and chock full of examples is Francois Chollet's Deep Learning with Python (there's a new second edition out with up-to-date examples using modern versions of TensorFlow). I would assign the first 3 chapters as required reading for anyone interested in understanding deep learning fundamentals.
- Deep Learning - Goodfellow and Bengio - seems like it would be hard to get through without a reading group; not exactly an APUE or K&R type reading experience, but I haven't spent enough time with it.
If you haven't taken a Linear Algebra or Differential Equations class, that material is useful for ML/DL theory but not fully necessary for applied work with modern high-level libraries; a strong understanding of basic matrix math, though, is definitely useful.
If you have an interest in natural language processing, there are a couple of good books:
- Natural Language Processing with Python - Bird, Klein, Loper - is a great intro to NLP concepts and to working with NLTK, which may be a bit dated to some, but I would definitely recommend it, and it's online for free. Great examples. (https://www.nltk.org/book/)
- Speech and Language Processing - Dan Jurafsky and James H. Martin - is good, though I have only spent much time with the pre-print
And then there are a lot of papers that are good reads. Let me know if you have any questions or want a list of good papers.
If you just want to get off the ground and start playing with stuff and building things, I'd recommend fast.ai's free online course - it's pretty high level and a lot is abstracted away, but it's a great start and can enable you to build lots of cool things pretty rapidly. Andrew Ng's online course is also quite reputable and will probably give you a bit more background and fundamentals.
If I were to choose one book from the bunch, it would be Chollet's: it gives you pretty much all the building blocks you need to read some papers and try to implement things yourself, and I find building things a much more satisfying way to learn than sitting down and writing proofs or just taking notes, but that's just my preference.
Artificial Intelligence: A Modern Approach – Stuart Russell, Peter Norvig
The book assumes limited knowledge (similar to what is required for Pattern Recognition, I would say) and gives good intuition for the foundational principles of machine learning (the bias/variance tradeoff) before delving into more recent research problems. Part I is great if you simply want to know what the core tenets of learning theory are!
But it is also available online as a preprint here: https://mlstory.org/
Then start with ISLR.
Then go and watch Andrew Ng Machine Learning course on Coursera (a new version was added in 2022 that uses Python).
Then read the sklearn book from its maintainers/core devs. It's from O'Reilly.
Then go do the Deep Learning Specialization from deeplearning.ai.
Then do fast.ai course.
If interested in Deep RL, watch David Silver lectures, then read Deep RL in Action by Zai, Brown. Then do the HF course on Deep RL.
This is how you get started. Choose your books based on your personality, needs, and contents covered.
And among MOOCs, I highly suggest the one by Canziani and LeCun from NYU. (I loved the 2020 version.)
The one taught by Fei-Fei Li and Andrej Karpathy is also nice.
These two MOOCs are high-quality enough to substitute for the classic books.
I have never read any of the famous books cover to cover. I read a lot from them, sticking to specific subjects.
Get to reading papers and finding implementations. Ng + ISLR will give you a good grounding. fast.ai + deeplearning.ai will give you the capability to solve real problems. The NYU + Tübingen + Stanford + UMich (Justin Johnson) courses will bring you to the edge.
You need a lot of practical skills that aren't taught anywhere, so get your hands dirty early. Learn to use frameworks, cloud platforms, etc.
Then start reading papers.
A crystal-clear grasp of the math foundations is a must. Get it if you don't have it already.
Now I think you've got the key parts. There's how to use recent production-ready models/systems, how to train them, and how to build them. Is it in a research or business context?
The field is also broad enough that any one section (text, images, probably symbols) and subsection (time series, bulk, fast online work) has a significant body of work behind it. My splits here may not be the best, so I'm happy to take corrections on a useful hierarchy, by the way.
Perhaps you're interested in the history and what's led up to today's work? That's more of a "brief history of time" style coverage, but illuminating.
I'm aware I've not helpfully answered, but I think the same question could have very different valid goals and wanted to bring that to the fore.
https://www.cs.cornell.edu/jeh/book%20no%20solutions%20March...
Also in published form from Cambridge University Press:
https://www.cambridge.org/core/books/foundations-of-data-sci...
For example, nearly everyone understands how to apply multivariable logistic regression in, say, NumPy; however, a good grasp of underlying concepts such as confidence bounds for overfitting, and being able to use formal proofs to explain concepts such as the VC generalization bound, will both help you stand out and provide a good foundation that makes further learning much easier (there's a quick numeric illustration after the book link below).
Understanding Machine Learning: From Theory To Algorithms – Shai Shalev-Shwartz and Shai Ben-David
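For a quick numeric feel for that bound, here is a sketch of mine using the form given in Learning from Data, assuming the polynomial growth-function bound m_H(N) <= N^d_vc + 1:

```python
import numpy as np

def vc_bound(n, d_vc, delta=0.05):
    """With probability >= 1 - delta (Learning from Data form):
    E_out <= E_in + sqrt((8/N) ln(4 m_H(2N) / delta)),
    using the polynomial bound m_H(N) <= N**d_vc + 1."""
    growth = (2 * n) ** d_vc + 1
    return np.sqrt(8 / n * np.log(4 * growth / delta))

# E.g. a linear model in 10 dimensions (d_vc = 11):
for n in (100, 1_000, 100_000):
    print(n, round(vc_bound(n, d_vc=11), 3))
# ~2.24, ~0.84, ~0.105: the bound is vacuous until N is large,
# which is exactly the intuition the formal theory buys you.
```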
I have not applied this technique to AI/ML/NN specifically, but it has been useful for me when trying to learn other topics.
The authors are working on a new course that’ll dive deep into the modern Stable Diffusion stuff too, which I’m looking forward to.
This is a good overview of the history of the field (up to SVMs and before deep NNs). I found this useful for putting all the different approaches into context.
I’m having trouble keeping my motivation up, but I really want to get up to speed on how LLMs work and someday make a career switch.
You'd need the following background:
- Linear Algebra
- Multivariate Calculus
- Probability theory and statistics
Then you need a decent ML book to get the foundations of ML; you can't go wrong with any of these:
- Bishop's Pattern Recognition
- Murphy's Probabilistic ML
- The Elements of Statistical Learning
- Learning from Data
You can supplement Murphy's with the advanced book. Elements is a pretty tough book; consider going through "An Introduction to Statistical Learning"[1] instead. Bishop and Murphy include foundational topics in mathematics.
LfD is a great introductory book and covers one of the most important aspects of ML, that is, model complexity and families of models. It can be supplemented with any of the other books.
I'd also recommend doing some abstract algebra, but it's not a prerequisite.
If you would like a top-down approach, I recommend getting the book "Mathematics of Machine Learning" and learning as needed.
For NN methods, some recommendations:
- https://paperswithcode.com/methods/category/regularization
- https://paperswithcode.com/methods/category/stochastic-optim...
- https://paperswithcode.com/methods/category/attention-mechan...
- https://paperswithcode.com/paper/auto-encoding-variational-b...
For something a little bit different, but worth reading given that you have the prerequisite mathematical maturity:
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges | https://arxiv.org/abs/2104.13478
[1] https://www.statlearning.com/
Many thanks to the user "mindcrime" for catching my error with Introduction to statistical learning.