HACKER Q&A
📣 mfrieswyk

What are the foundational texts for learning about AI/ML/NN?


I've picked up the following, just wondering what everyone's thoughts are on the best books for a strong foundation:

Pattern Recognition and Machine Learning - Bishop

Deep Learning - Goodfellow, Bengio, Courville

Neural Smithing - Reed, Marks

Neural Networks - Haykin

Artificial Intelligence - Haugeland


  👤 softwaredoug Accepted Answer ✓
"Introduction to Statistical Learning" - https://www.statlearning.com/

(there's also "Elements of Statistical Learning" which is a more advanced version)

AI: A Modern Approach - https://aima.cs.berkeley.edu/


👤 KRAKRISMOTT
Haugeland is GOFAI/cognitive science, not directly relevant to the modern machine-learning variety of models unless you are doing reinforcement learning or tree-search stuff (hey, poker/chess/Go bots are pretty cool!). Russell and Norvig is the typical introductory textbook for those. Reed/Marks and Haykin are both severely out of date (they have solid content, but they predate modern deep learning at scale, with its many emergent properties).

You are approaching this like an established natural sciences field where old classics = good. This is not true for ML. ML is developing and evolving quickly.

I suggest taking a look at Kevin Murphy's series for the foundational knowledge, and Sutton and Barto for reinforcement learning. MacKay's Information Theory, Inference, and Learning Algorithms is also excellent.

Kochenderfer's ML series is also excellent if you like control theory and cybernetics:

https://algorithmsbook.com/

https://mitpress.mit.edu/9780262039420/algorithms-for-optimi...

https://mitpress.mit.edu/9780262029254/decision-making-under...

For applied deep learning texts beyond the basics, I recommend picking up some books/review papers on LLMs, Transformers, GANs. For classic NLP, Jurafsky is the go-to.

Seminal deep learning papers: https://github.com/anubhavshrimal/Machine-Learning-Research-...

Data engineering/science: https://github.com/eugeneyan/applied-ml

For speculation: https://en.m.wikipedia.org/wiki/Possible_Minds


👤 TaupeRanger
There are none anymore. We now know that throwing a bunch of bits into the linear algebra meat grinder gets you endless high quality art and decent linguistic functionality. The architecture of these systems takes maybe a week to deeply understand, or maybe a month for a beginner. That's really it. Everything else is obsolete or no longer applicable unless you're interested in theoretical research on alternatives to the current paradigm.

👤 raz32dust
I personally consider linear algebra to be foundational for AI/ML. Introduction to Linear Algebra by Gilbert Strang. And his free course on MIT OCW is fantastic too.

While having a strong mathematical foundation is useful, I think developing intuition is even more important. For this, I recommend Andrew Ng's Coursera courses first, before you dive too deep.


👤 crosen99
"Neural Networks and Deep Learning", by Michael Nielsen http://neuralnetworksanddeeplearning.com (full text)

The first chapter walks through a neural network that recognizes handwritten digits, implemented in a little over 70 lines of Python, and leaves you with a very satisfying basic understanding of how neural networks operate and how they are trained.
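
Not Nielsen's code, but a minimal sketch of the same recipe in numpy - one sigmoid hidden layer trained by backpropagation on toy XOR data (the layer width, learning rate, and iteration count are arbitrary choices of mine):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy data: learn XOR with a 2-4-1 network.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # hidden layer
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # output layer

    lr = 1.0
    for _ in range(5000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: chain rule through the squared-error loss.
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # Gradient-descent step.
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

    print(out.round(2))  # should approach [[0], [1], [1], [0]]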


👤 conjectureproof
+1 on Elements of Statistical Learning.

Here is how I used that book, starting with a solid foundation in linear algebra and calculus.

Learn statistics before moving on to more complex models (neural networks).

Start by learning OLS and logistic regression, cold. Cold means you can implement these models from scratch using only numpy ("I do not understand what I cannot build"). Then try to understand regularization (lasso, ridge, elastic net), where you will learn about the bias/variance tradeoff, cross-validation, and feature selection. These topics are explained well in ESL.
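
If it helps, here is a minimal sketch of that "only numpy" exercise (closed-form normal equations for OLS, batch gradient descent for logistic regression; the learning rate and iteration count are arbitrary):

    import numpy as np

    def ols_fit(X, y):
        # Closed-form OLS: beta = (X'X)^+ X'y. The pseudo-inverse keeps
        # this from blowing up on rank-deficient design matrices.
        return np.linalg.pinv(X.T @ X) @ X.T @ y

    def logistic_fit(X, y, lr=0.1, n_iter=5000):
        # Batch gradient descent on the negative log-likelihood.
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # predicted probabilities
            beta -= lr * X.T @ (p - y) / len(y)    # gradient of the NLL
        return beta

(Remember to prepend a column of ones to X if you want an intercept.)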

For OLS and logistic regression I found it helpful to strike a 50-50 balance between theory (derivations and problems) and practice (coding). For later topics (regularization etc.) I found it helpful to tilt towards practice (20/80).

If some part of ESL is unclear, consult the statsmodels source code and docs (top preference) or scikit-learn (second preference; I believe it has rather more boilerplate, "mixin" classes etc.). Approach the code with curiosity. Ask questions like "why do they use np.linalg.pinv instead of np.linalg.inv?"
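
To make that pinv/inv question concrete, here is a toy case (a duplicated column makes X'X exactly singular, so inv raises while pinv still returns the minimum-norm solution):

    import numpy as np

    X = np.array([[1.0, 2.0, 2.0],
                  [1.0, 3.0, 3.0],
                  [1.0, 4.0, 4.0]])  # last two columns are identical
    y = np.array([1.0, 2.0, 3.0])

    # np.linalg.inv(X.T @ X)  # raises LinAlgError: singular matrix
    beta = np.linalg.pinv(X.T @ X) @ X.T @ y  # minimum-norm least squares
    print(beta)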

Spend a day or five really understanding covariance matrices and the singular value decomposition (and therefore PCA, which will give you a good foundation for other, more complicated dimension-reduction techniques).
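
A compact way to see the covariance/SVD/PCA connection (toy data; the point is that the squared singular values of the centered data, scaled by 1/(N-1), are exactly the eigenvalues of its covariance matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Xc = X - X.mean(axis=0)              # center the data

    # PCA via the SVD of the centered data matrix.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt                      # principal axes (rows)
    explained_var = s**2 / (len(X) - 1)  # variance along each axis

    # Same numbers straight from the covariance matrix.
    evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    print(np.allclose(explained_var, evals))  # True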

With that foundation, the best way to learn about neural architectures is to code them from scratch. Start with simpler models and work from there. People much smarter than me have illustrated how that can go: https://gist.github.com/karpathy/d4dee566867f8291f086 https://nlp.seas.harvard.edu/2018/04/03/attention.html

While not an AI expert, I feel this path has left me reasonably prepared to understand new developments in AI and to separate hype from reality (which was my principal objective). In certain cases I am even able to identify new developments that are useful in practical applications I actually encounter (mostly using better text embeddings).

Good luck. This is a really fun field to explore!


👤 poulsbohemian
I'm sitting ten feet from my copy of Artificial Intelligence: A Modern Approach – Stuart Russell, Peter Norvig. While I will say it still has a lot of worthwhile basic information, I really wouldn't recommend it. It's an enormous book, so physically difficult to read, and the bulk of the content is somewhere between dated and terse. I went through school and studied AI ten years before it was written, and I'm glad I didn't use it as an undergrad textbook - it would have been overwhelming.

One of the problems with AI is exactly what you noted above - there are a lot of subcategories, and my gut tells me these will grow. For the real neophyte, I'd say start with something that interests you or that you need for work - you likely aren't going to digest all of this in a month, and probably no single book will meet all your needs.


👤 bradreaves2
This is off the beaten path, but consider Abu-Mostafa et al.'s "Learning from Data". https://www.amazon.com/Learning-Data-Yaser-S-Abu-Mostafa/dp/...

I adore PRML, but its scope and depth are overwhelming. LfD encapsulates a number of really core principles in a simple text. The companion course is outstanding and available on edX.

The tradeoff is that LfD doesn't cover a lot of breadth in terms of looking at specific algorithms, but your other texts will do a better job there.

My second recommendation is to read the scikit-learn documentation. It's amazingly instructive and a practical guide to doing ML.
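
If you haven't seen it, the flavor of scikit-learn is roughly this (a minimal fit/score pipeline on one of its bundled toy datasets):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Every estimator exposes the same fit/predict/score interface.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))  # held-out accuracy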


👤 bjornsing
It’s probably a bit off the beaten path, but I can highly recommend Probability Theory, The Logic of Science, by E. T. Jaynes.

In the opening chapter Jaynes describes a hypothetical system he calls “The Robot”. He then lays out the mathematics of the Robot’s thinking in detail: essentially Bayesian probability theory. This is the best summary of an ideal ML/AI system I’ve come across. It’s also very philosophically enlightening.
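
The math the Robot runs on is just Bayes' rule applied relentlessly. A toy illustration (my hypothetical coin-bias example, not from the book):

    import numpy as np

    # Discrete grid of hypotheses about a coin's heads probability.
    theta = np.linspace(0, 1, 101)
    prior = np.ones_like(theta) / len(theta)  # start out indifferent

    # Observe 7 heads in 10 flips; update belief via Bayes' rule.
    heads, flips = 7, 10
    likelihood = theta**heads * (1 - theta)**(flips - heads)
    posterior = prior * likelihood
    posterior /= posterior.sum()              # normalize

    print(theta[np.argmax(posterior)])        # 0.7, the most plausible bias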


👤 gerash
I'd suggest these two by Kevin Murphy:

Probabilistic Machine Learning: An Introduction

https://probml.github.io/pml-book/book1.html

Probabilistic Machine Learning: Advanced Topics

https://probml.github.io/pml-book/book2.html


👤 junkerm
I read parts of Murphy's "Probabilistic Machine Learning" (Vol. 1), which is an update of an earlier ML book. It covers a broad range of topics, including very recent developments, and it includes foundational material such as probability, linear algebra, and optimization. It is also quite aligned with the Goodfellow book. I found it quite challenging at certain points. What helped a lot was reading a book on Bayesian statistics; I used Think Bayes by Allen Downey for that (http://allendowney.github.io/ThinkBayes2/index.html).

👤 daturkel
I maintain a list of well-known or foundational papers in ML in a GitHub repo that may be of interest to readers of this thread:

https://github.com/daturkel/learning-papers


👤 digitalsushi
Are there obvious paths into these spaces for someone stuck over in devops/infrastructure/platform engineering? Or is it too far a hop to really find a direct path in?

Let me ask a slightly different way - can someone like me get into a job like these, without needing some more college?

My day job is wrapping up OS templates for people with ML software and I always wonder what they get to go do with them once they turn into a compute instance.


👤 ly3xqhl8g9
Not sure if foundational (quite a tall order in such a fast-moving field), but for sure a nice introduction to neural networks, and even to mathematics in general (for a teenager, because it's nice to see numbers in action beyond school-level algebra):

→ Harrison Kinsley, Daniel Kukiela, Neural Networks from Scratch, https://nnfs.io, https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0Qu...

Somewhat foundational, if not in actuality then in the intention to actually build a theory (as in the theory of gravitation), although not necessarily an introductory text:

→ Daniel A. Roberts, Sho Yaida, The Principles of Deep Learning Theory, https://arxiv.org/abs/2106.10165


👤 avipeltz
- AIMA by Russell and Norvig is a classic, but I would say it's more of an overview of the field and for most topic areas isn't quite deep enough, imo.

- For deep learning specifically, a more applied text that is beautifully written and chock-full of examples is Francois Chollet's Deep Learning with Python (there's a new second edition out with up-to-date examples using modern versions of TensorFlow). I would give the first 3 chapters as required reading for anyone interested in understanding deep learning fundamentals.

- Deep Learning - Goodfellow and Bengio - seems like it would be hard to get through without a reading group; not exactly an APUE or K&R type reading experience. But I haven't spent enough time with it.

If you haven't taken a linear algebra or differential equations class, that material is useful to know for ML/DL theory but not fully necessary for applied work with modern high-level libraries; a strong understanding of basic matrix math is definitely useful, though.

If you have interests in natural language processing, there are a couple of good books:

- Natural Language Processing with Python - Bird, Klein, and Loper - is a great intro to NLP concepts and to working with NLTK, which may be a bit dated to some, but I would definitely recommend it, and it's online for free. Great examples. (https://www.nltk.org/book/)

- Speech and Language Processing - Dan Jurafsky and James H. Martin - is good, though I have only spent time with the pre-print.

And then there are a lot of papers that are good reads. Let me know if you have any questions or want a list of good papers.

If you just want to get off the ground and start playing with stuff and building things, I'd recommend fast.ai's free online course - it's pretty high level and a lot is abstracted away, but it's a great start and can enable you to build lots of cool things pretty rapidly. Andrew Ng's online course is also quite reputable and will probably give you a bit more background and fundamentals.

If I were to choose one book from the bunch, it would be Chollet: it gives you pretty much all the building blocks you need to read some papers and try to implement things yourself, and I find building things a much more satisfying way to learn than sitting down and writing proofs or just taking notes. But that's just my preference.


👤 dezzeus
You may want to also consider this one:

Artificial Intelligence, a modern approach – Stuart Russell, Peter Norvig


👤 gaspb
If you're more inclined to theory, I would suggest "Learning Theory from First Principles" by F. Bach: https://www.di.ens.fr/~fbach/ltfp_book.pdf

The book assumes limited knowledge (similar to what is required for Pattern Recognition, I would say) and gives good intuition on foundational principles of machine learning (the bias/variance tradeoff) before delving into more recent research problems. Part I is great if you simply want to know what the core tenets of learning theory are!


👤 stevenbedrick
To add to the great recommendations on this thread, I really like Moritz Hardt and Benjamin Recht's "Patterns, Predictions, and Actions". It's published by Princeton University Press here: https://press.princeton.edu/books/hardcover/9780691233734/pa...

But is also available online as a preprint here: https://mlstory.org/


👤 rg111
Do you have Linear Algebra knowledge, and Stats 101 knowledge?

Then start with ISLR.

Then go and watch Andrew Ng Machine Learning course on Coursera (a new version was added in 2022 that uses Python).

Then read the sklearn book from its maintainers/core devs. It's from O'Reilly.

Then go do the Deep Learning Specialization from deeplearning.ai.

Then do fast.ai course.

If interested in Deep RL, watch the David Silver lectures, then read Deep RL in Action by Zai and Brown. Then do the Hugging Face course on Deep RL.

This is how you get started. Choose your books based on your personality, needs, and contents covered.

And among MOOCs, I highly suggest the one by Canziani and LeCun from NYU. (I loved the 2020 version.)

The one taught by Fei-Fei Li and Andrej Karpathy is also nice.

These two MOOCs are of high enough quality to substitute for the classic books.

I have never read any of the famous books cover to cover. I read a lot from them, sticking to specific subjects.

Get to reading papers and finding implementations. Ng + ISLR will give you good grounding. Fast.ai + deeplearning.ai will give you the capability to solve real problems. The NYU + Tübingen + Stanford + UMich (Justin Johnson) courses will bring you to the edge.

You need a lot of practical experience of the kind that isn't taught anywhere. So get your hands dirty early. Learn to use frameworks, cloud platforms, etc.

Then start reading papers.

A crystal-clear grasp of the math foundations is a must. Get it if you don't have it already.


👤 pkoird
AIMA by Russell and Norvig is a must-read IMO.

👤 ipnon
I’d posit we don’t understand AI/ML well enough to know their foundations with much certainty. Take, for example, the discovery of emergent zero-shot properties in the latest LLMs. My recommendation to a beginner would be to grok gradient descent, matrix multiplication, and the universal approximation theorem, then get on to engineering like the rest of us. You can’t go wrong with Jeremy Howard’s FastAI course and his “Deep Learning for Coders.”
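
For the gradient descent part, the core idea really does fit in a few lines (a hypothetical one-parameter example: fit a slope to noisy data by stepping against the gradient of the squared error):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 100)
    y = 3.0 * x + rng.normal(0, 0.1, 100)    # true slope is 3.0

    w, lr = 0.0, 0.1
    for _ in range(200):
        grad = 2 * np.mean((w * x - y) * x)  # d/dw of the mean squared error
        w -= lr * grad                       # step downhill
    print(w)                                 # converges near 3.0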

👤 IanCal
I think a good start is to think about what you want to do. "Back in my day" AI was mostly academic, with more classic foundational parts and newer flashy bits. It wasn't, broadly, applicable to the real world - some parts were, but not a huge amount.

Now I think there are a few key parts: how to use recent production-ready models/systems, how to train them, and how to make them. Is it in a research or business context?

The field is also broad enough that any one section (text, images, probably symbols) and subsection (time series, bulk, fast online work) all have significant bodies of work behind them. My splits here will not be the best currently so I'm happy for any corrections on a useful hierarchy by the way.

Perhaps you're interested in the history and what's led up to today's work? That's more of a "brief history of time" style coverage, but illuminating.

I'm aware I've not helpfully answered, but I think the same question could have very different valid goals and wanted to bring that to the fore.


👤 robg
Coming from cognitive neuroscience, I'm surprised that Explorations in Parallel Distributed Processing by McClelland and Rumelhart doesn't get more attention as a classic bridging old-school AI approaches with the modern paradigm.

https://psycnet.apa.org/record/1988-97441-000


👤 throwaway81523
Blum, Hopcroft, and Kannan, Foundations of Data Science looks good:

https://www.cs.cornell.edu/jeh/book%20no%20solutions%20March...

Also in published form from Cambridge University Press:

https://www.cambridge.org/core/books/foundations-of-data-sci...


👤 jgrimm
Learning From Data (https://amlbook.com) is a great introduction to ML from a more theoretical perspective. The language is easy to understand but the concepts that it deals with are very theoretical, a combination that is hard to find elsewhere.

For example, nearly everyone understands how to apply multivariable logistic regression in, say, numpy. However, a good grasp of underlying concepts such as confidence bounds for overfitting, and being able to use formal proofs to explain concepts such as the VC generalisation bound, will both help you stand out and provide a good foundation that makes further learning much easier.
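
For reference, the bound LfD builds up to looks like this (my transcription; m_H is the growth function of the hypothesis set H, and the inequality holds with probability at least 1 - delta over a sample of size N):

    E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{ \frac{8}{N} \ln \frac{4\, m_{\mathcal{H}}(2N)}{\delta} }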


👤 cscurmudgeon
Get a strong grasp on Linear Algebra and everything else falls into place more easily

https://math.mit.edu/~gs/learningfromdata/



👤 adg001
I have not seen the following book mentioned so far in this thread, and I can't recommend it highly enough:

Understanding Machine Learning: From Theory to Algorithms – Shai Shalev-Shwartz and Shai Ben-David


👤 dmarcos
I remember Carmack mentioning on a podcast a list of seminal papers that Ilya Sutskever (@ilyasut) gave him to learn AI foundations. I would love to see that list.

👤 zffr
You may also want to consider reading through some of the important (or highly cited) academic papers in AI/ML/NN. From these papers you may get a sense of the techniques researchers are using, and which topics are most important to learn.

I have not applied this technique to AI/ML/NN specifically, but it has been useful for me when trying to learn other topics.


👤 epgui
The foundations of AI/ML are really linear algebra and statistics. But not the kinds of stats most people learn in undergrad: focus on linear models (there are tons of great books on just that; also look up “common statistical tests are linear models” for a great intro to what I’d call useful stats), Bayesian stats, ANOVA/MANOVA/PERMANOVA, etc.
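
A quick sketch of the "tests are linear models" point with hypothetical toy data (a two-sample t-test is just OLS on a group indicator; ttest_ind's default pooled-variance assumption matches the regression):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, 50)  # group A
    b = rng.normal(0.5, 1.0, 50)  # group B

    # Classic two-sample t-test...
    t, p = stats.ttest_ind(a, b)

    # ...is the same as regressing the outcome on a group dummy.
    y = np.concatenate([a, b])
    X = sm.add_constant(np.r_[np.zeros(50), np.ones(50)])
    fit = sm.OLS(y, X).fit()
    print(t, fit.tvalues[1])  # identical magnitude (sign flips with coding)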

👤 dceddia
I’m a big fan of learning through practice vs learning all the theory up front, and for anyone else who feels the same, the Fast AI course and book are very good: https://fast.ai

The authors are working on a new course that’ll dive deep into the modern Stable Diffusion stuff too, which I’m looking forward to.


👤 cttet
I recommend Information Theory, Inference, and Learning Algorithms by David MacKay if you really want to understand the "learning" part, rather than being handed a methodology without knowing why, or assorted bounds that "guarantee" abstract things that may not match reality.

👤 alphabetting
For a less technical history of the field and major players I'd recommend Genius Makers.

👤 6gvONxR4sf7o
Kevin Murphy’s books (especially the new ones) are what I’d point anyone towards for ML.

👤 davidhunter
The Quest for Artificial Intelligence: A History of Ideas and Achievements – Nils J. Nilsson

This is a good overview of the history of the field (up to SVMs and before deep NNs). I found this useful for putting all the different approaches into context.


👤 bilsbie
If anyone is just starting out and wanting to do a study group, let me know.

I’m having trouble keeping my motivation up, but I really want to get up to speed on how LLMs work and someday make a career switch.


👤 PartiallyTyped
I recommend against DL by Goodfellow. At this point it is pretty much outdated. Actually, anything specific to NNs is already outdated by release.

You'd need the following background:

- Linear Algebra

- Multivariate Calculus

- Probability theory && Statistics

Then you need a decent ML book to get the foundations of ML, you can't go wrong with either of these:

- Bishop's Pattern Recognition

- Murphy's Probabilistic ML

- Elements of statistical learning

- Learning from data

You can supplement Murphy's with the advanced book. Elements is a pretty tough book; consider going through "Introduction to Statistical Learning"[1]. Bishop and Murphy include foundational topics in mathematics.

LfD is a great introductory book and covers one of the most important aspects of ML, that is, model complexity and families of models. It can be supplemented with any of the other books.

I'd also recommend doing some abstract algebra, but it's not a prerequisite.

If you would like a top-down approach, I recommend getting the book "Mathematics for Machine Learning" and learning as needed.

For NN methods, some recommendations:

- https://paperswithcode.com/methods/category/regularization

- https://paperswithcode.com/methods/category/stochastic-optim...

- https://paperswithcode.com/methods/category/attention-mechan...

- https://paperswithcode.com/paper/auto-encoding-variational-b...

For something a little bit different, but worth reading given that you have the prerequisite mathematical maturity:

- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges | https://arxiv.org/abs/2104.13478

[1] https://www.statlearning.com/

Many thanks to the user "mindcrime" for catching my error with Introduction to statistical learning.


👤 nephanth
At some point, UC Berkeley's course videos were available on the web, and they had a pretty good AI course.

👤 jpamata
ISLR for foundations and for passing interviews. It also has lectures on YouTube; just search for "ISLR lectures".

👤 revskill
Without running code, it's hard to grasp concepts, so I prefer texts with code.