Standing in 2022, what are the best resources for a CS student/decent programmer to get into the field of ML and DL on their own? Resources can be either books or public courses.
The target ability:
1. To understand the theory behind the algorithms
2. To implement an algorithm on a dataset of choice. (Data cleaning and management should also be learned)
3. Read research publications and try to implement them.
1) Get some statistics/probability basics. The field is full of people (you can see plenty of such analyses on Kaggle) who "do machine learning" but make very silly mistakes (e.g. turning categorical data into a float and using it as a continuous variable when training a model).
2) Take a look at traditional machine learning approaches. Nowadays you're swamped by DL (there are a lot of good suggestions in this thread, so I won't chime in), and it's easy to miss the fact that, sometimes, a simple decision tree or a dimensionality reduction approach (e.g. PCA or ICA) can yield incredible value in a very short time on huge datasets.
I had written a fairly short post about it when I finished my Georgia Tech path: https://www.franzoni.eu/machine-learning-a-sound-primer/
3) It can take a lot of time to become effective in ML, effective as in: what you _manually create_ performs as well as picking an existing trained model, fine-tuning it, and using it. This can be frustrating: the low-hanging fruit is pretty powerful, and you don't need to understand much about ML algorithms to pick it up.
4) Consider MOOCs or online classes. I took Georgia Tech OMSCS and can vouch for it; some classes force you to be a data scientist and read papers as well, and you get "real world" recognition and discussion with your peers, which is useful!
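The categorical-as-continuous mistake from point 1 is worth seeing concretely. A minimal sketch with pandas (the column names and values are made up for illustration):

```python
import pandas as pd

# A toy dataset with a categorical column (values are hypothetical).
df = pd.DataFrame({"city": ["Rome", "Paris", "Rome", "Berlin"],
                   "price": [10.0, 12.5, 9.0, 11.0]})

# Wrong: mapping categories to floats invents an ordering
# ("Berlin" < "Paris" < "Rome") that a model will happily exploit.
df["city_as_float"] = df["city"].astype("category").cat.codes.astype(float)

# Better: one-hot encode, so no spurious order is introduced.
encoded = pd.get_dummies(df[["city", "price"]], columns=["city"])
print(encoded.columns.tolist())
```

The one-hot version gives the model one independent indicator per category instead of a fake numeric scale.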
1. ML/DL Researcher
2. Data scientist - 20/80 engineering vs modelling
3. ML Eng - 50/50 (or 70/30) engineering vs modelling
People suggesting working in engineering to support ML are right that there's a lot of demand, but it's not what you're asking for.
Becoming an ML/DL researcher working on novel techniques or new models will be hard without academic research experience. Few companies are big enough to support true research, and the ones that are have a very high bar even for people with PhDs.
What I call "data scientists" apply math/ML to real problems. The people I see here have a quantitative background like physics/math/CS. Often they have more general quantitative skills that go beyond ML. People like this might work on things like fraud, where an eng pipeline exists and small improvements in the model are valuable.
There are more of these roles than "true research" and they exist at small companies because it's applied. You can get into this with demonstrated evidence in side projects + a convincing background, but professional education might be the most sure way.
Finally - there's a lot of demand for engineers who can do both modeling and the requisite engineering. A model is a small part of what goes into a production ML feature - you need a data pipeline, automated retraining/prediction, a place to deploy the model, monitoring on eng stats + data stats, and the usual application backend/frontend to do something with the results.
You might be able to get into this with some demonstrated experience in side projects assuming you're a SWE already, and depending on your standards for where you want to work.
Try the Data/ML Engineer route. Instead of going directly into ML, try to work as a "supporter" of those doing ML. There's a HUGE gap there, especially if you're a good programmer.
There are a lot of people in the "pure" ML space: people with science backgrounds, with PhDs, etc.
But there aren't enough people to support them: taking their models to production, building their pipelines, etc.
If you get into Data/ML engineer, you’ll be working with these people and learning from them.
It’s a longer route for sure, but I think it can yield the highest success rate.
Do what the instructor recommends: watch each lesson once in its entirety and then re-watch it while playing along. But don't just type their commands verbatim. Try and do something slightly different.
Do you want to:
a) Become an academic in mathematics/statistics.
b) Become an academic in computer science with a focus on artificial intelligence.
c) Become an MLE in "regular" statistical applications, i.e. Bayesian classification and "core" statistical principles.
d) Become a specialized computer vision/natural language processing focused MLE.
e) Become a generalist software engineer who can whip out the above if needed.
In no way is e) the inferior option.
Generalists who can write code fast with 100% test coverage and pristine logging are by far the segment the industry has the shortest supply of.
There are TONS of math guys. Vanishingly few Principal Engineers who can write a design document and lead a project.
(Machine learning customers are OBSESSED with test coverage and verifiability. Believe it or not, multinational corporations generally don't want to unleash a {your_adjective_here}ist algorithm on the world.)
2. Study the above, properly.
To study the math: Elements of Statistical Learning (Hastie et al.) or Deep Learning by Goodfellow.
Start on page 1, do every second exercise. Publish a summary of every chapter you finish with your answers to GitHub.
3. Pursue your goal in a publicly verifiable manner.
See:
> ML/DL research
I think you should apply ML deeply to a domain you care about, but see if you can find a domain that can be both generative as well as for understanding. If you are heavy into the math and don't need a grounding basis, maybe you don't need a domain to apply the ML research to, but the best scientists had a problem they were trying to solve, not just "doing research". Basically research in strong direction, for strong purpose solving a problem.
I'm guessing you're asking a low-level mechanical question: how do I get from A to B? You might already have the domain.
So to answer the actual question, I'd pick something like MNIST (the digit recognition problem) and master it by hand from scratch using multiple techniques, as many as I could find. That way I'm applying each algorithm to a fixed problem, so the algorithm (and, later, the paper it appears in) gets embedded in my mind.
Use only cleaned datasets; spend zero energy on cleaning at the beginning. Cleaning is a separate job, and you don't need to learn two different things at once. In fact, stick with industry benchmark data only, so you can compare your results to more papers.
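As a concrete version of that workflow, here's a sketch using scikit-learn's small built-in digits dataset (a stand-in for full MNIST, already clean) to compare a few classic techniques on the same fixed problem; the model choices and hyperparameters are arbitrary illustrations:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# 8x8 grayscale digit images, pre-cleaned benchmark data.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Several techniques applied to the same fixed problem.
models = {
    "logreg": LogisticRegression(max_iter=5000),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "svm": SVC(kernel="rbf"),
}
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
print(scores)
```

Holding the dataset fixed and varying the algorithm makes the differences between methods, rather than between datasets, what you learn.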
1. Elements of Statistical Learning by Hastie et al. (a very frequentist treatment) [1]
2. Pattern Recognition and Machine Learning by Bishop (for a Bayesian treatment) [2]
Both are freely available online. Reading one book will get you into the top 5% of practitioners; reading both will get you into the top 1%.
[1] https://hastie.su.domains/Papers/ESLII.pdf
[2] https://www.microsoft.com/en-us/research/uploads/prod/2006/0...
I like Hands-On ML... by Geron as a decent intro to ML book. FastAI seems a bit overrated to me - I didn't like that it uses its own helper library or the teaching style but it obviously works for other people.
Then there are more exhaustive books on theory: Elements of Statistical Learning, Pattern Recognition and Machine Learning, Bayesian Reasoning and Machine Learning, Murphy's books on probabilistic ML, etc. But the theory books have a lot of overlap with each other, so there will be lots of material to skip after you've read one or two of them.
After completing that, I think Kaggle competitions are a great way to master your skills.
In my opinion, jump straight into this! Learn prerequisites as you need them.
I found Goodfellow's book [1] to be helpful to learn some basics.
But don't think you need to read the whole book before you start reading and implementing research papers.
If you try to build up all the fundamentals thoroughly, you run the risk of going down a very deep rabbit hole, e.g. learning real analysis so you can learn measure theory so you can learn measure-theoretic probability theory so you can learn stats properly, etc.
You can be a productive researcher and patch up the fundamentals over time.
Afterwards, do a statistics class. Most algorithms these days are trained with a softmax output and a cross-entropy loss between two discrete/continuous probability distributions. There's a lot of choice in which distribution to use to model what, and it will have strong effects on your gradients and, hence, your training trajectory.
Concepts like Shannon information and entropy are also very helpful for monitoring training progress. Typical loss values decay exponentially, and at some point it becomes difficult to see further progress. But if you're still reducing the bits of entropy in your classifier's output, learning is still going well. So you need to understand what to visualize and how to calculate it.
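A quick numpy sketch of the quantities mentioned above, i.e. softmax, cross-entropy, and entropy in bits:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, q):
    # H(p, q) in nats; the usual classification loss when p is one-hot.
    return -np.sum(p * np.log(q + 1e-12))

def entropy_bits(q):
    # Shannon entropy in bits, handy for monitoring training progress.
    return -np.sum(q * np.log2(q + 1e-12))

logits = np.array([2.0, 1.0, 0.1])
q = softmax(logits)
p = np.array([1.0, 0.0, 0.0])       # one-hot target
print(cross_entropy(p, q), entropy_bits(q))
```

As the classifier sharpens toward the target, the cross-entropy falls and the output entropy in bits drops toward zero, which is exactly the signal worth plotting.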
As for implementing research publications, maybe start on easy mode and go to paperswithcode.com. There you will find papers AND their source code, so you can see how others implemented their papers.
As for FastAI and Kaggle, my personal impression is that they're mostly for toy problems. No real AI researcher would be willing to disclose their full source code to an international megacorp like H&M for a measly $15k in prize money, yet similar terms appear to be the default on Kaggle:
https://www.kaggle.com/competitions/h-and-m-personalized-fas...
https://www.kaggle.com/competitions/dfl-bundesliga-data-shoo...
https://www.kaggle.com/competitions/feedback-prize-effective...
EDIT: Also, I strongly disagree with course.fast.ai on these points: "Myth (don’t need): Lots of math, Lots of data, Lots of expensive computers" To train a state of the art ASR AI, you need roughly 100x A100 for a month, 100,000+ hours of audio recordings, and math knowledge to find a maximum likelihood path through a logit matrix. Unless, of course, you're only working on toy problems.
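For intuition on the "maximum likelihood path through a logit matrix" part: CTC-style ASR decoding picks a per-frame path and then collapses repeats and blanks. A greedy best-path sketch (an approximation of true maximum-likelihood decoding; the blank index and toy logits are made up):

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Greedy best-path CTC decode: take the argmax symbol per frame,
    then collapse consecutive repeats and drop blank symbols."""
    path = logits.argmax(axis=1)
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(int(s))
        prev = s
    return out

# Toy logit matrix: 6 time frames x 3 symbols (index 0 = blank).
logits = np.array([[0.1, 2.0, 0.0],   # -> 1
                   [0.1, 2.0, 0.0],   # -> 1 (repeat, collapsed)
                   [3.0, 0.0, 0.0],   # -> blank
                   [0.0, 0.0, 2.5],   # -> 2
                   [0.0, 2.2, 0.0],   # -> 1
                   [3.0, 0.0, 0.0]])  # -> blank
print(ctc_greedy_decode(logits))  # [1, 2, 1]
```

Full maximum-likelihood decoding sums over all paths that collapse to the same label sequence (beam search over the logit matrix), which is where the math knowledge comes in.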
Sure, most of this sounds as dull as a broken clock, but in my observation it makes the difference between students who can merely use machine learning tools by copying textbook cases and adopting a lot of fancy new terminology, and those who understand what they're doing.
That difference really kicks in once you get off the beaten track of popular use-cases, into applying ML to new, unproven applications. Then you need a deeper understanding of why some algorithms may be useful and others are inappropriate.
To really understand what is going on though, the path I am having some early success with (as a long time developer / data pipeline guy, but newly into the standard python / ML practice) is to run through Kochenderfer's "Algorithms for optimization" from 2019 (MIT press), including implementing the exercises, as optimization is the cornerstone of the majority of ML methods. Some of the most fun I've had in a long time.
Freely available here:
https://algorithmsbook.com/optimization/
From there on, I'm less sure, but expect I might experiment with implementing my own deep learning methods just for fun, or similar.
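To give a flavor of why optimization is that cornerstone: even plain gradient descent on a simple quadratic shows the core loop that most ML training procedures elaborate on (the function and step size here are arbitrary illustrations):

```python
def grad_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a function given its gradient: the update loop
    at the heart of most ML training procedures."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)   # step downhill
    return x

# f(x) = (x - 3)^2, so f'(x) = 2 * (x - 3); the minimum is at x = 3.
x_min = grad_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # ~3.0
```

The Kochenderfer book builds from exactly this kind of first-order method up to momentum, trust regions, and population methods.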
I started out without *any* background knowledge a few years ago. Found the Data Scientist career track of Datacamp pretty helpful, since it goes beyond programming and includes the mathematical and statistical theories as well. (https://www.datacamp.com/tracks/data-scientist-with-python)
It's basic, but a solid foundation to build upon.
If you're already familiar with most of these topics, fast.ai is the way to go!
For me that was robotics, with the motivation that traditional methods felt like they wouldn't scale outside static environments, so I started to look into ML/DL (Deep Reinforcement Learning, really), and from the looks of it I'm not alone. [0]
Now I do research in it, without a PhD and without taking any courses in it during my master's (except one DL course where we had to code everything, including gradient flow, from scratch. No framework.)
Frankly, going pure DL at research level today seems like a steep uphill climb; the top labs and research institutes (including industry) are the ones producing (and notably training) most of the SOTA models. Getting into those circles is your best bet, but then a PhD at a top university under a top professor is the best route in, and competition to get into those is insane.
I would say try fast.ai for a quick taste of what ML/DL is like, and then go back to linear algebra, deep learning and stats courses from top schools while picking a personal project goal to achieve (e.g. reproducing a popular CVPR/ICML paper results or building your own XXX) Once you go through a full lifecycle of building something from scratch, you will have a much better understanding about where you are and wanna go from there.
As someone who's only ever dabbled in minimaxir's GPT-2 packages, this was an extremely approachable exploration (and explanation) of how a neural network works. I can't recommend it enough.
2. Run through the catalogue of StatQuest videos for topics of interest in machine learning etc. This includes step-by-step maths and code explainers: https://www.youtube.com/c/joshstarmer/playlists
3. Watch more 3blue1brown videos if you need to step further back to refresh on calculus and linear algebra (that's most of the maths you'll need).
If you're hooked and can't get enough of the above content, then congratulations and welcome to the Matrix.
This will require at least upper undergraduate level math BTW.
2. You could get by knowing the theory in a handwavy way. Not ideal, but I've seen people do it. For implementation that is enough in many cases.
3. Again, "research" is too general. While you might understand some experimental ICML papers, it's very unlikely you will understand a single COLT paper if you don't know a lot of math.
It would be helpful to know more about your background and motivations.
Are you currently enrolled in a Bachelor of Science (B.Sc.) full-time study program at a university, with the goal of becoming a research scientist (either staff scientist, professor, or research fellow) in the area of machine learning?
If so, does your university offer a Master's program in Machine Learning, or are your grades such that you could apply for such a program elsewhere after completing your first degree? You could then enter a Ph.D. program in machine learning itself, or in computer science with an applied ML topic such as ML for NLP (Natural Language Processing), ML for IR (Information Retrieval = search engines), ML for robotics, etc. The choice of doctoral advisor and Ph.D. topic will steer you towards a particular direction, in which you can then find employment to conduct research under the direction of others and, potentially, become a research group leader yourself after gaining the necessary experience. (Time: M.Sc.: 1-2 years; Ph.D.: 3-8 years; postdoctoral/pre-tenure time: e.g. 2-k years, depending on ability and luck/timing.) It's a lot of fun to get paid for doing science, so I chose that path (but with multiple deviations due to startups and industry jobs along the way).
The more people know, the easier it is to recommend you useful materials.
- Get really comfortable with matplotlib or your graphing library of choice. Plot your data in every way you can think of. Plot your models' outputs, find which samples they do best and worst on.
- Play around with different hyperparameters and data augmentation strategies and see how they affect training.
- Try implementing backprop by hand -- understanding the backward pass of the different layers is extremely helpful when debugging. I found Karpathy's CS231n lectures to be a great starting point for this.
- Eventually, you'll want to start reading papers. The seminal papers (alexnet, resnet, attention is all you need, etc) are a good place to start. I found https://www.youtube.com/c/YannicKilcher (especially the early videos) to be a very useful companion resource for this.
- Once you've read some papers and feel comfortable with the format, you'll want to try implementing something. Important tricks are often hidden away in the appendices, read them carefully!
- And above all, remember that machine learning is a dark art -- when your dataloader has a bug in its shuffling logic, or when your tensor shapes get broadcast incorrectly, your code often won't throw an error, your model will just be slightly worse and you'll never notice. Because of this, 90% of being a good ML researcher/engineer is writing tests and knowing how to track down bugs. http://karpathy.github.io/2019/04/25/recipe/ perfectly summarizes my feelings on this.
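On the "implement backprop by hand" point above, here's a minimal sketch: a one-hidden-layer net with MSE loss, with one analytic gradient checked against a finite difference, which is exactly the kind of test the last bullet is asking for (shapes and data are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))            # toy inputs
y = rng.normal(size=(8, 1))            # toy targets
W1 = rng.normal(size=(3, 4)) * 0.5     # input -> hidden weights
W2 = rng.normal(size=(4, 1)) * 0.5     # hidden -> output weights

def forward(W1, W2):
    h = np.tanh(X @ W1)                # hidden activations
    pred = h @ W2                      # linear output layer
    loss = np.mean((pred - y) ** 2)    # MSE loss
    return loss, h, pred

# Backward pass, derived by hand via the chain rule.
loss, h, pred = forward(W1, W2)
dpred = 2 * (pred - y) / len(y)        # dL/dpred
dW2 = h.T @ dpred                      # dL/dW2
dh = dpred @ W2.T                      # dL/dh
dW1 = X.T @ (dh * (1 - h ** 2))        # tanh'(z) = 1 - tanh(z)^2

# Finite-difference check on one entry of W1.
eps = 1e-5
W1p = W1.copy(); W1p[0, 0] += eps
num = (forward(W1p, W2)[0] - loss) / eps
print(dW1[0, 0], num)                  # should agree closely
```

If the analytic and numerical gradients disagree, the bug is in your chain-rule derivation, which is precisely the silent failure mode the comment warns about.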
This is not an instant-gratification with fancy results kind of course. But put in the work, and you will learn some very cool stuff.
If you want something more theoretical there is a book by Hopcroft et al. that was released in draft form a number of years ago. It appears to be out for real now: Foundations of Data Science, by Avrim Blum, John Hopcroft, and Ravindran Kannan. Blurb and video lectures: https://www.microsoft.com/en-us/research/publication/foundat... I just found these so haven't looked at them yet. The book draft (2014, wow) is here: https://www.cs.cornell.edu/jeh/book11April2014.pdf I didn't stick with it long enough to make much progress, unfortunately.
Kaggle problems are a good set of practical projects even if you're not aiming to be competitive at them (which takes a lot of effort and resources). The fast.ai videos are OK as preparation for them.
> The target ability:
> 1. To understand the theory behind the algorithms
> 2. To implement an algorithm on a dataset of choice. (Data cleaning and management should also be learned)
> 3. Read research publications and try to implement them.
There are many different ways that people do ML/DL research these days. Some people do more theory-work which will necessarily be more focused on mathematics, and others do more of an applied approach which will be more focused on coding and iterating.
For theory-driven work, I think Michael I. Jordan's list is still pretty solid:
> https://news.ycombinator.com/item?id=1055389
I would focus on the fundamentals first though:
1. get a solid background in mathematics
- analysis (a suggestion is Baby Rudin)
- probability (Grimmet and Stirzaker, maybe something with measure theory after)
- statistics (Casella and Berger or Wasserman's book is a good start)
2. get a solid foundation in statistical machine learning - Introduction to Statistical Learning is a fantastic start
- Then choose 1 or both of the following:
- Elements of Statistical Learning for a Frequentist Approach
- Pattern Recognition & Machine Learning for a Bayesian Approach
3. get a baseline understanding of deep learning - the deep learning book by Goodfellow is decent
- start reading papers here and trying to implement them
If you get through to this last step, you are probably solid enough to get a job building models. If that's the route you want, then begin iterating on learning about new approaches in papers (look for papers with code / data) and implementing them. If you want to go the academic route, you have enough of a view of the field to begin specializing further. Choose a sub-domain and dig deep if you want to do more deep learning work. Maybe revisit Michael I. Jordan's list if you're still confused about where to go. A lot of those books will feel a lot more familiar.
Best of luck!
I would start with math foundations: basic linear algebra, stats, probability and some analysis. CS undergrad level is plenty of math for start.
Then I would try to understand back prop on intimate level: learn how to calculate gradients, maybe take a look on how autograd works as well.
Then you should know a bit to pick your next steps by yourself.
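To see "how autograd works" concretely, here is a micrograd-style scalar reverse-mode sketch, just addition and multiplication, which is enough to show the idea (the class and its API are invented for illustration):

```python
class Value:
    """A scalar that records how it was computed, so gradients can
    flow backwards through the graph (reverse-mode autodiff)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad               # d(a+b)/da = 1
            other.grad += out.grad              # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# f(a, b) = a * b + a, so df/da = b + 1 and df/db = a.
a, b = Value(2.0), Value(3.0)
f = a * b + a
f.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Real frameworks do the same bookkeeping over tensors instead of scalars; once this clicks, reading an autograd implementation is much less mysterious.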
>(Data cleaning and management should also be learned)
There are many students and graduates who either didn't want to do research in the first place or didn't get that research grant or position, and who are looking to get employed in the private sector with their degree. Many universities and colleges have now also retooled some of their statistics degrees into dedicated "data science" curricula, producing graduates who either know the basics of ML/DL or have the prerequisite background to learn quickly.
However, in my experience (I am extrapolating from my own past job search experiences), while "understanding the theory behind the algorithms" still counts for something, it is much less than one would think. Familiarity with the software technologies and practical implementation counts for much more. This includes not only "data management", a phrase which makes it sound like the data simply exists somewhere and only needs to be managed (not unlike a Kaggle competition), but also managing the data pipeline from generation/collection to analysis and communication of the results, deploying the software that implements it all, and so on. I suppose (I've never been on that end of the interview table) that given any two candidates, it is very difficult to evaluate how deeply one understands the theory of some algorithm compared to the other if they both demonstrate basic understanding (and what is the practical value of such a difference in insight, anyway?). Likewise, I assume it is somewhat easier to gauge whether someone seems able to start delivering results or contributing to ongoing work quickly if they have the relevant technical skills and/or domain knowledge.
You should use this video's title as a compass even though it says "machine learning roadmap." Investigate it, follow your interest, pick up a new skill, and then put what you've learned to use when determining your next moves.
Video: https://www.youtube.com/watch?v=pHiMN_gy9mk. Interactive Machine Learning Roadmap: https://dbourke.link/mlmap
A second bit of advice: Programming (and execution) skills are IMO heavily undervalued by people looking to get into ML. The faster you can write code, debug, and implement new things, the easier it is to produce good research.
Some books I liked: PR & ML (Bishop), Deep Learning Book (Goodfellow), AI: A Modern Approach (Norvig), Elements of Statistical Learning (Friedman)
The vast majority of people who do this have graduate degrees. I'm biased, but I think getting a graduate degree in the subject would be the default suggestion. Are you considering it?
Quotes from the comments:
> For all my hacker news peeps that wants to learn ML and/or DL, you need to drop everything right now, go print this on the office printer, and sit outside with coffee for the next two weeks and read through this entire thing. Turn off the computer and phone. Stop checking HN for two weeks. Trust me, nothing better than this will come around on HN anytime soon.
> The authors are wrong to label this book as useful only to people with a physics background, and in fact it will be useful for everyone who wants to learn modern ML.
If you just want to work as an ML Engineer then take as many courses as you can on the subject before you graduate and get internships/apply to jobs. Nothing special here.
Pick a real problem, try to build a ML solution for it and while doing so keep a list of things you'd like to dig deeper into. Then go back to that list and pick one item to study, and iterate.
Happy to have a chat and give you specific pointers if you'd like (email in profile), I got my master in ML in 2016 and applied it in the industry since.
It is mostly math behind ML.
There are also his lecture notes from 2014 I suppose.
It is really all one needs to know. Machine learning or deep learning is just pattern recognition that uses NNs as the basis of representation for the "matcher".
The real problem of ML is to find someone who would pay you for this because we are in a bubble.
It is the math and algorithms that matter, and there are just a few of them. All the tooling can be mastered in a month.
You need to not be afraid of proving theorems (most of them have to do with stats, because machine learning is basically stats on steroids).
One has nothing to do with the other.
That said data engineering at scale pays a lot more than deep learning but is also a lot less fun. Figure out which you'd rather do.
I don't know how long it would take to train such a network with a cheap laptop.
There are tutorials, but I don't see any cookie cutter thing.
I thought there would be demos for this, since image labeling is an old problem.
MOOCs are a starting point; they often offer classes with few prerequisites, but that'll only get you so far.
After that, I'd recommend: Statistical Models: Theory and Practice by Freedman.
A basic (and by no means complete!) way to find out whether it's fun for you.
Just add smort.io/ before any arXiv URL to read it in Smort.