I'm wondering if anyone has had success moving into this field as a generalist engineer? I'd imagine advanced degrees aren't required for everything: ML infra, perf/optimization work, and so on. Maybe learning materials, resume and interview advice, etc.? Thanks in advance if you have an interesting answer!
But if you, like me, are happy to be an “understand and implement the paper” person instead of a “co-author the paper” person, that is eminently achievable via self-study and/or adjacent industrial experience. In fact, it’s never been easier as world-class education is more available on YouTube every day.
3Blue1Brown covers all the linear algebra, calculus, information theory, and basic analysis you need to get started. Ng and Karpathy have fantastic courses. Hotz is writing a credible threat to PyTorch on stream/YT for the nuts-and-bolts accelerated-computing side. Kilcher does accessible reviews of recent papers. The fast.ai material is great.
This is all a lot easier if you can get a generalist/infrastructure role that’s ML-adjacent at a serious shop (that’s how I got exposed), but there’s enough momentum in open source now that even this isn’t a prerequisite.
I say fucking go for it, and if you want any recommendations on books or need someone to ping when you get stuck, feel free to email.
Yes, absolutely doable. Immerse yourself in learning things well. Learn the basics. Don't do course hopping and book hopping: pick a rock-solid book or lecture series, devour it, and get building. That last part is the most important.
People with an ML MS/PhD only have two things on you: (1) the degree and (2) the network. If you invest time, you can overcome (2) by asking good questions on Twitter/Reddit and making connections. I still do it after finishing my degree. Twitter is the LinkedIn of ML.
As for (1), YSK that most advisees are being advised by professors who made it big before deep learning took off. So everyone is still on a learning curve of sorts: advisors, advisees, and your peers. Sometimes a student's intuition is better than the professor's. Don't sweat it. Focus on building.
Software engineers, by and large, can make great AI/ML practitioners: the specialization is a smaller leap for them than it is for, say, business analysts or anyone else less likely to be able to install their own OS or automate a task with scripting.
Easiest way to get into that work, in my opinion, would be to take a data engineering job on a team that has an AI/ML capacity and then start learning from that team and taking on some of the AI/ML tasks directly. Alternatively, you could take a role at a smaller business that needs a generalist but also wants to invest in AI/ML (though in this case you will be more on your own to self-learn and it won't work quite as well for stepping stone into a more pure AI/ML role).
1. Find the best one or two courses for AI/ML. At the time, the Udacity self-driving course was a great path, taking you from a basic Udacity ML intro course all the way to building a full system.
2. Allocate 2-4 hours a day for this. This is a heavy course and a lot to learn so you have to work really hard to get this done.
3. The final project should be impressive to people in the field. For example, I implemented a YOLO alternative from the paper. You'll have to do something similar and show results.
Then getting a job is a completely different skill, and you'll be looking at a job on the margins. Keep backup options at lower-paying jobs in startups or non-tech companies in case your dream jobs don't pan out.
It could take 6-9 months to understand the content and then another 6-9 months to get a job. If you are okay with that, then do it.
Having an ML + software engineering background is a really good spot to be in.
It’s all very cool and you definitely don’t need anything at all to do any of it.
Arguably, it's one of the best times to get involved. You may not want to fight over building the 99.9++%-accurate system, yet a bunch of non-specialists were the first to extend some of these models and actually apply them to non-toy problems.
There are also a lot of sites/guides/walkthroughs that did not exist 6 months ago where you can rapidly get a feel for "what can this actually do?" [1][2][3][4][5][6]
[1] https://huggingface.co/stabilityai/stable-diffusion-2-1?text...
[3] https://open-assistant.io/chat
[4] https://hacks.mozilla.org/2023/07/so-you-want-to-build-your-...
[5] https://writings.stephenwolfram.com/2023/07/generative-ai-sp...
ML infra is just infra, with a different set of needs and issues than, say, Spring Boot infrastructure. Self-taught is fine here, and honestly I'd trust a software engineer with basically no ML experience to handle ML infra better than an ML researcher.
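To make that concrete, a lot of ML-infra work is ordinary systems programming with ML-shaped constraints. One such constraint: accelerators are far more efficient on batched inputs, so serving code buffers individual requests into batches. A toy sketch (the class name, batch size, and `model_fn` are made up for illustration, not from any particular serving stack):

```python
class MicroBatcher:
    """Collect individual requests and flush them to the model in batches.

    Toy illustration of a common ML-serving concern: GPUs prefer batched
    inputs, so infra code buffers requests before running the model.
    """

    def __init__(self, model_fn, max_batch=8):
        self.model_fn = model_fn  # e.g. one forward pass over a list of inputs
        self.max_batch = max_batch
        self.pending = []

    def submit(self, item):
        """Queue one request; returns batch results once a full batch forms."""
        self.pending.append(item)
        if len(self.pending) >= self.max_batch:
            return self.flush()
        return None

    def flush(self):
        """Run the model on whatever is queued and clear the buffer."""
        batch, self.pending = self.pending, []
        return self.model_fn(batch)
```

A real implementation would add timeouts, concurrency, and backpressure, but notice that nothing here requires research-level ML knowledge; it's queueing and buffering, the bread and butter of generalist engineering.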
I think of it like the gym. You could get in there and start something tomorrow, but unless you've been taught good form there's a good chance you'll injure yourself.
I think the problem is that there's no clear need for engineers with lower levels of expertise in some of these fields; that is, people without an MS degree in that specific ML subfield.
If the field stays hot for long enough and demand becomes mainstream enough, then I imagine hiring managers will have to invest more resources in hiring people with less experience and training them on the job, similar to how developers are hired fresh out of short boot camps these days.
Therefore the easiest way to transition would be to acquire practical and theoretical knowledge using the strategies in the other answers, then apply to ML teams where there's enough demand that they're willing to train you on the job. Of course, that's easier said than done. It would be interesting to hear whether this is already happening in certain fields.
The issue isn't that it can't be done. In fact, the greatest need right now is for engineers who can come in and build rock-solid real-world applications on top of commoditized neural network architectures and weights, not PhD scientists. Your business might not even use its own ML model! You might just be calling an API.
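To illustrate that last point: "just calling an API" really is ordinary engineering. Here's a hedged sketch against a made-up hosted-model endpoint (the URL, route, and response schema are invented for illustration; substitute your actual provider's API and auth scheme):

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with your provider's real URL.
API_URL = "https://api.example.com/v1/generate"

def build_request(prompt, max_tokens=128):
    """Assemble a JSON POST request for a (hypothetical) hosted-model API."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_model(prompt):
    """Send the request and pull the generated text out of the response."""
    req = build_request(prompt)
    with urllib.request.urlopen(req) as resp:  # network call; schema assumed
        return json.loads(resp.read())["text"]
```

Everything that makes this production-worthy (retries, rate limiting, timeouts, monitoring) is standard backend work, which is exactly the point.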
The challenge is that, for a few reasons, it's a very crowded market right now. A lot of people want to make a move into AI, yet for all the hype, the space of viable commercial applications that will survive without indefinite VC funding remains kinda small. Look how AVs are doing after billions and billions in funding chasing one of the most lucrative commercial possibilities imaginable. There's really cool stuff happening industry-wide, and commercial potential is growing, but nowhere near as fast as the cultural hype that has infected certain parts of the tech space these past 12 months or so. Plus, many experienced ML engineers and scientists have been dumped back into the job market by layoffs. So from the hiring side right now, for every AI posting there are tons of applicants who have the cool portfolio, and then also a relevant degree and/or prior experience.
That's what you're competing against, so if you're going on portfolio alone it's got to be really outstanding. Way beyond doing the homework for a free course. Learning how to build an ML service that solves an actual problem in the real world reliably enough that you can actually use it should be the goal.
If you happen to be employed at a company where there is a need for an ML engineer in some capacity but no headcount (hiring is expensive!), you can try stepping up to help out. Hiring challenges aside, it is absolutely possible to learn on the job the engineering skills needed to, say, build ML infra or work as part of an MLOps team. I recognize that's sort of just up to circumstance, though. If you look for a new job, be a little wary of any that want to hire you for a more mundane task (like data entry/cleaning/labeling) with a promise of getting to do the ML engineering stuff, too, "eventually". Such roles do exist, but this can also be a bait-and-switch tactic.
Anyway, that's what I've got as someone who has been thinking about how to help people do what you're doing, but I hope this thread turns up more ideas too.
Even though I'm not a formal researcher, I've been able to contribute to research projects and be included in papers because the field is so new.
The most important criterion I look for when I interview applicants is what they have built. GitHub repos, papers, even cool Product Hunt projects can have impact.
When you train small models on small datasets you get very bad out-of-distribution results, but these LLMs have already seen essentially everything on the web, so inputs are far less often OOD for them.
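A tiny illustration of the small-model OOD problem, using synthetic data and made-up numbers (a high-degree polynomial stands in for a small model): the fit looks great on the training range and falls apart just outside it.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training set": the model only ever sees x in [0, 1].
x_train = rng.uniform(0.0, 1.0, 200)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.05, 200)

# Fit a degree-9 polynomial -- a stand-in for a small, flexible model.
coeffs = np.polyfit(x_train, y_train, deg=9)

def predict(x):
    return np.polyval(coeffs, x)

# In-distribution error is small...
x_in = np.linspace(0.0, 1.0, 100)
err_in = np.max(np.abs(predict(x_in) - np.sin(2 * np.pi * x_in)))

# ...but extrapolating beyond the training range blows up badly.
x_out = np.linspace(1.5, 2.0, 100)
err_out = np.max(np.abs(predict(x_out) - np.sin(2 * np.pi * x_out)))

print(err_in, err_out)
```

An LLM trained on web-scale data rarely faces extrapolation this extreme, which is the commenter's point.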
https://www.edx.org/course/columbia-engineering-machine-lear...
Actually coming up with a new model? I am not sure, maybe not.
That being said, let's not fool ourselves into thinking that going from knowing nothing to being pretty good at this stuff requires formal, structured education. Programming is much more of a craft than it is either an art or a science, and the field of AI is no exception.