Thanks!
- Learn the statistical language you'd pick up in a basic college-level Stat 101 course. Be able to translate plain-English sentences into statistical notation, and to read that notation fluently. Also know basic statistics.
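As a hypothetical example of the kind of translation I mean, the sentence "the average of n independent measurements of X, and the chance that average exceeds a threshold t" reads in notation as:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
P(\bar{X} > t),
\quad \text{where } X_1, \dots, X_n \overset{\text{iid}}{\sim} F
```

You should be comfortable going in both directions without slowing down.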
- I assume you already know programming. If you don't know Python yet, learn it; it's really easy.
- There are a number of paths you can go from there. Here's what I did.
-- IBM Data Science Professional Certificate (not deep at all, but lays out the landscape well; did it in a week)
-- Machine Learning for Absolute Beginners by Oliver Theobald, which you can finish in an evening.
-- Machine Learning Specialization by Andrew Ng on Coursera.
-- Deep Learning Specialization by Andrew Ng on Coursera.
-- fast.ai course.
- Learn PyTorch really well. I suggest Sebastian Raschka's book.
Now from here, you can chart your own path. You can choose NLProc, Vision, RL, or something else.
I went towards Vision, and I do Edge AI as a hobby.
I was in my last year of college as a Physics undergrad when I was hired to do Vision modelling/research for a non-flashy company in 2021. I'm finishing my CS Master's next month and starting to look for a PhD. I worked at the same company for ~2.5 years.
EDIT: If you want a job in big tech, grind Leetcode, learn about system design, and study Machine Learning systems so you can design them. Chip Huyen has a good book, from what I hear. 6-7 rounds of interviews are common at Meta/Google. DL hackathon awards and open-source contributions help significantly.
There are some AI Engineers with a strong scientific/mathematical background, but that's rare. Usually you're paired with ML people who actually develop and evaluate the models.
So my advice is to start with Data Engineering and then find a specialization in AI. You should have a VERY solid foundation in scripting and programming, especially Python, plus a lot of "data wrangling" concepts: understanding how data flows from point A to point B, how intermediate storage and streaming engines work, etc. Functional programming is key here.
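As a tiny sketch of what I mean by functional-style data wrangling in Python (the records and field names here are made up for illustration): data flows through a pipeline of pure transformations instead of being mutated in place.

```python
from functools import reduce

# Hypothetical raw records arriving from "point A"
records = [
    {"user": "a", "amount": "10.5"},
    {"user": "b", "amount": "not-a-number"},
    {"user": "a", "amount": "4.5"},
]

def parse(rec):
    """Parse the amount field; return None for malformed rows."""
    try:
        return {"user": rec["user"], "amount": float(rec["amount"])}
    except ValueError:
        return None

# Functional pipeline: map -> filter -> reduce, no in-place mutation
parsed = filter(None, map(parse, records))
total_per_user = reduce(
    lambda acc, r: {**acc, r["user"]: acc.get(r["user"], 0.0) + r["amount"]},
    parsed,
    {},
)
print(total_per_user)  # {'a': 15.0} -- the bad row is dropped
```

The same shape (parse, validate, aggregate) scales up to Spark or a streaming engine; the pure-function habit is what transfers.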
Hugging Face has a bunch of courses, but that one is a good place to start. You can do the exercises on your own computer, or on a cloud server if you want access to a more powerful GPU. If you go through these courses and pay attention, you'll be in a really good position.
Instead of targeting AI engineering, I would focus on building a solid mathematics background (calculus, linear algebra, discrete mathematics) and a solid computer science background (algorithms, data structures, distributed systems, databases/data storage/data retrieval).
Then with those skills you can easily become a "SW Engineer who leverages AI", which I'd guess will be a much better and more stable job than "AI Engineer".
It was previously popular on HN though only inside a comment thread, and I haven't submitted it as a link post yet.
Covers a lot of ground
Start by reading the AI Canon and setting up projects like PrivateGPT and AutoGPT locally, then work with LocalAI to serve Hugging Face models in place of OpenAI models.
For applied ML, my tips are:
- Make sure you learn the dark side of BatchNorm and Dropout.
- Start with simple, elegant baselines instead of complex SOTA algorithms.
- Spend more time understanding your data than trying algorithms.
- Be aware that SOTA on a related task will often suck at your task.
- Be data driven.
Also, most of your ideas will not work, but you have to try them and conduct experiments carefully.
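One concrete instance of the Dropout "dark side", as a toy pure-Python sketch (this is not the PyTorch API, just the idea behind it): inverted dropout zeroes activations and rescales the survivors during training, but must be a no-op at inference, and forgetting to switch modes silently corrupts predictions.

```python
import random

def dropout(xs, p, training, rng=random.Random(0)):
    """Inverted dropout: during training, zero each value with
    probability p and rescale survivors by 1/(1-p) so the expected
    activation is unchanged; at eval time, do nothing."""
    if not training:
        return list(xs)  # eval mode: identity
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

acts = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(acts, p=0.5, training=True)   # noisy, rescaled
eval_out = dropout(acts, p=0.5, training=False)
print(eval_out)  # [1.0, 2.0, 3.0, 4.0] -- identical to the input
```

This is exactly why PyTorch makes you call `model.eval()` before validation and `model.train()` before training; BatchNorm has an analogous mode switch (batch statistics vs. running statistics).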
There was a brief moment in time wherein data engineers were computer scientists specialised in distributed systems and data processing algorithms on commodity hardware. You had to know a lot on average.
Then came commoditisation via the big vendors and now you really don’t need to know very much. As a result it is not uncommon to meet “senior” data engineers who mostly script Python, do SQL and configure Airflow.
I think ML (now rebranded as AI) has already gone that way, and many vendors are strongly promoting developer participation with courses and plug-and-play resources.
So what do you need? A vendor certificate you took over a weekend, and an employer to say “yes”…
2. Prompt ChatGPT with "how do I make a pytorch program that learns to do …"
3. Ask ChatGPT for code and for it to explain the code.
4. Run the code on Google Colab; if it doesn't work, ask ChatGPT why and keep rerunning it until it does.
5. If you find some API that's too new for ChatGPT to know about, just paste in the API documentation and then ask ChatGPT to propose some code using it.
ChatGPT is wrong a lot, but if you keep badgering it, you will eventually get a solution that works. It's like having a tutor standing next to you that you can ask questions; I can't think of a better way to learn, even if it's wrong on occasion.
It's definitely a plus to know a bit of the mathematics, but I doubt anything short of a Master of Science in Math with an AI specialization is going to close the gap. How many data engineers can do that?
Wouldn't it make a lot of sense to hire someone with zero AI exposure but tons of experience in sysadmin, data engineering, and ops? It's going to be tough to find someone who is both an engineer and a math wizard, I think.
1. MLE/DE/MLOps - this is more like typical software engineering. You're responsible for building data platforms, tools, monitoring, and more around the model development lifecycle. This can include: data ingestion, data architecture, data transformation and storage, automating and productionizing workflows like training, evaluation, and deployment, monitoring deployed models, data monitoring (and building that monitoring), and tooling like feature stores (plus libraries for R&D teams to interact with them) or internal deep learning frameworks. You'll basically work as part of (or adjunct to) the research team that is testing new model architectures, different approaches towards some goal, etc. These are largely taken from my own experiences and projects I've built. Skills: software engineering, Python, knowledge of the model development lifecycle, data architecture/engineering, some knowledge of the frameworks used, cloud platforms, etc. Designing Machine Learning Systems by Chip Huyen is a great overview of this kind of work.
2. Research. This is actually building models, implementing papers, very occasionally (especially in big companies) doing publishable research. This is more akin to academic work (my educational background is in hard science academia), and requires a lot of paper reading, experimentation, etc. It will require knowledge of your niche (I mostly work with CV teams, for instance), strong math fundamentals, and very often a PhD.
I can tell you how I, a self-taught software engineer with a bio education, got here. My first job was a generic enterprise desktop application development role. Shortly after, I randomly joined a data engineering team, not even knowing what DE was, but knowing I liked it. We worked on a massive distributed ETL system. I then joined my first startup, also in a DE role, but we were a small group in a larger research team, where I got my initial exposure to ML workflows and especially to moving them to the cloud. We did some simple model training, data management, and building products around the models we built, while also supporting the research efforts of the larger team.
I then went to another startup, where I had sole responsibility for our research infrastructure (largely on the strength of my knowledge of AWS and Python). I was the sole engineer on a team of CV researchers, and did things like automate their entire evaluation workflow and move it to the cloud, work on the internal deep learning framework, and build a team to evaluate the current AI development lifecycle and design a platform to harden and optimize the process. Covid put the kibosh on that. I moved to another, earlier-stage startup, doing similar but more foundational work - almost everything was built from scratch.