My initial thought was to take probability theory, since math is usually harder to learn on your own. On the other hand, I've heard distributed systems is a key part of ML engineering, and the software foundations course is taught by the man who essentially wrote the bible on the topic, Benjamin Pierce.
These are the related courses I've already taken:
ML-related - ML, NLP, intro probability, linear algebra, statistical inference, independent research in few-shot learning
Systems-related - OS, networks
PL-related - functional programming in Haskell
Ideally I would pick all three, but alas, I can only pick one. Which one should it be?
You are right - outside of a classroom setting, it is quite difficult to learn maths on your own, and yes, maths (especially probability and stats) is a prerequisite for becoming good at ML and Data Science.