I'm clueless, can someone ELI5?
To be honest, yes, it is possible. Most models I've built could run on a mobile device; in practice they wouldn't, because they were written in Python and, since compute is cheap, I didn't worry much about RAM or efficiency for a training job.
I think dataset size is overrated by things like Kaggle and news about deep learning models for image recognition. Bigger datasets are better, but if your data quality is good, a few hundred rows (like a CSV file) can be enough for many applications!
Most data challenges are not image recognition or NLP either, so you could tackle them on smaller devices. I think the main issue would be 'support', though: small devices don't run Python (or R/Julia), so you need to port your inference code to some binary format (like WebAssembly) or rewrite it in C/C++. Fortunately, inference code is much smaller than training/experimentation code.
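To give a sense of scale: once a model is trained, inference for something like logistic regression is just a dot product, a bias, and a sigmoid, which is why it's so easy to rewrite in C/C++ or compile to WebAssembly. A minimal sketch in Python (the weights are made up for illustration; a real app would load them from the trained model):

```python
import math

# Hypothetical weights exported from a model trained elsewhere
# (these numbers are made up purely for illustration).
WEIGHTS = [0.8, -1.2, 0.5]
BIAS = 0.1

def predict(features):
    """One inference step: a dot product, a bias, and a sigmoid."""
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))
```

That's the whole runtime: no training loop, no optimizer, no dataset, which is why it ports to constrained devices so easily.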
Training is when you show examples (instances of something) to an entity and it learns to recognize them. Example: doing homework and math exercises.
Inference is when you show the entity an example it has not seen before and ask it to draw from its training experience to make something of that example.
Example: a seasoned cop has had many more interactions than a green cop. However, some people are more intuitive than others and don't need that many years to read situations. Their learning algorithm is different, or they're looking at things others are not looking at (features).
You are probably using machine learning inference on your mobile device when you text and it recommends the next word. This application doesn't call out to a server because it needs to be low latency: you type fast, and you need the model to be right there. The same case can be made for self-driving cars. This poses several challenges and relies on several techniques to get the models onto constrained environments, either to run there or to get them there in the first place.
Second, training? It depends on the problem you are solving. Are you trying to predict something so rare that even a year's worth of data contains only two occurrences? Is there a lag between the influencing factors and the phenomenon? Say, changing a plant's nutrients and its state not changing instantaneously. It all depends on the problem.
Training is quite expensive computationally, but inference needn't be. We have many models that can run on a smartphone, after all.
However, you can do some limited training on the smartphone by leveraging pre-trained models. Usually the internal representation at the very end of the network can be used as an input to train a simpler algorithm on top of it.
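A minimal sketch of that idea, with all numbers and the "pre-trained" feature extractor invented for illustration: freeze the network's output features and train a tiny logistic regression on top of them, which is cheap enough for a phone.

```python
import math

def frozen_features(x):
    # Stand-in for a pre-trained network's last internal representation:
    # here just a fixed nonlinear transform (purely illustrative).
    return [math.tanh(x), math.tanh(2 * x - 1)]

# Tiny labelled dataset: label 1 when x > 0.5 (made up).
data = [(i / 10, 1 if i > 5 else 0) for i in range(10)]

# Train a logistic regression on top of the frozen features with SGD.
w, b = [0.0, 0.0], 0.0
for _ in range(2000):
    for x, y in data:
        f = frozen_features(x)
        p = 1 / (1 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))
        grad = p - y  # gradient of the log-loss w.r.t. the logit
        w = [wi - 0.5 * grad * fi for wi, fi in zip(w, f)]
        b -= 0.5 * grad

def predict(x):
    f = frozen_features(x)
    return 1 / (1 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))
```

The expensive part (learning `frozen_features`) happened elsewhere; only the small classifier on top is trained on-device.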
But all of the above depends on what actually needs to be done, which you have not specified. Classical, non-deep, ML models could easily be trained on a smartphone provided the datasets can fit.
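As a sketch of how little "classical" training can cost: fitting a one-variable least-squares line is a handful of arithmetic operations over the dataset (the data points below are made up):

```python
# Ordinary least squares for y = a*x + b on a handful of points --
# the kind of classical-ML training any phone could do instantly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x (fabricated example data)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Closed-form slope and intercept: no iteration, no GPU needed.
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - a * mean_x
```

One pass over the data and you have a fitted model; that's the scale of compute many non-deep methods actually need.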
The keyword to search for would be "edge AI", if that helps.
Training an ML model typically takes large datasets and compute power.
Using a model that has already been trained requires less compute power and some ML apps (trained models) certainly exist on smartphones.
One example is motion and gesture apps that detect whether you are walking, running, or riding a bike. Some use ML: the classifiers were trained on a large dataset external to the app, but they run on the device afterward.
But in practical terms, there is little reason for training to be done on a smartphone, or on any PC for that matter. You only need to train once; then you can run the model anywhere.