HACKER Q&A
📣 uptownfunk

Why isn’t it possible to train LLMs on idle resources, like SETI@home?


What makes using volunteer compute resources impractical for training large-scale LLMs? Something similar to the SETI@home project or the Mersenne prime search, which let users pool their available compute to solve one large problem.

It seems like compute resources are quickly becoming a bottleneck and a moat, preventing ML researchers from training and using LLM-type language models.

It would be great to see a more publicly available solution to this, to break down the dam so to speak and give everyone access to SOTA LLMs.


  👤 kingcai Accepted Answer ✓
ML training is not as easily parallelizable as the other problems that have been tackled this way. I'm not familiar with SETI@home, but I know this to be true for Folding@home.

As you mentioned, ML training can be parallelized, but this requires either model parallelism or data parallelism.

Data parallelism means spreading the data over many different compute units and then synchronizing gradients somehow. The heterogeneous nature of @home computing makes this particularly challenging, as you will be limited by the slowest compute unit. I've personally only ever seen data (and model) parallelism done on a homogeneous compute cluster (e.g. 8x GPUs). A rough sketch of the idea is below.
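Here's a minimal sketch in plain NumPy, just to illustrate the shape of data parallelism (the toy dataset, worker count, and `local_gradient` helper are all made up for illustration, not anyone's real training code). Each worker computes a gradient on its own shard; the averaging step is the synchronization that would have to cross the network in an @home setting, and the slowest worker stalls it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                  # toy dataset
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)

n_workers = 4
shards = np.array_split(np.arange(len(X)), n_workers)  # each worker owns a data shard
w = np.zeros(10)                                 # every worker holds a full copy of the weights

def local_gradient(w, idx):
    """Mean-squared-error gradient computed on one worker's shard only."""
    Xi, yi = X[idx], y[idx]
    return 2 * Xi.T @ (Xi @ w - yi) / len(idx)

for step in range(100):
    # Each worker computes a gradient on its own data, in parallel.
    grads = [local_gradient(w, idx) for idx in shards]
    # Synchronization step: average gradients across workers. Over the internet
    # this "all-reduce" is the expensive part, and every worker must wait for
    # the slowest one before anyone can take the next step.
    g = np.mean(grads, axis=0)
    w -= 0.01 * g                                # every replica applies the same update
```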

For model parallelism, we split the model across different compute units. However, this means you need to synchronize the different parts of the model with each other, which gets very expensive when you do it across the internet. With 8x GPUs in one machine, that communication happens over PCIe; in a distributed @home cluster it happens over TCP/IP across the public internet, with far higher latency. A sketch of where those hops land is below.
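A minimal sketch of model parallelism, again in plain NumPy and purely illustrative (the 2-layer MLP, shapes, and "device A / device B" split are assumptions, not a real system): the layers live on different peers, so every training step needs activations shipped forward and gradients shipped back across whatever link connects them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 32))
y = rng.normal(size=(64, 1))

W1 = rng.normal(size=(32, 64)) * 0.1    # first layer lives on device/peer A
W2 = rng.normal(size=(64, 1)) * 0.1     # second layer lives on device/peer B

for step in range(50):
    # Forward: device A computes its half, then ships activations to device B.
    h = np.maximum(X @ W1, 0)           # ReLU on device A
    # ---- network hop: send `h` from A to B (PCIe on one box, TCP/IP @home) ----
    pred = h @ W2                       # device B finishes the forward pass
    err = pred - y

    # Backward: device B computes its local gradient, then ships the upstream
    # gradient back to device A.
    gW2 = h.T @ err / len(X)
    gh = err @ W2.T
    # ---- network hop: send `gh` from B back to A ----
    gW1 = X.T @ (gh * (h > 0)) / len(X)

    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2
```

Every step incurs at least two of those hops, which is why latency over the internet dominates compared to PCIe within a single machine.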

But I would say it's not impossible; someone clever could definitely figure it out.


👤 lm28469
Probably because your entire country's worth of personal computers delivers about the same capacity as a rack of dedicated hardware.

👤 thedevindevops
ML training is iterative and non-parallelizable so breaking it up into distributable units of work would not provide any benefits and would actually slow down learning.

👤 Am4TIfIsER0ppos
Aside from technical reasons, why would I volunteer my computing time for it to be wasted when someone lobotomizes the end result with their "bigotry protections"?

👤 gitgud
Leaving a comment to come back in a few years/months when someone releases an approach that makes this possible...