My 16-year-old nephew lives in an East African country with practically no internet access.
Last week he asked me for advice on how to go about training an open-source LLM using an offline copy of Wikipedia (~80 GB) stored on disk.
Any suggestions? Thanks!
Here's his "1 hour intro to LLMs" video: https://www.youtube.com/watch?v=zjkBMFhNj_g
Most decent LLMs were probably already trained on Wikipedia, and that doesn't stop them from hallucinating when asked questions about it.
You can download llamafile and several models, put them on a USB drive or hard drive, then send the drive to him via DHL.
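If you go the shipping route, a file can get corrupted in transit, and multi-GB model weights are hard to re-download from there. It may be worth putting a checksum manifest on the drive. This is a minimal sketch that assumes only Python 3.9+ and its standard library. The script name `checksums.py` and the manifest name `SHA256SUMS` are my own choices and are not part of llamafile. Run `python checksums.py write /path/to/drive` before shipping, and have him run `python checksums.py verify /path/to/drive` on arrival:

```python
import hashlib
import sys
from pathlib import Path

MANIFEST_NAME = "SHA256SUMS"  # hypothetical name; any filename works


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB model files don't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record '<hex digest>  <relative path>' for every file under root."""
    lines = []
    for p in sorted(root.rglob("*")):
        # Skip directories and the manifest itself.
        if p.is_file() and p.resolve() != manifest.resolve():
            lines.append(f"{sha256_of(p)}  {p.relative_to(root).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n")


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Return the relative paths that are missing or whose hash changed."""
    bad = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        digest, rel = line.split("  ", 1)
        p = root / rel
        if not p.is_file() or sha256_of(p) != digest:
            bad.append(rel)
    return bad


def main(argv: list[str]) -> int:
    if len(argv) != 3 or argv[1] not in ("write", "verify"):
        print("usage: checksums.py write|verify DRIVE_DIR")
        return 2
    root = Path(argv[2])
    manifest = root / MANIFEST_NAME
    if argv[1] == "write":
        write_manifest(root, manifest)
        return 0
    bad = verify_manifest(root, manifest)
    for rel in bad:
        print(f"MISMATCH: {rel}")
    print("all files OK" if not bad else f"{len(bad)} file(s) damaged")
    return 1 if bad else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The manifest uses the same "hash, two spaces, path" layout as GNU coreutils' `sha256sum`. On a Linux machine he could also check it with `sha256sum -c SHA256SUMS` run from the drive's root directory.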