HACKER Q&A
📣 a_w

16 yo Nephew, in E. Africa, wants to train an LLM with on disk Wikipedia


Hello HN!

My 16 year old nephew lives in an East African nation where there is practically no internet access.

Last week he asked me for advice on how to go about training an open source LLM using an on-disk copy of Wikipedia (~80 GB).

Any suggestions? Thanks!


  👤 runjake Accepted Answer ✓
In addition to the other great suggestions, point him to Karpathy's YouTube channel[1]. Karpathy has an approachable communication style.

Here's his "1 hour intro to LLMs" video: https://www.youtube.com/watch?v=zjkBMFhNj_g

1. https://www.youtube.com/c/AndrejKarpathy


👤 FrenchDevRemote
Not an expert, but maybe using RAG/embeddings on the on-disk Wikipedia would be better than finetuning on Wikipedia?

Most decent LLMs were probably already trained on Wikipedia, but that doesn't stop them from hallucinating when asked questions about it.
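
To make the RAG idea concrete, here's a rough sketch of the retrieval half that runs offline with only the Python standard library. Big caveat: it uses plain TF-IDF scoring instead of neural embeddings, since that needs nothing installed. The function names (`build_index`, `retrieve`, `build_prompt`) and the sample passages are made up for illustration, and a real setup would split the Wikipedia dump into passages first and probably use a proper embedding model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(passages):
    """Return (TF-IDF vector per passage, IDF table) for a list of strings."""
    counts_per_passage = [Counter(tokenize(p)) for p in passages]
    n = len(passages)
    doc_freq = Counter()
    for counts in counts_per_passage:
        doc_freq.update(counts.keys())
    # Smoothed IDF so a term that appears everywhere still gets weight 1.0.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in counts.items()}
               for counts in counts_per_passage]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, vectors, idf, k=2):
    """Return the k passages most similar to the query."""
    q_counts = Counter(tokenize(query))
    # Query terms never seen in the corpus are ignored.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    ranked = sorted(range(len(passages)),
                    key=lambda i: cosine(q_vec, vectors[i]),
                    reverse=True)
    return [passages[i] for i in ranked[:k]]


def build_prompt(question, context_passages):
    """Assemble a prompt that asks the model to answer from the context."""
    context = "\n".join("- " + p for p in context_passages)
    return ("Answer using only the context below.\n\n"
            "Context:\n" + context + "\n\n"
            "Question: " + question + "\n")


if __name__ == "__main__":
    # Hypothetical sample passages standing in for chunks of the dump.
    passages = [
        "Mount Kilimanjaro is a dormant volcano in Tanzania.",
        "Lake Victoria is one of the African Great Lakes.",
        "Python is a programming language.",
    ]
    vectors, idf = build_index(passages)
    question = "Where is Mount Kilimanjaro?"
    print(build_prompt(question, retrieve(question, passages, vectors, idf, k=1)))
```

The prompt that comes out would then be handed to whatever local model he ends up running. The point is that the model reads the relevant article text at question time instead of relying on what it memorized in training.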


👤 icsa
Use a model already trained on Wikipedia using llamafile.

You can download llamafile and several models, put them on a USB drive or hard drive, then send the drive to him via DHL.


👤 throwaway11460
Would it be possible to ship him a Starlink terminal? Internet access could do wonders for a curious young guy like that... And he could share that connectivity with people around him too.

👤 joegibbs
What kind of GPUs does he have?