I am part of a larger community that organizes itself through loads of e-mails, PDFs, etc. Many questions about the current state of affairs could, in my opinion, be answered through a ChatGPT-like interface.
How would one go about training a model based on local files? Is it possible? What would I have to do?
👤 brucethemoose2 Accepted Answer ✓
For non-commercial use? To answer your question directly: finetune a LLaMA-based instruction model, perhaps using the lit-llama repo. For this you will need to rent a fairly beefy cloud instance, and you will need to resume the finetuning (or train a LoRA adapter) whenever you want to put new data in. Then host it on a cheaper server with a llama.cpp frontend.
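A rough sketch of what the LoRA route can look like, using Hugging Face transformers + peft rather than the lit-llama repo mentioned above (model name, file name and hyperparameters are placeholders, not recommendations):

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "openlm-research/open_llama_7b"  # placeholder LLaMA-style checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
# In practice you would load the base model quantized (e.g. QLoRA) so it fits on one GPU.
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Train small LoRA adapters instead of updating all 7B weights.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Your local e-mails/PDFs, already converted to plain text, one record per line.
dataset = load_dataset("text", data_files={"train": "community_docs.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                      remove_columns=["text"])

Trainer(
    model=model,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, logging_steps=10),
).train()

model.save_pretrained("lora-out")  # saves adapter weights only; merge into the base
                                   # model before converting it for llama.cpp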
But what you really might want is vector search (retrieval over your documents), which seems like a better fit for this use case.
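For comparison, a minimal sketch of the vector-search route, using sentence-transformers and FAISS as an illustrative (assumed) stack; the sample chunks are placeholders and the e-mail/PDF text extraction is not shown:

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

# Chunks of text extracted from your e-mails/PDFs.
chunks = [
    "Minutes of the March meeting: budget approved ...",
    "The next community event is scheduled for June 12 ...",
    "Membership fees are due at the end of the quarter ...",
]
embeddings = model.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalized vectors
index.add(embeddings)

def search(question, k=3):
    k = min(k, index.ntotal)
    q = model.encode([question], normalize_embeddings=True)
    scores, ids = index.search(q, k)
    return [(chunks[i], float(s)) for i, s in zip(ids[0], scores[0])]

print(search("When is the next event?"))
# The top-ranked chunks are then pasted into the prompt of whatever chat model you use
# ("answer using this context"), which is how most chat-with-your-docs products work.
```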
👤 tikkun
There are some "drag and drop" solutions, like https://www.chatbase.co/. There are plenty more - search for "custom ChatGPT" on Product Hunt and you'll find a lot.