The idea is to query an AI model that is knowledgeable about a particular subject matter.
I have no idea where to begin.
I've looked at RAG and embeddings, but they both appear to require you to send the context (i.e. the book content) with the query, which would be way too large for one query. So I'm thinking it'd be better to just train an entire AI model.
RAG (with a vector DB) I've found to be a little finicky, and it requires a decent amount of preprocessing of your data in order to split it up into reasonable chunks that vector-encode well. I also think that, due to the nature of how RAG works, it might not be able to handle certain types of queries unless you allow for a more complicated back-and-forth of querying. But I've only toyed with it and built some proof-of-concepts, nothing production ready. I found llama-index to be useful here; it lets you spin up a barebones RAG system with sensible defaults in like 20 lines of code. But of course for any real-world application you'll have to start making a bunch of mods. Would love feedback from folks who have used RAG in production systems -- was it difficult to split your documents up? Did you have trouble with it using irrelevant chunks from your vector DB? Etc.
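For reference, here's roughly what that barebones llama-index setup looks like (a sketch, assuming a recent llama-index where the imports live under `llama_index.core`, an `OPENAI_API_KEY` in the environment for the default embedding/LLM backends, and a placeholder `data/` folder holding your documents):

```python
# Minimal LlamaIndex RAG sketch: load files, chunk + embed them into an in-memory
# index, then ask questions against it. The defaults call out to OpenAI for
# embeddings and generation unless you configure local models.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()   # parse everything in ./data
index = VectorStoreIndex.from_documents(documents)      # chunk, embed, build the index
query_engine = index.as_query_engine()

print(query_engine.query("What does the book say about X?"))
```

For anything real you'd swap the in-memory index for a persistent vector store and tune the chunking, which is where the mods start piling up.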
Fine-tuning is modifying an existing LLM to have knowledge about your data. It's been on my to-try list for a while, but it has a higher barrier to entry. There are good video tutorials on YouTube, though. It seems like this would allow for a more in-depth understanding of your documents, making it more likely to answer complicated prompts. But I'd love feedback on that hypothesis from folks with experience using it!
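To make the fine-tuning option concrete, here's a hedged sketch of the usual low-cost route: LoRA adapters via Hugging Face transformers + peft + datasets. The model name, data file, and hyperparameters are placeholders, not a recipe:

```python
# LoRA fine-tuning sketch: wrap a base causal LM with small trainable adapters and
# train only those on your own text, instead of updating (or training) all the weights.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"                     # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Assumes a plain-text file with one training example per line (placeholder name).
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("out/lora-adapter")             # saves only the adapter weights
```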
Models are more for completion. Think of it like autocomplete. If you wanted a model to be good at storytelling, you'd train a model for that. Or say, writing Assembly code. It's like you write "Go to" and the completion model figures out the next word, which may be "jail", "Mexico" or "END".
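You can see the autocomplete behaviour directly by asking a small causal LM for its most likely next tokens (GPT-2 here purely as a stand-in; the actual suggestions will vary by model):

```python
# Illustration of "completion = autocomplete": score the next token after "Go to".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Go to", return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]    # scores for the next token
top = torch.topk(next_token_logits, k=5)
print([tokenizer.decode(i) for i in top.indices])        # five most likely continuations
```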
Fine-tuning is a way to bias the completion towards something. In general, it's better to fine-tune a general model like Llama or GPT-4 than to train one from scratch.
Embedding models are there to decide which words are related to one another. So you might say cat and dog are near each other. Or cat and gato. But cat and "go to" are far from each other. Where encoding turns letters and numbers into bits, embeddings turn words, phrases, images, and sounds into vectors.
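A quick way to see this is to embed a few words and compare cosine similarities (sentence-transformers used here only because it's convenient; the exact numbers depend on the model):

```python
# "cat is near dog, far from 'go to'" as cosine similarity between embedding vectors.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(["cat", "dog", "gato", "go to"])

print(util.cos_sim(vecs[0], vecs[1]))   # cat vs dog     -> relatively high
print(util.cos_sim(vecs[0], vecs[2]))   # cat vs gato    -> relatively high
print(util.cos_sim(vecs[0], vecs[3]))   # cat vs "go to" -> relatively low
```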
Since vectors are a little different from bits, they're stored in vector DBs. Vector DBs are often a pain in the ass to deal with, and embedding is super cheap, so RAG tutorials often just re-embed the entire book every time. This is not good practice.
RAG is really a fancy term for: retrieve the content relevant to a query, then generate an answer based on what was retrieved. So tutorials will embed a million words, toss that in memory, query the memory, then throw it all out. That's... wasteful. But not as wasteful as training a model. You should embed once, store it in a vector DB, then query that DB.
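Stripped of the frameworks, the retrieve-then-generate loop is just this (a toy sketch with placeholder chunks; in practice the vectors live in a vector DB rather than a Python variable):

```python
# Bare-bones RAG: embed the chunks ONCE, then at query time embed only the question,
# grab the nearest chunks, and paste them into the prompt for whatever LLM you use.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["chapter 1 text ...", "chapter 2 text ...", "chapter 3 text ..."]  # placeholders
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)              # do once, keep it

def retrieve(question, k=2):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q                       # cosine similarity (vectors are normalized)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

question = "What happens in chapter 2?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` then goes to the LLM for the "generate" half.
```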
If it's for a few uses, then just use LangChain for RAG. If it's for over 100 uses, then you want to convert your text into embeddings once and put them in a vector DB. If it's small and static, LanceDB is fine. Or pgvector (Supabase supports this).
If you want scale, there are plenty of other options, but the price goes up fast. Zilliz and Qdrant seem to be good at higher levels, especially if the text is updated continuously.
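For the small/static LanceDB case mentioned above, the whole embed-once, store, query flow is only a few lines (a sketch; the table name, fields, and path are made up):

```python
# LanceDB sketch: embed chunks once, persist the vectors on disk, search by vector later.
import lancedb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["chunk one ...", "chunk two ..."]                    # your preprocessed chunks

db = lancedb.connect("./book_index")                           # local directory as the DB
table = db.create_table("chunks", data=[
    {"text": c, "vector": embedder.encode(c).tolist()} for c in chunks
])

query_vec = embedder.encode("what does the book say about X?").tolist()
hits = table.search(query_vec).limit(3).to_list()              # nearest chunks
print([h["text"] for h in hits])
```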
2,000 texts is far too low a number to train from scratch. Fine-tuning is often used to refine a previously trained model, and I'm not sure 2k is enough even for that.
RAG is really the first method you should be implementing. LlamaIndex has the best examples that are easily repurposed (imho).