Is RAG still the way to go? Should one fine-tune the model on that data as well?
It seems that getting RAG to work well requires a lot of optimization. Are there any drag-and-drop solutions that work well? I know the OpenAI Assistants API has built-in knowledge retrieval; does anyone have experience with how it compares to other methods?
Or is it better to pre-train a custom model and instruction-tune it?
Would love to know what you're all doing!
I'd say RAG is still very much the way to go. What you then need to do is optimize how you chunk and embed data into the vector database. Pinecone has a good post on this[1], and I believe others[2] are working on more automated solutions.
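To make that concrete, here's a minimal sketch of fixed-size chunking with overlap. embed() and the commented-out upsert are placeholders for whatever embedding model and vector store you actually use (OpenAI embeddings plus Pinecone/pgvector/FAISS, for example):

    def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
        """Split text into overlapping chunks so ideas aren't cut off at boundaries."""
        step = size - overlap
        return [text[i:i + size] for i in range(0, len(text), step)]

    def embed(chunk: str) -> list[float]:
        """Placeholder: swap in your embedding model's API call."""
        return [0.0]  # stand-in vector

    document = "..." * 1000  # stand-in for your source text
    for i, chunk in enumerate(chunk_text(document)):
        vector = embed(chunk)
        # index.upsert(id=f"doc-{i}", vector=vector, metadata={"text": chunk})

The overlap is the part people most often skip; without it, a retrieved chunk can start or end mid-thought and the LLM loses the context it needed.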
For a more general framing, what state-of-the-art (SOTA) systems seem to be doing is giving LLMs a "second brain" to obtain information from. This can take the form of RAG, as above, or of more complex and rigorous models. For example, AlphaGeometry[3] combines an LLM with a geometry theorem prover to find solutions to problems (a rough sketch of that routing pattern follows the links below).
[1] https://www.pinecone.io/learn/chunking-strategies/
[3] https://deepmind.google/discover/blog/alphageometry-an-olymp...
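Here's the "second brain" pattern as a rough sketch: the model either answers directly or routes the question to an external tool. Every function here (llm, search_docs, prove) is a hypothetical stub, not any particular framework's API:

    def search_docs(query: str) -> str:
        """Placeholder: vector-store retrieval (your RAG pipeline)."""
        return "retrieved passages..."

    def prove(statement: str) -> str:
        """Placeholder: a rigorous external checker, e.g. a theorem prover."""
        return "proof steps..."

    def llm(prompt: str) -> str:
        """Placeholder: your model call (API or local)."""
        return "..."

    TOOLS = {"search_docs": search_docs, "prove": prove}

    def answer(question: str) -> str:
        # Ask the model to route: delegate to a tool, or answer directly.
        choice = llm(f"Pick one of {list(TOOLS)} or NONE for: {question}").strip()
        if choice in TOOLS:
            evidence = TOOLS[choice](question)
            return llm(f"Question: {question}\nEvidence: {evidence}\nAnswer:")
        return llm(question)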
I'm currently working on a "hybrid" search that combines lexical and semantic search, using an LLM to translate the user's message into a search query before retrieval.
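One common way to merge the two result sets is reciprocal rank fusion (RRF). Here's a small sketch; the lexical and semantic result lists are stubbed out, and in practice they'd come from something like BM25 and a vector index:

    from collections import defaultdict

    def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
        """Reciprocal rank fusion: each list votes 1/(k + rank) per document."""
        scores: dict[str, float] = defaultdict(float)
        for results in result_lists:
            for rank, doc_id in enumerate(results, start=1):
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    lexical = ["a", "b", "c"]    # placeholder BM25 results
    semantic = ["b", "d", "a"]   # placeholder vector-search results
    print(rrf_merge([lexical, semantic]))  # -> ['b', 'a', 'd', 'c']

Doc "b" ranks well in both lists, so fusion puts it first even though neither search ranked it top on its own.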
As far as I know there's no "standard" yet; the field keeps moving and there are no simple answers.