These can be ever-evolving and may involve periodic delta updates, manual syncs, additions/removals, etc. I'm trying to figure out whether there's a way to manage these docs/texts properly. Basically, I think I need a system that stores the files and their metadata and provides a web UI for people to manage them. These blobs of text would then go through frameworks like LangChain/LlamaIndex to be cleaned, chunked, and loaded into a vector DB, and different chunking strategies could be A/B tested while other people keep maintaining the ever-growing docs system.
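To make the idea a bit more concrete, here is a rough sketch of the shape I have in mind, using only the Python standard library. All names in it (`DocStore`, `DocRecord`, `chunk_fixed`, `chunk_paragraphs`, `STRATEGIES`, `build_chunks`) are hypothetical, the in-memory dict is just a stand-in for a real database, and the two chunkers are placeholders for whatever LangChain/LlamaIndex splitters would actually be used. It only illustrates content-hash-based delta detection and swappable chunking strategies, not a real implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DocRecord:
    """One managed document plus its metadata (hypothetical schema)."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)
    content_hash: str = ""


class DocStore:
    """In-memory stand-in for the document/metadata store."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, text, metadata=None):
        """Add or update a document. Returns True only if the content changed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        existing = self.docs.get(doc_id)
        if existing is not None and existing.content_hash == digest:
            return False  # unchanged, so no re-chunking or re-embedding needed
        self.docs[doc_id] = DocRecord(doc_id, text, metadata or {}, digest)
        return True

    def remove(self, doc_id):
        """Delete a document. Returns True if it existed."""
        return self.docs.pop(doc_id, None) is not None


def chunk_fixed(text, size=40):
    """Strategy A: fixed-size character windows, no overlap."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_paragraphs(text):
    """Strategy B: split on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


# Registry of strategies so they can be compared side by side.
STRATEGIES = {"fixed": chunk_fixed, "paragraphs": chunk_paragraphs}


def build_chunks(store, strategy_name):
    """Chunk every stored document with the named strategy.

    Each chunk is tagged with its source doc and strategy so that results
    from different strategies can be kept apart in the vector DB.
    """
    chunker = STRATEGIES[strategy_name]
    chunks = []
    for doc in store.docs.values():
        for index, chunk in enumerate(chunker(doc.text)):
            chunks.append({
                "doc_id": doc.doc_id,
                "chunk_index": index,
                "strategy": strategy_name,
                "text": chunk,
            })
    return chunks
```

The point of returning a changed/unchanged flag from `upsert` is that a periodic sync job could re-chunk only the documents whose hash changed, and tagging each chunk with its strategy name is one way to keep A/B variants separate downstream.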
Any suggestions are welcome. I've tried some all-in-one frameworks, but so far my experience has been lackluster. Also, due to compliance constraints, my company cannot use cloud-based solutions, so it has to be either open-source and deployed locally, or developed in-house.