Have you fine-tuned LLMs to know the contents of a specific code base?

Question

I am interested in making an LLM aware of the contents of my project, so that it knows which classes, functions, and variables exist outside the current file or prompt. The first idea for "adding" knowledge of the code base (assuming it is too large to fit into the prompt) would be to fine-tune the LLM on the code. Has anyone tried this, or does anyone know of any work on it?

TroyZ · Accepted Answer

Fine-tuning is probably not the way to do it.
Instead, try embedding the code, using semantic search to retrieve the parts relevant to the current task, and plugging those parts into the prompt.
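
Here is a rough sketch of that retrieve-then-prompt flow, using only the Python standard library. It is an illustration under assumptions, not a recommended implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory `CHUNKS` dict stands in for a vector store, and the file paths and function names in it are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda name: cosine(query_vec, embed(chunks[name])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, chunks: dict[str, str], k: int = 2) -> str:
    """Plug the retrieved chunks into a prompt ahead of the actual task."""
    context = "\n\n".join(f"# {name}\n{chunks[name]}" for name in retrieve(query, chunks, k))
    return f"Relevant code from the project:\n\n{context}\n\nTask: {query}\n"


# Hypothetical code base, already split into one chunk per file.
CHUNKS = {
    "billing/invoice.py": "def compute_invoice_total(items, tax_rate): ...",
    "auth/session.py": "def refresh_session_token(user_id): ...",
    "utils/dates.py": "def parse_iso_date(text): ...",
}

if __name__ == "__main__":
    print(build_prompt("compute the invoice total with tax", CHUNKS, k=1))
```

The point of the design is that only the few chunks closest to the query reach the model, so the code base can be far larger than the context window.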
You may need:
- summarizer prompt to summarize your project structure, main functions, methods.
- vector store/database to store and retrieve your relevant code from code base
- coder prompt to write code based on the retrieved part.
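
For the summarizer step, one possible input is a compact outline of each file rather than its full source. The sketch below assumes the project is written in Python (the question does not say) and uses the standard `ast` module; the `outline` helper and the `SAMPLE` module are hypothetical names made up for illustration.

```python
import ast

FUNCTION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _signature(node) -> str:
    """Render a function name with its plain positional parameters only."""
    params = ", ".join(arg.arg for arg in node.args.args)
    return f"{node.name}({params})"


def outline(source: str) -> list:
    """List top-level variables, functions, classes, and their methods in one module."""
    entries = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            # Simple `NAME = value` assignments at module level.
            for target in node.targets:
                if isinstance(target, ast.Name):
                    entries.append(f"var {target.id}")
        elif isinstance(node, FUNCTION_NODES):
            entries.append(f"def {_signature(node)}")
        elif isinstance(node, ast.ClassDef):
            entries.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, FUNCTION_NODES):
                    entries.append(f"  def {node.name}.{_signature(item)}")
    return entries


# Hypothetical module source used only to demonstrate the helper.
SAMPLE = """
MAX_RETRIES = 3

class Invoice:
    def total(self, tax_rate):
        return 0

def parse_iso_date(text):
    return text
"""

if __name__ == "__main__":
    print("\n".join(outline(SAMPLE)))
```

An outline like this could be pasted into the summarizer prompt, or stored next to each chunk so that retrieval can match on names as well as on code bodies.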
Check out langchain: https://langchain.readthedocs.io/