HACKER Q&A
📣 eigenvalue

Why doesn't Copilot for Business use your organization's actual repos?


I was just looking at the page for Copilot for Business ( https://docs.github.com/en/enterprise-cloud@latest/copilot/overview-of-github-copilot/about-github-copilot-for-business ), and while it does offer some useful things for business users versus the regular Copilot product, it seems to me that they could make it much more powerful by fine-tuning a smaller model on all the repos in an organization (even private ones), so it could generate additional recommendations that take into account all the existing code a company has. Obviously, users would want assurances that these personalized models wouldn't be shared with anyone outside the organization, but Microsoft has enough credibility that I think many businesses would try it in hopes of enhanced productivity.

Is it because it would be cost-prohibitive? Maybe it only makes sense to fine-tune a model if there are at least N users in the organization. Anyway, I'm curious whether anyone here has insights into this. I'm also interested in whether there are other companies offering this kind of product.
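The "at least N users" intuition can be made concrete with back-of-the-envelope math: a per-org model only pays for itself once the margin across all seats covers the one-off fine-tuning cost plus the ongoing cost of hosting a dedicated model. A minimal sketch, where every number except the $19/seat list price is a made-up assumption, not real GitHub or Azure pricing:

```python
import math

# ALL cost figures below are hypothetical assumptions for illustration.
FINE_TUNE_COST = 5_000.0          # assumed one-off cost to fine-tune per org (USD)
MONTHLY_SERVING_OVERHEAD = 500.0  # assumed extra monthly cost of a dedicated model
PRICE_PER_SEAT = 19.0             # Copilot for Business list price per user/month
BASE_COST_PER_SEAT = 10.0         # assumed cost of serving the shared model per seat

def min_seats_to_break_even(months: int) -> int:
    """Smallest org size where per-seat margin over `months` covers the custom model."""
    margin_per_seat = (PRICE_PER_SEAT - BASE_COST_PER_SEAT) * months
    extra_cost = FINE_TUNE_COST + MONTHLY_SERVING_OVERHEAD * months
    return math.ceil(extra_cost / margin_per_seat)

print(min_seats_to_break_even(12))  # -> 102
print(min_seats_to_break_even(24))  # -> 79
```

Under these made-up numbers, N lands around a hundred seats over a one-year horizon, which would explain why a per-org model isn't offered at small scale.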


  👤 sqs Accepted Answer ✓
They're working on it: https://githubnext.com/projects/copilot-view/.

We (Sourcegraph) are also working on it, and also bringing other kinds of code intelligence to the LLM so we can answer more kinds of questions: https://twitter.com/sourcegraph/status/1623339664428892162.


👤 louiskw
A couple of projects that might be good substitutes for what you're asking for:

https://www.codecomplete.ai/ - current YC batch, looks very promising and sounds like this could be a big part of their USP

https://www.tabnine.com/ - the OG code completion, mentions personalised models on their enterprise plan

It's also on the roadmap for us (AI code search, not completions): https://bloop.ai/


👤 travisjungroth
A very limited opinion: having looked into fine-tuning Whisper and GPT, it seemed like a fiddly thing. Training isn't as robust as inference, which makes sense since it runs at a different scale. You can manually check models before you release them, but that would put the cost on a different scale than $10/month.

My takeaway was that it's very easy to mess up a model by fine-tuning: you can overfit or see rapid degradation. Again, this is just from reading about others' experiences with other software, so maybe that's not the case here.
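The standard guard against that kind of fine-tuning degradation is early stopping on a held-out validation set: stop once validation loss stops improving and roll back to the best checkpoint. A framework-agnostic sketch (the helper and loss values are invented for illustration):

```python
# Minimal early-stopping logic of the kind used to catch fine-tuning
# overfitting. Hypothetical helper; the loss curve below is fake data.

def early_stop(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) given per-epoch validation losses."""
    best_epoch, best_loss, bad_epochs = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch  # degrading: roll back to best checkpoint
    return best_epoch, len(val_losses) - 1

# Typical fine-tuning curve: improves for a while, then starts to overfit.
print(early_stop([2.1, 1.7, 1.5, 1.6, 1.9, 2.4]))  # -> (2, 4)
```

The catch, per the comment above, is that this kind of per-org babysitting (validation splits, checkpoint management, manual review) is exactly what doesn't fit in a $10/month price point.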


👤 SteveDR
My org shot down an idea like this because they don’t want OpenAI to train on (or have access to) our codebase. I don’t blame them.

👤 PaulHoule
The usual story is that fine-tuning is not very expensive compared to building the foundation model.

👤 aunch
Codeium has a self-hosted enterprise solution that lets the enterprise fine-tune on their repos within that self-hosted instance!