HACKER Q&A
📣 tentacleuno

Is GitHub Copilot / IntelliCode Legal?


Two years ago, an issue titled "Licensing issues" was opened in Microsoft's IntelliCode GitHub repository[0]. It received a response from a Microsoft employee. Eventually, the argument is made that IntelliCode is a derivative work, since it is derived from thousands (?) of open-source projects. From what I understand, this seems to be true.

However, here's the fun part: Microsoft is training its AI model on these open-source projects. Would the terms of those projects' licenses still apply here?

Further, would you say the law hasn't caught up with this use of open-source projects yet?

I am also curious about the legality of GitHub Copilot, since it seems to do largely the same thing from an AI standpoint.

[0]: https://github.com/MicrosoftDocs/intellicode/issues/201

EDIT: IntelliCode, not IntelliSense!


  👤 flowtheorist Accepted Answer ✓
It's definitely in a gray area, because these AI models are essentially compression engines: they encode the code samples/data into the weights of the matrices that make up the ML model and then "uncompress" it to serve queries. I think it would be easy to argue that a compressed data set, no matter how illegible, would need to conform to the same license as the data set it encodes, but I don't think any lawyer is smart enough to make that case. So for now this will probably remain a convenient loophole for large companies with enough compute: they can encode whatever data/code they want to use into some neural network, mangle it beyond recognition, sidestep the licensing restrictions, and then sell the result as AI.

For why these things are essentially mangled compression engines, take a look at "Hopfield Networks is All You Need": https://arxiv.org/abs/2008.02217. It shows that the attention mechanism in modern transformer networks (which is what Copilot uses) is equivalent to the update rule of a modern Hopfield network. So you can view a transformer as a bunch of Hopfield networks, which are essentially memory modules, connected in some complicated topology to encode a data set.
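To make the "memory module" idea a bit more concrete, here is a minimal sketch of the retrieval update described in that paper, ξ_new = X softmax(β Xᵀ ξ), where the columns of X are the stored patterns and ξ is the query. It has the same form as transformer attention, with the stored patterns acting as both keys and values. The function names, β = 8 and the toy ±1 patterns are my own illustrative assumptions, not anything taken from Copilot or the paper's code. All it shows is that a corrupted query snaps back to the closest stored pattern.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(patterns, query, beta=8.0):
    """One update step of a continuous (modern) Hopfield network.

    patterns: list of stored patterns, each a list of floats (the columns of X).
    query: the state/query vector xi.
    beta: inverse temperature (illustrative value, an assumption).
    Returns X @ softmax(beta * X^T @ xi).
    """
    # Similarity of the query to every stored pattern (X^T @ xi), scaled by beta.
    scores = [beta * sum(p_i * q_i for p_i, q_i in zip(p, query)) for p in patterns]
    weights = softmax(scores)
    # Weighted sum of the stored patterns (X @ weights).
    dim = len(query)
    return [sum(w * p[j] for w, p in zip(weights, patterns)) for j in range(dim)]


# Hypothetical toy "memories".
stored = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, 1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
]
noisy = [0.9, 0.8, -0.7, -1.0]  # corrupted version of stored[0]
retrieved = hopfield_retrieve(stored, noisy)
print([round(x, 3) for x in retrieved])  # prints [1.0, 1.0, -1.0, -1.0]
```

With a large β the softmax puts almost all of its weight on the most similar stored pattern, which is why the output is essentially a verbatim copy of one "memory" rather than a blend.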


👤 ThrowawayR2
Microsoft has skilled lawyers who think Copilot is legal. The same goes for Google's AlphaCode.

The Software Freedom Conservancy has skilled lawyers who think Copilot/etc. isn't legal: https://sfconservancy.org/blog/2022/feb/03/github-copilot-co...

Until there are court cases that set precedent, nobody will know for sure.