However, here's the fun part: Microsoft is training its AI model on these open-source projects. Would the terms of the license still apply here?
Further, would you say the law hasn't caught up with this use of open-source projects yet?
I am also curious about the legality of GitHub Copilot, since they seem to do largely the same thing from an AI standpoint.
[0]: https://github.com/MicrosoftDocs/intellicode/issues/201
EDIT: IntelliCode, not IntelliSense!
For why these things are essentially mangled compression engines, take a look at "Hopfield Networks is All You Need": https://arxiv.org/abs/2008.02217. It lets you represent modern transformer networks (which is what Copilot is using) as a bunch of Hopfield networks, which are essentially memory modules connected in some complicated topology to encode some data set.
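To make the "memory module" point concrete, here is a minimal sketch of the retrieval step from that paper, where the new state is a softmax-weighted average of the stored patterns (the same form as attention). The function names, the toy patterns, and the value of beta are my own assumptions for illustration. This is not how Copilot is implemented, just the update rule in plain Python:

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_retrieve(patterns, query, beta=8.0, steps=1):
    # Modern Hopfield update: state <- sum_i softmax(beta * <p_i, state>)_i * p_i
    # `patterns` are the stored memories, `query` is a noisy or partial cue.
    # beta=8.0 is an arbitrary illustrative choice (hypothetical value).
    state = list(query)
    for _ in range(steps):
        scores = [beta * sum(a * b for a, b in zip(p, state)) for p in patterns]
        weights = softmax(scores)
        state = [
            sum(w * p[d] for w, p in zip(weights, patterns))
            for d in range(len(state))
        ]
    return state

# Toy data (made up): three stored patterns and a corrupted copy of the first.
patterns = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0, 1.0],
    [1.0, -1.0, -1.0, 1.0],
]
noisy = [0.9, 0.7, -1.0, -0.2]
recalled = hopfield_retrieve(patterns, noisy)
print([round(x, 3) for x in recalled])
```

With a large enough beta, a single update snaps the noisy cue back to the closest stored pattern, which is the sense in which the network "remembers" its training data rather than abstracting away from it.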
The Software Freedom Conservancy has skilled lawyers who think Copilot/etc. isn't legal: https://sfconservancy.org/blog/2022/feb/03/github-copilot-co...
Until there are court cases that set precedent, nobody will know for sure.