HACKER Q&A
📣 promisebyte

License to enforce open source on models trained on public code?


New tools like GitHub Copilot and OpenAI Codex train their code-generation models on public code.

Is there a license that enforces open source on the output of models trained on public code? Is a new license in the works? I don't want to prohibit training.

I previously thought that releasing code under existing GPL-compatible licenses was enough, but I've seen cases of those licenses being violated under the guise of fair use, with models outputting code without any attribution and that output being relicensed in incompatible ways.

The AGPL added a requirement to close the server-side loophole. Is there a license that covers the new "AI scraping/training loophole", or could a new clause be added to account for AI training? (i.e. if you train a model using the given code, you must also allow others to download the source code, training set, and model corresponding to the output)


  👤 wmf Accepted Answer ✓
Some experts are saying this can't be fixed because licenses simply don't apply to machine learning. I think the real question is how much money you have to spend on lawyers.