HACKER Q&A
📣 amrrs

Why do devs feel CoPilot has stolen code but DALL-E is praised for art?


Both DALL-E from OpenAI and GitHub CoPilot work on the same philosophy. While one receives massive praise online, the other is criticised for working with copied code and simply pasting it.

Why is there this difference in opinion?


  👤 gwern Accepted Answer ✓
False premise? Loads of artists hate DALL-E and AI art in general; they typically have much less feelings about Copilot. Unsurprisingly, for devs and DALL-E, it's the other way around.

👤 Engineering-MD
If I can give a different view from everyone else: because it’s much easier to demonstrate technically. If Copilot writes some code that’s copyrighted or semantically identical to existing code, it’s easy to copy the text and search Google for it. Bam, copyright infringement.

For DALL-E, I can’t select components in an image and search for them. It may well copy components of other people’s art verbatim (is that the right word for images?), but I can’t show where it ripped them from.


👤 joeld42
Because Copilot can generate verbatim copies of code, with only the whitespace changed in some cases.

The AI art generators are trained on copyrighted work, which is still a legal grey area, but they don't stamp or copy/paste parts of the source images unchanged into larger ones. Though I'd imagine if you used the same algorithms for something like logo or typeface generation you'd run into trademark issues.
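
The "verbatim with only the whitespace changed" case is the easy one to check mechanically. A minimal sketch, assuming you already have the suspected original in hand (the function names and sample snippets here are made up for illustration, and this is not how any particular tool detects copies):

```python
import re


def normalize_whitespace(code: str) -> str:
    """Collapse every run of whitespace to a single space and trim the ends."""
    return re.sub(r"\s+", " ", code).strip()


def same_modulo_whitespace(a: str, b: str) -> bool:
    """True when two snippets differ only in how existing whitespace is laid out."""
    return normalize_whitespace(a) == normalize_whitespace(b)


# Hypothetical snippets, not taken from any real repository.
original = "def add(a, b):\n    return a + b\n"
suggestion = "def add(a, b):\n\treturn a + b"      # tabs instead of spaces
renamed = "def add(x, y):\n    return x + y\n"    # identifiers changed

print(same_modulo_whitespace(original, suggestion))
print(same_modulo_whitespace(original, renamed))
```

This only catches re-indented or re-wrapped copies. Renamed variables, or whitespace removed entirely (`a+b` versus `a + b`), slip through, and there is nothing comparably simple for pixels.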


👤 alexvoda
First of all, as gwern said in one of the top level comments:

> False premise? Loads of artists hate DALL-E and AI art in general; they typically have much less feelings about Copilot. Unsurprisingly, for devs and DALL-E, it's the other way around.

Secondly, if by theft you actually mean copyright infringement, which is not the same thing, I believe we are reaching the breaking point of copyright legislation. We are getting to the point where copyright is so detached from reality it is no longer maskable and the notion itself becomes nonsensical.

As others have stated, even without AI, code gets routinely copied. There are only so many ways you can write a well-defined, common-knowledge algorithm; your version is bound to look like someone else's.

Just like genes are a fundamental part of the biological world, memes are a fundamental part of human society. They have existed for as long as humanity has. They replicate (copy) and mutate (remix) and evolve due to selective pressure. Copyright is an aberration opposed to the very nature of memes, because it is a mechanism designed to restrict their replication and mutation.


👤 unconed
I find it weird people are focusing so much on the tech, instead of the kind of person who would use it.

After decades of professional software development, it should be clear that code is a liability. The more you have, the worse things get. A tool that makes it easy to crank out a ton of it is exactly the opposite of what we need.

If a coworker uses it, I will consider it an admission of incompetence. Simple as that.


👤 marginalia_nu
DALL-E is maybe copyright infringement, CoPilot is whitelabeling code sometimes in flagrant violation of their licensing terms, which may require such things as attribution, or even that the derived code is licensed in the same way.

👤 addicted
The real reason is that this time it hits developers’ pockets.

You saw the same thing with how online forums which were predominantly dev/tech focused (/., Digg) used to decry any attempts to protect online music/video, but would simultaneously complain about software piracy.

It boils down to hypocrisy, really.


👤 aviditas
Personally, I believe the underlying difference is that the parent company of DALL-E is a nonprofit with clear cost recoupment in its API pricing, and access to DALL-E is currently free. CoPilot is from a for-profit company which has been around long enough for a lot of people to have strong opinions about it. CoPilot was initially introduced as a product that would have a subscription or license cost. There are copyright and attribution differences, and some similarities, but as someone on the edge of the development realm, my reaction came down firmly to the free/at-cost transparency of OpenAI versus the clear 'this is a product to make money from'. I'm not saying that the latter is an inherently bad thing, but it absolutely colors the way you feel about it.

👤 lovich
A small change in art is considered materially different, isn’t it? We don’t charge artists for making similar art inspired by others.

Combine that with rulings in other areas of IP law, such as the patent office ruling that an inventor must be a human and AI cannot be assigned IP rights[1], I think that would lead towards DALL-E created art being assigned to the human who prompted the tool for art or the creator of the tool.

Compare this to software, where you can be sued for using the same pieces of code, and that seems to be the root cause of the difference.

[1] https://www.ipwatchdog.com/2020/07/13/artificial-intelligenc...


👤 sascha_sl
Copilot is just showing a problem we had all along more clearly.

Commercial entities take things that are written and maintained by volunteers in their free time and barely give back. And this culture has developed in such a way that this is at most a moral failure, not a legal one.

People put it on the internet with a permissive license because they want others to use it. This is great because people love sharing, but now, before you know it, you're suddenly supporting large enterprises and not just people that want to put cool things together like you. And you don't get any kind of compensation, even though it'd be peanuts for the company to pay for the time you put into maintenance and new development.

And if you stop, someone else will always fill the void, eager to work for free.

Maintainer burnout is a real thing and I wish the largest user of FOSS code would do their part to make the time spent on every piece of code they use worth it.

Half of "open source" is a success story, the other half is just sad.

Copilot just automated this entire process and sells the product of code back to developers, not enterprises.


👤 jimmySixDOF
Interesting to note the official position is that you cannot copyright AI-generated art due to the lack of human authorship [1].

[1] https://www.smithsonianmag.com/smart-news/us-copyright-offic...


👤 anothernewdude
I (foolishly) hope that this will lead to a collapse in copyright.

👤 Wowfunhappy
I thought marcan on Twitter had a pretty good take on this:

> Source code [is] relatively low entropy compared to, say, images. It's a lot easier to see how feeding a pile of photographs into an AI that maps them to subject erases the copyright of the photographers, because the subject isn't copyrighted.

> It's trickier when you feed artwork or music into an AI, because the AI might reproduce specific artistic or musical choices that happen to pinpoint a copyrighted work too well. With prose it's even worse. And code has to compile, [so] it's even more constrained!

https://twitter.com/marcan42/status/1539825034860335105


👤 ramesh31
Code is shared with explicit licensing. Art exists in a more vague space, where the line between stealing and influence is subjective.

If your code is released with no license, it should be fair to assume free use. But if it is licensed, tools like Copilot should be forced to respect that.


👤 im3w1l
One distinction is that CoPilot is only trained on open source code. It transfers value from the most selfless people to profit-oriented people. No such distinction exists for art.

Actually, I think it should be up to the art community to decide how they want to deal with DALL-E. I know that musicians have a very complex system for dealing with samples, arrangements and lyrics. As an outsider I have barely scratched the surface of how it works, so trying to understand the why is hopeless.


👤 muzani
Most of these arguments for or against are irrational, with poor understanding of what "training" really means. Some cases might not violate laws, but some of those are violating the spirit of those laws, and laws are often revised to take new technology into account.

But Dall-E in general can be another medium for art, which is why some artists give it massive praise.


👤 shpx
I don't consider Copilot to be stealing code.

👤 uberman
Perhaps one is an interesting novelty that, while fun to tinker with, has little practical value, while the other is potentially a tool.

👤 isitmadeofglass
Artists who end up seeing parts of their artwork on T-shirts getting a “no, but an AI did this” will be equally angry.

👤 Schroedingersat
Dall-E is just as much enclosure and theft of the commons as copilot is.

Main difference is there isn't as hard a line between it and lower resource projects such as disco diffusion.


👤 mbgerring
Ask some artists. My art collective is using Midjourney as part of a project, and we are actively discussing how to deal with the subject of authorship and credit.

👤 tpoacher
Replace CoPilot/code with Plagiarism/intellectual property, and DALL-E with Forgeries, and re-read the question.

👤 asddubs
Because DALL-E is mostly a fun novelty, but if you were to use more advanced versions of it for actual image generation, I would definitely be concerned about how much of someone else's image ends up in the thing you generated.

👤 dekhn
Because good artists borrow, but great artists steal.

👤 ksaj
I just made some Snoop Dogg Bored Ape Yacht Club NFT images (inspired by your question), and the output makes it pretty clear they've even scraped the real BAYC images.

👤 naikrovek
because Copilot was made by Microsoft and DALL-E was made by Google.

that's literally all it is.


👤 Jupe
Because, generally, art is not considered a real profession. "Making it big" in the art world is an incredible gamble, and even more so today with so many attention-grabbing alternatives (movies, TV, YouTube, blogs, music, etc.). The impact of AI-generated art will barely be felt, in my opinion.

Software engineering, on the other hand, is considered a "real" profession. Millions of people make a steady living coding, designing, testing, architecting, etc. The impact of AI on the software development workflow will be immense. Tooling has already impacted Git PRs at my place of work, and when these tools get smarter the impact will only grow. And, if the pattern holds, such tooling will be introduced earlier and earlier in the design/development workflow.

I'm probably way off, but at the rate things are progressing, I'll give it 10-to-15 years before software development paradigms such as cucumber test files or hexagonal designs/drawing are directly transformed into coded, reviewed, tested, running and deployed solutions.