HACKER Q&A
📣 jerrygenser

Image attribution and Stable Diffusion


There are websites that offer free images but require either attribution of the image's source or a paid premium membership or subscription.

Presumably these companies are scraping public websites to check if their images are being used without attribution.

If someone were to take the image and run it through Stable Diffusion to generate a new image, using that image as a source, should this also require attribution if it was just used as a starting point?

I'm sincerely curious about people's thoughts from both an ethical AND legal perspective (with all the usual disclaimers).

For example, one perspective is that the generated image may not resemble the original image, but the original was in a sense used to get to that point, similar to how an artist may see a copyrighted image and decide to put a creative spin on it.

A further perspective is that Stable Diffusion may have been trained on copyrighted images in the first place, even though it may not exactly reproduce an image in its training corpus.
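
To make the scenario concrete, here is roughly what I mean by "using the image as a starting point", as a minimal sketch with the open-source diffusers library's img2img pipeline (the checkpoint name, file names, and parameter values are illustrative assumptions, not a recommendation):

    # img2img sketch: the source image seeds the diffusion process, and
    # `strength` controls how far the output is allowed to drift from it.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    # Illustrative checkpoint; any Stable Diffusion 1.x checkpoint should work.
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    source = Image.open("stock_photo.jpg").convert("RGB").resize((768, 512))

    result = pipe(
        prompt="a watercolor landscape with the same composition",
        image=source,
        strength=0.75,       # near 0: stays close to the source; near 1: mostly re-imagined
        guidance_scale=7.5,  # how strongly to follow the text prompt
    ).images[0]

    result.save("derived.png")

At low strength the output is plainly a touched-up version of the stock photo; at high strength the source only nudges the composition, which is exactly where my attribution question sits.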


  👤 alexpetralia Accepted Answer ✓
Legally, I imagine it is still a grey area, both on whether copyrighted training data can be used and on whether the outputs of the model can be copyrighted. More reading: https://www.theverge.com/23444685/generative-ai-copyright-in...

Ethically, I don't see remixing and derivative works as so different from what humans naturally do.


👤 kuu
I am not a lawyer, but what you're proposing to generate is what is called a "derivative work", and the laws differ depending on the country:

https://en.wikipedia.org/wiki/Derivative_work


👤 solomatov
>There are websites that offer free images but require either attribution of the image's source or a paid premium membership or subscription.

There are no prerequisites for fair use; i.e., licenses don't apply in such cases.

You could read more about it here: https://en.wikipedia.org/wiki/Fair_use

P.S. I am not a lawyer.


👤 rvz
Well, another AI model created by Stability.ai, called Dance Diffusion [0], wasn't reported with the same fanfare, and that one was deliberately trained on "public domain data", "Creative Commons-licensed data", and "data contributed by artists in the community", which is opt-in.

> A further perspective is that Stable Diffusion may have been trained on copyrighted images in the first place, even though it may not exactly reproduce an image in its training corpus.

Given that watermarks from copyrighted images sometimes show up in the outputs, I'm sure that is the case. Had it been trained on copyrighted music, the whole company would have been sued into the ground, and the model could never have been released without permission or attribution.

So for StabilityAI, it is fine to break the copyright of digital artists and use their work without their permission, but not fine to do the same to musicians, for whom it instead generates music from public-domain sources. I'm sure voice cloning requires permission from the person as well, otherwise there would be more legal issues.

The same goes for Copilot training on AGPL code from outside of GitHub, including StackOverflow, etc., but IANAL.

[0] https://techcrunch.com/2022/10/07/ai-music-generator-dance-d...


👤 fleddr
Ethical: if the input image has a direct and meaningful effect on the derivative output, then I think attribution is the right thing to do. But then again, you'll likely attract negative attention as you never asked for permission.

The interesting question to me is how relevant these input images will be in the future. I've already seen demos where people who can't draw for shit paint some sloppy strokes in MS Paint in order to hint the AI in the right direction. This is how a kid's drawing transforms into a Hollywood-class rendered scene. If this is the future direction, we might not need "fancy" input images by human artists, and the question becomes less relevant.

The very concept of "human authorship" is going to be challenged. It's not as simple as prompt -> image. People are combining AI with post-processing and independently generated layers; it's all going to be a hybrid mess.


👤 pmoriarty
If an AI-generated image can't be mistaken for an already existing, copyrighted image then I really don't see how it can be copyright infringement.

As a human artist, I can study Picasso's paintings my whole life and paint in the style of Picasso, but as long as I don't copy an existing work of Picasso then how can what I do be copyright infringement?

Copyright doesn't protect style, afaik, though IANAL.

Incidentally, human artists get copied all the time. Walk outside any major museum and you'll see endless Van Gogh imitators peddling their wares on the sidewalk. Somehow people don't get all hot and bothered about it. But when an AI does it, suddenly everyone's up in arms over copyright violation.

How many human artists have copied Picasso's style? Probably thousands. How many have tried to paint like Da Vinci? Probably millions. Where is the outrage?


👤 brudgers
If it actually matters, hire a lawyer, because legal precedent is established by courts not by online opinions.

If it doesn’t actually matter, it doesn’t actually matter.

The easiest way to make it not matter is to avoid the gray area by creating images through established means with clear legal precedent.

Good luck.


👤 senko
(IANAL) I am increasingly of the belief that:

* the fact that the AI was trained on a set of images doesn't mean its output is a derivative work, the same way I don't think my painting is a derivative work if I see a bunch of images online and they influence my painting style

* if an image was used as a direct input (img2img), then the result is a derivative work, even if it ends up not looking all that similar to the original

BTW, I don't think the same reasoning holds for all AI output. For example, if Copilot basically "copy-pastes" an exact snippet of code, that's a derivative of the original (unless the snippet is trivial or the only way to achieve the result, i.e. non-copyrightable).


👤 JoeyBananas
IANAL but if it can't be proven, it doesn't exist

👤 oxff
Legal and ethical concerns will be subordinate to technological advances, so they are largely nothing to worry about.

As for your question, just replace the machine learning algorithm with a human in the loop and ask yourself the same question, and you have your answer according to our norms as they stand.